Improving Network Availability
Here's a quick guide to defining, measuring, and improving
availability in your network.
By Sudhir Narang
Analyzing and predicting availability for your entire network, rather than for
individual devices, is complex, but it is certainly workable, and essential. A
highly available network has become a business necessity. But today's networks
carry multiple applications and incorporate a wide variety of devices and paths,
so they can appear difficult to assess. Vendors now deliver products and
solutions that give network managers detailed, precise information on problem
areas and on the availability of their networks. This article formulates a
practical definition of high availability, along with metrics and methods for
measuring network availability. It also examines four areas to assess when
planning remedies: network design, infrastructure, operations and maintenance,
and support.
Define First, Then Measure
Network availability is more important today than ever before. The heat is on
because users expect much more from networks compared to the early days of e-mail
and file sharing. Today, enterprises maintain on their networks a multitude
of mission-critical applications that drive their businesses: videoconferencing
with partners, Web-based order entry, payment processing, and customer relationship
management, to name a few. The network simply cannot go down.
These types of applications are a cornerstone of making operations much more
efficient, and are absolutely critical to an enterprise's viability and
productivity. Meanwhile, they also make networks more complex, and availability
becomes an issue. The crucial concept to remember here is that availability
must be defined for the network as a whole. The probability that users can get
to applications depends on the combined availability of every device and path
in between. For components in series the individual availabilities multiply,
so the end-to-end figure is lower than that of any single component.
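As a minimal sketch of how device and path availabilities combine, assuming failures are independent: components in series multiply, and a redundant group is down only when all of its members are down. The device names and figures below are hypothetical.

```python
from math import prod

def series(availabilities):
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)

def parallel(availabilities):
    """One working component is enough: 1 minus the chance that all are down."""
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical figures, for illustration only.
access_switch = 0.999
uplink = 0.9995
core_router = 0.9999

single_uplink = series([access_switch, uplink, core_router])
dual_uplink = series([access_switch, parallel([uplink, uplink]), core_router])

print(f"single uplink path: {single_uplink:.6f}")
print(f"dual uplink path:   {dual_uplink:.6f}")
```

The single-uplink path comes out lower than its weakest component, while duplicating the uplink pushes the path back toward the availability of the remaining devices.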
Fortunately, the data for analyzing the availability of multipurpose data networks
is there for the asking: the Simple Network Management Protocol (SNMP),
ping, trap, and trouble ticket reports that most enterprises already collect,
as well as the statistics that network device vendors publish. Additionally,
the software exists to put this data together in useful ways, so network managers
can see the level of availability their users are experiencing, and also
analyze and project problems so they can be fixed before they affect users.
Having five nines (99.999 percent availability, or just a few minutes of downtime
a year) has been the traditional aim for most large enterprise-wide and
service-provider networks.
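To make the "nines" concrete, the short sketch below converts an availability percentage into minutes of downtime per year. It assumes a 365-day year and counts every minute of the year as scheduled service time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored

def downtime_minutes_per_year(availability):
    """Expected downtime for a given availability, expressed as a fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label} ({a:.3%}): {downtime_minutes_per_year(a):.1f} minutes/year")
```

Five nines works out to roughly 5.3 minutes a year, four nines to under an hour, and three nines to close to nine hours.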
In a multi-application network, availability isn't a question of how many 'nines'
you have. It's about having a highly available network where you need it: for
your mission-critical applications. Clearly, the network manager's perspective
should encompass the long term, so as to spot and fix potential problems before
they affect users.
Calculating Theoretical Network Availability
Most high-availability consultants begin an analysis of a network's
availability by calculating the Theoretical Availability of three categories
of components (hardware, software, and power supplies) in each of the
three network segments: access, distribution, and core. The first step is to
gather figures from the product data sheets of the various network components.
This approach is well regarded because it defines what is theoretically possible
for a company to attain, and helps set expectations.
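One common way to turn data-sheet figures into an availability number is the steady-state formula A = MTBF / (MTBF + MTTR). The article does not say which formula its consultants use, so treat the sketch below as an assumption; the MTBF and MTTR values are hypothetical, and the three categories are treated as independent and in series.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: uptime as a share of uptime plus repair time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical (MTBF, MTTR) pairs in hours; real values come from vendor data sheets
# and from your own repair records.
access_segment = {
    "hardware": (200_000, 4),
    "software": (50_000, 1),
    "power supply": (100_000, 2),
}

segment_availability = 1.0
for name, (mtbf, mttr) in access_segment.items():
    a = availability(mtbf, mttr)
    segment_availability *= a  # the segment needs all three categories working
    print(f"{name:12s} {a:.6f}")
print(f"{'segment':12s} {segment_availability:.6f}")
```

Repeating the calculation for the distribution and core segments gives one theoretical figure per segment to compare against business needs.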
In our experience, the math always comes down to the lowest common denominator.
It's common sense that a chain is only as strong as its weakest link. The "six
nines" of reliability in the core and distribution portions are due to
the redundancy that permeates both areas in this network. For many organizations,
a lower rating is acceptable in the access layer because it affects fewer users.
When a company collects enough of its own data, it can replace the theoreticals
with its own measured figures.
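Once outage records are available, measured availability can be computed directly from them. The sketch below assumes a hypothetical list of (start, end) outage times taken from trouble tickets, with no overlapping entries, and reports availability over a fixed observation period.

```python
from datetime import datetime

def measured_availability(outages, period_start, period_end):
    """Share of the period with no recorded outage (outages must not overlap)."""
    period_seconds = (period_end - period_start).total_seconds()
    down_seconds = 0.0
    for start, end in outages:
        # Clip each outage to the observation period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 1 - down_seconds / period_seconds

# Hypothetical trouble-ticket records for one quarter.
period_start = datetime(2024, 1, 1)
period_end = datetime(2024, 4, 1)
outages = [
    (datetime(2024, 1, 14, 2, 0), datetime(2024, 1, 14, 2, 45)),
    (datetime(2024, 3, 3, 13, 10), datetime(2024, 3, 3, 13, 25)),
]

result = measured_availability(outages, period_start, period_end)
print(f"measured availability: {result:.5%}")
```

One hour of recorded downtime in a 91-day quarter leaves this example a little above three nines.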
This Theoretical Availability assessment is important, as it can point the way
to areas that need work. In addition, planners can use it to determine if the
network, as functioning or with planned upgrades, can meet the needs of the
business. And the path availability estimates can help predict the cost of downtime
and the return on investment likely from improvements to the network. Importantly,
any network availability study needs to examine network behavior over the long term.
Managers typically need to study the trend for at least three months, and
preferably longer.
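A rough sketch of how an availability estimate translates into downtime cost and payback time follows. Every figure here (the cost per hour of downtime, the upgrade cost, and the before and after availabilities) is hypothetical, and the model assumes downtime cost is linear in hours lost.

```python
HOURS_PER_YEAR = 8760

def annual_downtime_cost(availability, cost_per_hour):
    """Expected yearly cost of downtime at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR * cost_per_hour

# Hypothetical inputs.
current_availability = 0.999
improved_availability = 0.9999
cost_per_hour = 10_000
upgrade_cost = 50_000

savings = (annual_downtime_cost(current_availability, cost_per_hour)
           - annual_downtime_cost(improved_availability, cost_per_hour))
payback_years = upgrade_cost / savings

print(f"annual savings: {savings:,.0f}")
print(f"payback period: {payback_years:.2f} years")
```

With these inputs, gaining one nine saves about 78,840 a year, so the upgrade pays for itself in well under a year.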
Getting to the Availability You Need
The real challenge comes after an enterprise has baselined availability and
identified problem areas. Network managers now need to measure how the
network performs in reality. To this end, managers need to assess four
areas to pinpoint and plan remedies: network design, infrastructure, operations
and maintenance, and support.
Network Design
The most important aspect of network design is a rigid hierarchy of core, distribution,
and access. Corporations usually want the highest availability and resilience
in the core: the five nines. In the distribution layer, managers may be
able to go down a nine, and perhaps only need three nines in the access layer.
This approach also tracks the economics, as you have the fewest devices in the
core and the most at the edge, perhaps even orders of magnitude more.
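The tiered targets above can be turned into a per-layer downtime budget and an end-to-end figure. This sketch assumes a user's traffic crosses one layer of each type and that the layers fail independently.

```python
from math import prod

HOURS_PER_YEAR = 8760

# Targets taken from the text: five nines in the core, one nine fewer in
# distribution, three nines at the access layer.
targets = {"core": 0.99999, "distribution": 0.9999, "access": 0.999}

for layer, a in targets.items():
    print(f"{layer:12s} {(1 - a) * HOURS_PER_YEAR:6.2f} h/year of allowed downtime")

end_to_end = prod(targets.values())
print(f"end to end   {end_to_end:.5f}")
```

The end-to-end result sits just below the access-layer target, which is why the access layer dominates what an individual user experiences.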
Device-level resilience can be provided by redundant processors and uninterruptible
power, as well as by technologies built into the products that make up the network.
The availability delivered by each network component ultimately determines
the total network availability of an organization. The organization needs to
invest in products that can deliver optimal application-level resilience.
Then there are overall design principles such as avoiding single points of failure
wherever possible, for example, having only one logical or physical circuit
out of a router. When that link goes down, the router is out of business. Enterprises
need to invest in creating redundant paths to every device. If you have a single
point of failure in your core, it's only a matter of time before it comes back
to haunt you.
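A simple way to look for single points of failure is to remove each device in turn from a model of the topology and check whether the rest of the network stays connected. The sketch below uses a brute-force check on a small, hypothetical topology; it looks only at device failures, not at individual links.

```python
from collections import deque

def is_connected(adjacency, removed=None):
    """Breadth-first search over the topology, skipping the removed device."""
    nodes = [n for n in adjacency if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)

def single_points_of_failure(adjacency):
    """Devices whose loss cuts at least one other device off from the network."""
    return sorted(n for n in adjacency if not is_connected(adjacency, removed=n))

# Hypothetical topology: dual core, dual distribution, single-homed access switches.
topology = {
    "core1": {"core2", "dist1", "dist2"},
    "core2": {"core1", "dist1", "dist2"},
    "dist1": {"core1", "core2", "access1"},
    "dist2": {"core1", "core2", "access2"},
    "access1": {"dist1"},
    "access2": {"dist2"},
}

print(single_points_of_failure(topology))
```

Here the two distribution switches are flagged, because each access switch has only one uplink; adding a second uplink from each access switch to the other distribution switch clears the list.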
Infrastructure
A network design is only as good as the systems it comprises. In the physical
infrastructure, both hardware and software play key roles in achieving availability.
Core systems should have full redundancy built in. They should also have
sufficient throughput and be scalable enough to accommodate spikes in
demand, particularly as next-generation IP-based services such as videophone
conferencing and public wireless access are added to the network. And, of course,
every vital system needs an uninterruptible power supply (UPS).
One way to ensure high reliability in the network is the so-called 'cookie-cutter
approach.' That is, every branch office, data center, and network node that
performs the same function should have identical hardware and software. This
way, you can make fixes faster and more accurately, cut down on your spares
inventory, and compare apples to apples when monitoring your network.
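The cookie-cutter idea can be checked against an inventory. The sketch below groups devices by role, treats the most common hardware and software combination in each role as the standard build, and lists the devices that deviate. The inventory format, device names, and model and version strings are all hypothetical.

```python
from collections import Counter, defaultdict

def find_deviations(inventory):
    """Return (device name, standard build) for devices that differ from their role's norm."""
    by_role = defaultdict(list)
    for device in inventory:
        by_role[device["role"]].append(device)

    deviations = []
    for devices in by_role.values():
        builds = Counter((d["model"], d["software"]) for d in devices)
        standard, _count = builds.most_common(1)[0]
        for d in devices:
            if (d["model"], d["software"]) != standard:
                deviations.append((d["name"], standard))
    return deviations

# Hypothetical inventory records.
inventory = [
    {"name": "branch-rtr-1", "role": "branch-router", "model": "R-100", "software": "1.2"},
    {"name": "branch-rtr-2", "role": "branch-router", "model": "R-100", "software": "1.2"},
    {"name": "branch-rtr-3", "role": "branch-router", "model": "R-100", "software": "1.0"},
]

for name, standard in find_deviations(inventory):
    print(f"{name} differs from the standard build {standard}")
```

Using the majority build as the standard is a convenience for the sketch; in practice the standard would more likely be a documented baseline per role.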
Operations and Maintenance
Once you know how your network is arranged and baselined, you can tweak operations
for maximum efficiency. In many cases, you might not need new hardware, just
operational changes. Efficient operations depend on rigorous adherence to best
practices as well as network maintenance, which, again, requires monitoring
to catch problems before they affect users. Appropriate performance analysis
and metrics tools are important. It is also wise to map your maintenance contracts
to your business and availability goals.
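One way to map a maintenance contract to an availability goal is to work out how long a repair may take before the goal is missed. The sketch below rearranges the common steady-state formula A = MTBF / (MTBF + MTTR), which is an assumption rather than something the article specifies. The MTBF, target, and contract terms are hypothetical, and a contract's response time is only part of the real repair time.

```python
def max_mttr_hours(target_availability, mtbf_hours):
    """Longest mean repair time that still meets the target, from A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours * (1 - target_availability) / target_availability

# Hypothetical inputs.
mtbf_hours = 50_000
target = 0.9999
budget = max_mttr_hours(target, mtbf_hours)
print(f"repair-time budget: {budget:.2f} hours")

contracts = {"4-hour replacement": 4, "next business day": 24}
for name, repair_hours in contracts.items():
    verdict = "meets" if repair_hours <= budget else "misses"
    print(f"{name}: {verdict} the four-nines goal")
```

With these inputs the budget is about five hours, so a next-business-day contract would not support a four-nines goal for this device.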
Support
The final component in improving availability is to evaluate your support structures
to make sure they match your needs. Know the capabilities of your own personnel
and of your vendors, and how they fit the needs of your network. Do your people
have the skill sets they need? Does your vendor have the tools and the required
support service? Can they provide the training that your staff needs?
Many enterprise managers and executives are beginning to understand that network
availability management is fundamental to successful network operations, just
as the accounting department is integral to controlling costs.
The relationship between availability baselines and corporate bottom lines is
not surprising. Networks are rapidly increasing in complexity and criticality
to an organization's success. And as voice and video services are folded into
existing data networks, this relationship will become even more profound. All
this points to the need to know how your network is performing and supporting
organizational goals. Armed with metrics such as network availability, managers
can make better decisions and allocate resources more precisely.
The author is Vice President, Cisco Systems India & SAARC