Troubleshooting a network
A paralyzed network can cost an organization oodles
of time and money. Here are some handy tips to help
you trace and rectify any network-related problems.
By Mahesh Rathod
One of the most important precautions to minimize
network collapse is to insist on reliable
network designs based on the official configuration
guidelines. However, since a network is a conglomeration
of numerous components and devices, something is bound
to go wrong somewhere, somehow, even in the best of
networks.
There are numerous reasons why your network can go
down, but there are a few basic approaches to
troubleshooting that can help you locate the problem
faster.
One of the best ways to avoid unnecessary network
downtime and improve reliability is to make sure that
your network cabling and signaling systems meet the
applicable standards and are built correctly
using quality components. A number of surveys have
revealed that about 80 percent of network failures
are related to the network medium: the cables, connectors,
and hardware components that make up the signal-carrying
portion of a network.
Many problems with media systems are due to improperly
installed hardware, the use of incorrect components,
network designs that violate official guidelines,
or a combination of all three.
Following a Troubleshooting Model
Troubleshooting a network problem combines the scientific
method with the technique of 'divide and conquer'.
Under the scientific method, you formulate a hypothesis
and then test it. Using your knowledge of the symptoms
and your knowledge of how networks operate, you form
one or more hypotheses to explain the behavior of
the network, and then you perform tests to see if
those hypotheses hold up.
The following steps will guide you in troubleshooting
your network.
Discover the problem. This is the fault detection
stage, in which you are notified of a problem. Notification
may be done by automatic fault detection software,
or users may call you with a problem report.
Gather facts. This is the process of acquiring
information about the problem. It is rather like
the game of "Twenty Questions", in which you ask
leading questions to try to discover information.
Create Hypotheses. Based on the facts you have
gathered and your knowledge of how networks function,
you should be able to create one or more hypotheses
about the source of the problem. While doing so, make
sure you do not overlook the obvious. Test the obvious
hypothesis first, before spending time on more complex
theories. Try to avoid jumping to conclusions, and
do not make unnecessary assumptions about the cause
of the problem. Make sure that the hypothesis you
create can adequately account for the symptoms and
other information you have collected.
Develop an action plan. At this stage you may
have enough information to begin testing
a given device on the network. A
test might be as simple as replacing the device with
a spare, and then checking to see if the problem was
resolved. Or, at this stage you may need to further
isolate the problem, in which case your action plan
may include some variety of 'divide and conquer',
or binary search, which is described later.
Implement your action plan. When troubleshooting
a problem, try to make only one change at a time.
The goal is to eliminate suspected problems one at
a time, to limit the number of things you are trying
to test and evaluate at any given moment. This way,
you can avoid losing track of the problem by trying
to evaluate too many things at once.
Test and observe results. After making a change
in the system, you need to test and observe the results,
to make sure that you have resolved the problem. If
your action plan was based on a binary search, then
the test should show you whether the problem is still
active, or whether it is now located in the portion
of the network that you have isolated as part of the
binary search.
Repeat the troubleshooting process. If the
symptoms still persist, you need to repeat the process
until you have resolved the problem.
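
The 'divide and conquer' (binary search) idea mentioned
in the action-plan step can be illustrated with a short
Python sketch. It is only an illustration under stated
assumptions: the path is an ordered list of devices, there
is a single fault, and everything before the fault responds
while everything from the fault onward does not. The
function name find_first_failing, the reachable callback,
and the device names are hypothetical and not part of any
tool described in this article.

```python
from typing import Callable, Optional, Sequence


def find_first_failing(
    segments: Sequence[str],
    reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first device on the path that does not respond.

    Assumes a single break: every device before the fault is
    reachable and every device from the fault onward is not.
    Returns None when every device responds.
    """
    lo, hi = 0, len(segments)
    while lo < hi:
        mid = (lo + hi) // 2
        if reachable(segments[mid]):
            # The fault, if any, lies beyond the midpoint.
            lo = mid + 1
        else:
            # The midpoint is already affected; look earlier.
            hi = mid
    return segments[lo] if lo < len(segments) else None


if __name__ == "__main__":
    # Hypothetical path from a workstation to a server.
    path = ["switch-1", "router-a", "router-b", "server"]
    responding = {"switch-1", "router-a"}
    print(find_first_failing(path, lambda name: name in responding))
```

In practice the reachable check would be a real test, such
as a ping or a cable test, made one at a time so that each
result halves the portion of the network still under
suspicion.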
These tips will help you troubleshoot your network
the next time it develops a problem. Check this column
for more interesting topics in the coming issues.
Troubleshooting a network using Qcheck
Qcheck is a free utility that provides network performance
measurements and helps in troubleshooting. It can
be downloaded at www.qcheck.net and consists of two
software components: a console with a graphical user
interface, and distributed software agents called
Performance Endpoints (or simply, "endpoints").
The Qcheck console is where you configure and run
tests and then view test results. You can select among
several kinds of tests, together with the addresses
of the endpoints and the protocol to use between them
(TCP, UDP, SPX, or IPX). The test results are summarized
in a window, while a more detailed report of test
results can be viewed in a Web browser.
What performance indicators does Qcheck measure?
Qcheck gives a quick check of the response time, throughput,
streaming capability, and routes between a pair of
computers in a network. Which of these measurements
says the most about overall network performance depends
on the application and the user.
Key measurements for most business and Web transactions
are response time and throughput.
When an answer or reply is needed quickly or frequently,
users worry about response time. When large files
must be transferred quickly, throughput is the primary
concern. Users of streaming applications need to know
that a network can support
a fixed level of throughput. They also have an additional
concern: how many data packets are lost as the data
travels from the sender to the receiver. The lost-data
figures that Qcheck reports can help determine
a network's readiness for streaming multimedia traffic.
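
The arithmetic behind two of these indicators is simple
enough to sketch in Python. This is not how Qcheck itself
computes its figures, which the article does not describe;
it only shows the usual definitions. The function names
and the sample numbers are assumptions made for the
example.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Throughput in megabits per second (1 Mbps = 1,000,000 bits/s)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / seconds / 1_000_000


def loss_percent(packets_sent: int, packets_received: int) -> float:
    """Share of packets that never reached the receiver, in percent."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return (packets_sent - packets_received) * 100 / packets_sent


if __name__ == "__main__":
    # Hypothetical test: 1,250,000 bytes moved in one second.
    print(throughput_mbps(1_250_000, 1.0))
    # Hypothetical stream: 1000 packets sent, 990 received.
    print(loss_percent(1000, 990))
```

A streaming application is mainly concerned with whether
the first figure stays at the rate it requires while the
second stays low.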
Finally, traceroute information can show performance
bottlenecks and slowdowns at intermediate nodes. When
latency is important, either as a component of response
time or for real-time streaming applications such
as VoIP, traceroute can show the paths in each direction
and the round-trip time to each hop on the paths.
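
One way to read per-hop round-trip times is to look for
the hop where the time rises most sharply compared with
the hop before it. The Python sketch below assumes you
have already copied the round-trip times, in milliseconds,
out of a traceroute report into a list. The function name
and the sample values are hypothetical. Real measurements
are noisy, so a large jump is a hint about where to look,
not proof of a bottleneck.

```python
from typing import Sequence, Tuple


def largest_latency_jump(hop_rtts_ms: Sequence[float]) -> Tuple[int, float]:
    """Return (hop_number, increase_ms) for the hop that adds the
    most round-trip time over the previous hop. Hops are numbered
    from 1, and the first hop is compared against zero.
    """
    if not hop_rtts_ms:
        raise ValueError("at least one hop is required")
    worst_hop = 1
    worst_increase = hop_rtts_ms[0]
    for index in range(1, len(hop_rtts_ms)):
        increase = hop_rtts_ms[index] - hop_rtts_ms[index - 1]
        if increase > worst_increase:
            worst_hop = index + 1
            worst_increase = increase
    return worst_hop, worst_increase


if __name__ == "__main__":
    # Hypothetical round-trip times to hops 1 through 4.
    print(largest_latency_jump([1.0, 2.0, 30.0, 31.0]))
```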