Do you build servers to suit your applications?
by Graeme K. Le Roux
Years ago, someone from Novell remarked that using the
then-new OS/2 was like driving a semi-trailer to the
local 7-Eleven for a Slurpee. His point was that OS/2 was
overkill for the bulk of server situations at the
time, i.e. file storage and print serving.
He was right, but his comment overlooked the fact that
OS/2 was not intended simply as a platform for storing
files or feeding printers; it was meant to run applications
like database engines and mainframe gateways.
Today, with the advent of low-cost NAS, using a server
for file storage is a rather expensive option. Similarly,
unless you have many printers which require a lot of host
processing, a purpose-built print server is arguably more
cost-effective than a generic server pressed into print duty.
Servers today run applications such as e-mail or more general
messaging services, database engines and Web services, and act
as hosts for thin-client applications. Each of these applications
places different requirements on a host platform (which
is what a server is), and thus the configuration of a
server can be critical to its performance. Configuration
can also affect reliability.
Broadly speaking, applications can be divided into two
basic categories: I/O-bound and compute-bound. As their
names suggest, the former is limited by the I/O capabilities
of the platform while the latter is limited by processing
capability. Under most circumstances, it is an extremely
bad idea to mix the two types of application because,
in general, you can cost-effectively build a system
to handle a very high rate of I/O, or you can build
it to have a very high processing throughput, but not both.
Consider the difference between a basic Web server and
a supercomputer. The Web server is designed to take
a large number of small requests and respond with larger
parcels of data in the form of Web pages. In this case,
little processing is involved but disk and network I/O
rates are critical.
A supercomputer, on the other hand, works by loading
an entire job into memory, throwing massive processing
power at it, and then downloading the results from memory
to clients, usually via a dedicated front-end processor.
The supercomputer may chug away for hours on a single
job; that's what it's for.
The difference between a host built for I/O-bound applications
and one built for compute-bound ones often boils down
to a few key hardware issues which are frequently neglected.
All computers are built around one or more system buses
with a CPU at one end and a terminator at the other.
Devices such as disk controllers, network interfaces,
etc., are attached to a system bus somewhere between
the CPU and the terminator.
System buses have a finite bandwidth; overload the bus
and the connected devices will spend a significant proportion
of their time waiting for access to it. For example,
a high-end RAID controller or a Fibre Channel card on
the same system bus as a 100 Mbps network adapter (or,
worse, a gigabit NIC) can overload the PCI bus and thus
prevent either device from running at full speed. This
would therefore be a poor configuration for a very
heavily-loaded Web server.
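To see how little headroom a shared bus offers, consider a back-of-the-envelope check. The sketch below uses illustrative nominal peak rates for classic 32-bit/33 MHz PCI and two such devices; the numbers are assumptions, not measurements.

```python
# Rough bus-budget check: do the nominal peak demands of the
# attached devices exceed what a shared PCI bus can deliver?
# All figures are illustrative peak rates in MB/s.

PCI_32_33 = 133  # classic 32-bit/33 MHz PCI: ~133 MB/s shared by all devices

devices = {
    "gigabit NIC": 1000 / 8,               # 1000 Mbps is ~125 MB/s
    "Ultra160 SCSI RAID controller": 160,  # nominal peak transfer rate
}

demand = sum(devices.values())
print(f"aggregate peak demand: {demand:.0f} MB/s vs bus capacity: {PCI_32_33} MB/s")
if demand > PCI_32_33:
    print("bus oversubscribed: devices queue for access; neither runs at full speed")
```

Even allowing for the fact that real workloads rarely hit peak rates on both devices at once, the margin is thin enough that sustained bursts will collide.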
You could use a platform with a separate system bus
for the NIC and the storage controller, but whether
this would be of any benefit would depend on the way
in which the application uses the processor/memory subsystem
which connects the two buses. In most cases, having
more than one bus helps, but at some point you would
need to consider a combination of SAN and load balancing.
Another thing which can drag down the performance of
a system bus is the nature of the devices attached to
it. So-called "bus-mastering" devices are
generally far more efficient in their use of a bus and
usually faster (i.e. they have a higher throughput) than
non-bus-mastering devices. They also place a lower load
on the system CPU, which allows it to work more efficiently.
As a general rule, choose bus-mastering devices in preference
to non-bus-mastering devices in servers.
Another trap for the unwary lies in mixing different types
of devices on peripheral buses: USB, FireWire and, most
notably, SCSI. The basic rule is that a peripheral bus
will run at the speed of the slowest device. Hang a
legacy SCSI tape drive on a Wide SCSI bus and the bus
runs at the speed of the legacy device. Thus, a relatively
slow device like a tape unit on the same bus as fast
devices like hard disks will cause the throughput of the
bus to be governed by the speed of the tape unit.
As a general rule, try to group devices by type (hard
disks; tapes and removable storage; scanners) and
put each group on a separate bus (a channel, in SCSI terms),
if not a separate controller. It is also a good idea
to avoid placing a mix of internal and external devices
on the same SCSI bus. Such a mix works perfectly well, but
you generally get better performance if you avoid it.
Multi-channel SCSI adapters are relatively inexpensive
and, under modern OSs, installing multiple SCSI adapters
in a single host is quite simple.
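As a toy illustration of the grouping rule, the helper below assigns each device class its own channel; the device names and classes are invented for the example.

```python
# Toy illustration of the grouping rule: one SCSI channel per
# device class rather than a mix of classes on one channel.
# Device names and classes are invented for this example.
from collections import defaultdict

def assign_channels(devices):
    """Map each device class to its own channel number."""
    groups = defaultdict(list)
    for name, kind in devices:
        groups[kind].append(name)
    return {f"channel {i}": members
            for i, members in enumerate(groups.values())}

inventory = [("disk0", "hard disk"), ("disk1", "hard disk"),
             ("dds4", "tape/removable"), ("cdrw", "tape/removable"),
             ("flatbed", "scanner")]

for channel, members in assign_channels(inventory).items():
    print(channel, "->", members)
```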
Don't share disks
One of the most common mistakes made in configuring
servers is sharing a hard disk between executables (the
OS and application programs) and data (e.g. databases,
Web pages, etc.). Hard disks are mechanical devices
consisting of one or more platters and a head/positioner
assembly which flies over the platter(s).
No matter how many tricks you play in software and
hardware to buffer and/or queue read and write requests,
the inescapable physical fact is that a disk head cannot
be in two places at once. Furthermore, the slowest operation
a disk is capable of (aside from spinning up or down)
is moving its heads. If you put data and executables
on a single disk, you guarantee that the disk will
have to move its heads a lot more than if you provide
a separate disk for each.
Further, in a multi-threaded OS environment, you have
a situation where two or more processes attempt
to read and/or write different locations at the same
time. If one process is the OS, it will preempt the
others; and if a process is dealing with real-time I/O
(say, streaming data out of a network port), data may be
lost or unacceptably delayed.
As a general rule, build a server with at least two
physically separate hard disks, and install your executables
on one and your data on the other. In practice, this simple
and inexpensive change to server configuration can as
much as double the performance of database engines,
messaging hosts (e.g. Microsoft Exchange) and Web servers.
Empirical evidence suggests that it also makes for a
much more reliable server, particularly in Windows environments.
This is possibly due to the slower rate of fragmentation
on system disks.
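A toy model of head travel shows why the separation pays off. This is an illustrative sketch, not a benchmark; the track numbers are invented, with executables assumed to live near one region of the disk and data near another.

```python
# Toy seek model: total head travel when executable reads and data
# reads interleave on one disk, versus living on two separate disks.
# Track numbers are invented purely for illustration.

def head_travel(requests):
    """Sum of head movements (in tracks) to service requests in order."""
    pos, travel = 0, 0
    for track in requests:
        travel += abs(track - pos)
        pos = track
    return travel

exe_reads = [100, 110, 105, 120] * 50       # executables near track ~100
data_reads = [5000, 5100, 5050, 5200] * 50  # data near track ~5000

# One shared disk: requests from the two workloads interleave.
interleaved = [t for pair in zip(exe_reads, data_reads) for t in pair]

one_disk = head_travel(interleaved)
two_disks = head_travel(exe_reads) + head_travel(data_reads)
print(f"one shared disk: {one_disk:,} tracks of head travel")
print(f"two disks:       {two_disks:,} tracks of head travel")
```

On this toy workload the shared disk sweeps its head across the platter on nearly every request, while the separated layout keeps each head within its own small region.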
Ramming up performance
The next most important thing to remember about configuring
servers is that nothing improves the performance of
virtual memory like real memory. In most cases, your
system disk (the one with all the executables) is also
where the OS is going to place the data associated with
its virtual memory system. This is data too, and if the
OS swaps memory "pages" to disk too frequently,
you end up with the same problem we discussed above
with regard to ordinary application data.
Note that the problem here is not the number of pages
swapped to disk; it's the frequency with which
the system has to access the hard disk rather than actually
doing useful processing.
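The cost of touching the disk too often is easy to quantify with the textbook effective-access-time formula. The timings below are illustrative assumptions, not measurements.

```python
# Effective memory access time under paging (textbook formula):
# EAT = (1 - p) * t_ram + p * t_disk, where p is the page-fault rate.
# All timings are illustrative assumptions.

RAM_NS = 100            # ~100 ns for a memory access
DISK_NS = 10_000_000    # ~10 ms to service a page fault from disk

def effective_access_ns(fault_rate):
    return (1 - fault_rate) * RAM_NS + fault_rate * DISK_NS

for p in (0.0, 0.0001, 0.001):
    print(f"fault rate {p:.2%}: {effective_access_ns(p):>12,.0f} ns per access")
```

Even one fault per ten thousand accesses inflates the average access time by roughly an order of magnitude; the cure is to make faults rarer, not merely smaller.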
So, reduce the number of times the OS has to go to the
hard disk for memory pages by adding more physical memory
to store pages in. While OS vendors will tell you how
much memory is required to run their OSs, this is not the
amount required for the OS to successfully run an application.
To run an application, you have to add the amount of
memory required by the application to the amount required
by the OS. You also typically have to add a small amount
of memory for each concurrent user. For example, an
NT server typically requires about 256 MB of RAM and Microsoft's
IIS server needs about 128 MB, so the minimum for
a stable, reasonably heavily used NT IIS server is
about 384 MB of RAM. Obviously, the amount of memory you
put in a server varies with the application; application
servers in a large thin-client environment can easily
use a gigabyte of RAM or more.
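A memory budget along these lines can be sketched as a simple sum. The OS and application figures below mirror the NT/IIS numbers above; the per-user allowance is an invented assumption for illustration.

```python
# Rule-of-thumb RAM budget: OS + application + per-user overhead.
# OS and application figures mirror the NT/IIS example in the text;
# the per-user allowance is an illustrative assumption.

OS_MB = 256        # e.g. Windows NT Server
APP_MB = 128       # e.g. Microsoft IIS
PER_USER_MB = 0.5  # assumed overhead per concurrent user

def ram_budget_mb(concurrent_users):
    return OS_MB + APP_MB + PER_USER_MB * concurrent_users

for users in (0, 100, 500):
    print(f"{users:4d} concurrent users -> about {ram_budget_mb(users):.0f} MB minimum")
```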
Processing for power
The final issue is processing power, or throughput: the
number of instructions that can be processed per second.
There are two ways to increase processing throughput:
install a faster CPU or add multiple CPUs. Which option
works better depends upon the nature of the code being
processed. This is not a new problem; the pioneers of
supercomputing had to grapple with it, and it was at the
core of the RISC/CISC debate.
Let's say you have 21 pairs of numbers to add up. Now
suppose you have a CPU which can do the job in 21 seconds
and another which can do it in 42 seconds. You
can either use one fast CPU and get the job done in 21 seconds,
or you can use two of the slower CPUs and get the job
done in the same time. You'd simply pick the cheaper
option. But what if every third pair of numbers were
dependent upon the results of the preceding two calculations?
The fast CPU still does the job in 21 seconds, and so will
the slower pair, provided we present the sets of numbers
in the same order. But what if we don't? Say we present
the additions which are dependent upon other calculations
first. The fast CPU will have to stop and presumably
execute extra code to find and do the other additions
first. The pair of CPUs will fare better: CPU 1 will
wait until CPU 2 does the calculations required; it
will then do its additions while CPU 2 waits. CPU 1
will then do the next two additions, and so on.
Whether or not two CPUs are better than one here depends
upon the processing overhead incurred by the faster
CPU when it hits an addition which requires it to process
other additions first. It is this problem of dependence
which very often prevents GHz-class CPUs
from achieving their benchmark performance in practice.
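The trade-off can be put into numbers with a deliberately idealised model of the 21-addition example. Following the text, the slow pair is assumed to absorb dependencies by alternating work and waiting at no extra cost, while the fast CPU pays some reordering overhead per out-of-order dependent addition; that overhead figure is an invented modelling assumption.

```python
# Idealised model of the 21-addition example. The fast CPU does an
# addition in 1 s; the two slow CPUs take 2 s each but split the work,
# so in the best case both configurations finish in 21 s. Every third
# addition (7 of the 21) depends on the two before it. `h` is an
# invented assumption: the extra seconds the fast CPU spends per
# dependent addition presented out of order, finding and executing
# its operands' additions first.

ADDITIONS, DEPENDENT = 21, 7

def fast_cpu_seconds(h):
    return ADDITIONS * 1.0 + DEPENDENT * h

def slow_pair_seconds():
    return ADDITIONS * 2.0 / 2  # two CPUs split 21 two-second additions

for h in (0.0, 0.5, 1.0):
    fast, pair = fast_cpu_seconds(h), slow_pair_seconds()
    verdict = ("tie" if fast == pair
               else "slow pair wins" if pair < fast else "fast CPU wins")
    print(f"reorder overhead {h:.1f} s: fast={fast:.1f} s, pair={pair:.1f} s -> {verdict}")
```

Under these assumptions, any non-zero reordering overhead tips the balance toward the pair of slower CPUs.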
As a rule of thumb, servers generally provide better
throughput in practice with multiple CPUs than with
a single faster unit. You can also aggregate CPUs to
achieve superior performance: for example, if your server
offers a maximum CPU speed of 1.2 GHz, you can install
two 900 MHz units and achieve, in theory, a processing
throughput equivalent to 1.8 GHz.
- Don't use a server for simple file storage.
- Don't mix compute-bound and I/O-bound applications on a single host.
- Choose bus-mastering adapters where possible and avoid overloading system buses.
- Use multi-channel SCSI controllers and/or multiple controllers to avoid mixing device types (hard disks; tapes and removable storage; scanners). Where possible, do not place external and internal devices on the same bus.
- Provide separate hard disks for data and executables.
- Ensure that you have sufficient physical memory; if in doubt, err on the side of overkill.
- Multiple slower CPUs are often a more robust solution than single fast ones. Ideally, choose multiple CPUs, each of which runs at about 75 percent of the maximum processor speed the system supports.
Graeme K. Le Roux is the director of Morsedawn (Australia),
a company which specialises in network design and consultancy;
he also writes for Network Computing-Asian Edition.