Plan your downtime
An enterprise can use a number of existing technologies
and processes to control its planned downtime.
By Radha Shelat
Companies have implemented clustering, volume management, and replication technologies
as a line of defence against unplanned downtime. Such downtime may be caused by server
failures, site outages, and other events, and can affect customer service levels.
However, technology can also be leveraged to reduce the costs and outages associated
with planned downtime events and help provide a significant return on investment (RoI).
Escalating planned downtime
Planned downtime refers to any planned administrative operation that can potentially
interrupt or slow down service, such as a server upgrade, data movement,
server consolidation, or site maintenance.
Planned downtime occurs more often than unplanned downtime, partly because hardware systems
have grown more robust and resilient over time, and the mean time between
failures keeps improving.
The frequency of scheduled downtime processes in most enterprises is rising.
A data centre in a large enterprise may run multiple classes of applications
on hundreds or even thousands of systems. Administrators may schedule modifications
almost daily to reconfigure systems, perform upgrades, or apply
patches. Depending on the processes in place, any modification may
result in downtime.
Performing routine operations, such as upgrades and patches, without disrupting
service is a major concern for companies today. The immediacy of online services
has produced a culture of end users who do not tolerate delay.
In online businesses, for example, an outage or slowdown that lasts five minutes
is almost guaranteed to drive impatient customers to competitors' sites for
goods and services. If customers experience the delay again, they may never
come back. From the revenue and site branding standpoints, these are serious
Leveraging Disaster Recovery solutions
Data protection and Disaster Recovery solutions can be leveraged for planned
downtime operations, dramatically shrinking both administrative costs and outage
windows. These tools automate procedures to make administrators more efficient,
reduce the possibility of human error, and accelerate processes that may have
an impact on end users.
The term 'planned downtime' is in fact an oxymoron. The downtime associated
with most planned operations that could potentially affect uptime can be eliminated.
For instance, automated real-time replication of volumes can move data transparently
in a vendor-independent manner to other servers at any location worldwide. Automated
stateful failover can move a live application to another server without interrupting
services. A clustering solution can migrate an entire data centre to another
Let's look at some typical examples.
Scenario one: Migrating an application
You want to move all your users to another server. There is no server available
within the data centre, so you have to move the application, with all its user
groups and data, to a server in a data centre that is two time zones away.
Volume management and replication software can mirror the data to a storage
centre at a distant location. Clustering software can move the application to
a new server in a failover operation that preserves the state of the application
and its user data. These planned, automated operations can carry out the migration
with virtually no impact on service levels and allow users to continue accessing
their data and application without incurring the downtime normally associated
with application migration.
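The ordering described above (mirror the data first, confirm the copy is in sync, and only then move the service) can be sketched in a few lines. This is a toy model rather than any vendor's replication or clustering interface: a volume is represented as a Python dict of block numbers to bytes, and the function names `replicate`, `checksum`, and `migrate` are assumptions made for illustration.

```python
import hashlib


def checksum(volume):
    """Digest of a volume modelled as a dict of block number -> bytes."""
    digest = hashlib.sha256()
    for block in sorted(volume):
        digest.update(block.to_bytes(8, "big"))
        digest.update(volume[block])
    return digest.hexdigest()


def replicate(source, mirror):
    """Bring the mirror in line with the source; return blocks sent."""
    sent = 0
    for block, data in source.items():
        if mirror.get(block) != data:
            mirror[block] = data      # send only blocks that differ
            sent += 1
    for block in list(mirror):
        if block not in source:
            del mirror[block]         # drop stale blocks on the mirror
    return sent


def migrate(source, mirror):
    """Replicate, verify, and only then hand the service to the remote site."""
    replicate(source, mirror)
    if checksum(source) != checksum(mirror):
        raise RuntimeError("mirror not in sync; staying on the original server")
    return "remote"                   # hypothetical label for the new site


local_volume = {0: b"mailbox", 1: b"index"}
remote_volume = {1: b"stale", 2: b"old"}
serving_site = migrate(local_volume, remote_volume)
print(serving_site, remote_volume == local_volume)
```

The point of the sketch is the guard in `migrate`: the service moves only after the remote copy has been verified, which is what keeps the cutover from costing users their data.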
Scenario two: Upgrading Microsoft Exchange
You are running Exchange in a non-clustered, non-automated environment, and
you need to upgrade it with a new service pack. You are forced to shut down
the server and cut off user access to what is arguably their most critical application.
You then have to load the new service pack and the new application, point the
application to the data, and redirect your clients to the new server. If, on
the other hand, you are running Exchange in a clustered environment, performing
upgrades has a minimal impact on users.
With automated clustering and volume management tools, copying the data to a
disk and migrating the Exchange application service to another server within
the cluster is a push-button operation that can save hours of downtime.
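One way to picture why the clustered case is gentler on users is a rolling upgrade: patch the standby nodes, fail the service over to one of them, then patch the node that was originally active. The sketch below is a simulation only. The `Node` and `Cluster` classes, the node names, and the version labels are hypothetical and do not correspond to Exchange or to any clustering product's commands.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster member; not tied to any vendor's API."""
    name: str
    version: str
    online: bool = True


class Cluster:
    def __init__(self, nodes, active):
        self.nodes = {n.name: n for n in nodes}
        self.active = active          # name of the node serving users
        self.log = []                 # ordered record of what happened

    def fail_over(self, target):
        """Move the service to `target`, which must be online."""
        if not self.nodes[target].online:
            raise RuntimeError(f"{target} is offline")
        self.log.append(f"failover {self.active} -> {target}")
        self.active = target

    def upgrade(self, name, new_version):
        """Patch a node; refuse if it is the one serving users."""
        if name == self.active:
            raise RuntimeError("cannot patch the active node")
        node = self.nodes[name]
        node.online = False           # node is down, but users are elsewhere
        node.version = new_version
        node.online = True
        self.log.append(f"upgraded {name} to {new_version}")


def rolling_upgrade(cluster, new_version):
    """Upgrade every node while one node always stays in service."""
    original = cluster.active
    standbys = [name for name in cluster.nodes if name != original]
    if not standbys:
        raise RuntimeError("a rolling upgrade needs at least two nodes")
    for name in standbys:
        cluster.upgrade(name, new_version)
    cluster.fail_over(standbys[0])    # users move to an upgraded node
    cluster.upgrade(original, new_version)
    return cluster


cluster = Cluster([Node("exch-a", "SP1"), Node("exch-b", "SP1")], active="exch-a")
rolling_upgrade(cluster, "SP2")
print(cluster.log)
```

The `upgrade` method refuses to patch the active node, so the only interruption users could see is the failover itself.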
Scenario three: Server consolidation
You have a number of stand-alone systems in your environment that are not generating
the RoI that they could. A decision is made to bring in a new class of server
and storage hardware that can be configured with multiple domains.
Using clustering technology that you may have acquired as part of your solution,
you simply add each stand-alone system to the new cluster and execute a migration
command to move the application over. Your consolidation ratio is 12:1 (in this
hypothetical case), and you wind up with a much smaller footprint for those
applications, which translates to higher reliability and availability. You can perform
the same amount of work with fewer servers and reduce your hardware and maintenance
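To make the 12:1 ratio concrete, the arithmetic can be worked through as below. The starting counts of 48 and 50 stand-alone servers are invented for illustration and do not come from the scenario; only the ratio does.

```python
import math


def servers_after_consolidation(standalone_servers, ratio):
    """Servers needed once `ratio` stand-alone systems share one machine."""
    return math.ceil(standalone_servers / ratio)


def footprint_reduction(standalone_servers, ratio):
    """Fraction of the original server count that is retired."""
    after = servers_after_consolidation(standalone_servers, ratio)
    return 1 - after / standalone_servers


# Hypothetical estate of 48 stand-alone servers at the article's 12:1 ratio.
print(servers_after_consolidation(48, 12))
print(round(footprint_reduction(48, 12) * 100, 1))
# A count that does not divide evenly still needs a whole extra server.
print(servers_after_consolidation(50, 12))
```

Rounding up matters here: a leftover handful of systems still occupies a full server, so the achieved ratio can fall slightly short of the nominal one.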
Scenario four: Performance-based migration
You have applications running in an environment where performance levels are
not satisfactory. The time required to process a transaction and to respond
to a mouse click on a Web page has reached a point where your monitoring software
is sending alerts.
Your disaster recovery solution includes a cluster server and a cluster file
system that help you migrate the applications, permanently or temporarily,
to systems that are better able to handle the service. The process is automated,
and no restart is needed on the new node. You eliminate downtime
while maintaining application performance.
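The decision behind such a migration can be sketched as a simple rule: if recent response times stay above a threshold, pick the least-loaded candidate system and move there. Everything in this sketch is an assumption made for illustration, including the 800 ms threshold, the five-sample window, the node names, and the load figures. Real monitoring and clustering tools will expose their own policies.

```python
from statistics import mean


def should_migrate(response_times_ms, threshold_ms, min_samples=5):
    """True when the average of the latest samples exceeds the threshold."""
    if len(response_times_ms) < min_samples:
        return False                  # too little data to act on
    recent = response_times_ms[-min_samples:]
    return mean(recent) > threshold_ms


def pick_target(candidates):
    """Choose the candidate with the lowest current load (0.0 to 1.0)."""
    return min(candidates, key=candidates.get)


# Hypothetical response times in milliseconds, oldest first.
samples = [180, 220, 900, 950, 1100, 1200, 1300]
target = None
if should_migrate(samples, threshold_ms=800):
    target = pick_target({"node-b": 0.35, "node-c": 0.10})
print(target)
```

Averaging over a window rather than reacting to a single slow sample keeps one brief spike from triggering a migration.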
The bottom line
The bottom line is speed, efficiency, and cost reduction. The clustering, replication,
and volume management tools that make up an effective high availability or
disaster recovery solution can contribute handsomely to your RoI when they act
as facilitators of planned downtime projects. One obvious benefit is that they
are already in place at most companies, making additional expenditure unnecessary.
Another powerful benefit is that these tools automate procedures, reduce administrative
costs, increase efficiency, and reduce the possibility of human error. Many highly
automated routine procedures can be initiated remotely by administrators or
run at predetermined intervals.
The most significant benefit, however, is the ability of these tools to reduce
downtime to a matter of seconds. Using stateful migration, users can be migrated
to another set of systems without the need for them to reconnect, thereby significantly
reducing the downtime associated with planned maintenance. The connection is
persistent, and the state of the application is maintained, even for users who
were conducting transactions.
Unforeseen events and site outages will happen. However,
clustering, replication and volume management technologies can be leveraged
to ensure the availability of data and applications, minimise the impact of
failures on a business and ultimately align IT with business operations while
significantly increasing an enterprise's RoI.
Radha Shelat is the Chief Technology Officer of Veritas Software India.