Disaster Recovery
Damage control, the multi-site way
Beyond basic DR, India Inc uses multi-site DR in myriad ways to ensure business continuity. Dominic K explores in detail what makes this concept work
Continuous Data Protection, Information Lifecycle Management, and other assorted technologies
and methodologies which are meant for data security will not suffice unless
backed by a redundant IP-based multi-tier multi-site disaster recovery architecture.
That is why multi-tier DR architectures similar to the one at Indian Oil Corporation
Limited (IOCL) are popular among enterprises.
In the case of IOCL, the first tier is the primary site, the second is a secondary
site called the business continuity centre, and the third tier, usually located
in a different city or state, is the disaster recovery centre.
Multi-Site, Multi-Tier Soars High
During the last year (2005-06) as well as the current year,
most Indian organisations across verticals have made it a point to deploy DR
while consolidating their data centres. Many have done so with multi-site
DR, a trend most noticeable in the BFSI and manufacturing sectors.
Today DR deployment is multi-site and multi-city, with Mumbai-Bangalore, Mumbai-Chennai
and Delhi-Chennai being some popular combinations. Shyam Gopal, Country Head,
Brocade, elaborates, "It can also be within the same region, like Delhi-Noida
or Mumbai-Navi Mumbai. The trend currently among enterprises across verticals
and sectors is to opt for multi-site and multi-tier DR. No organisation can
afford to lose out on data, especially in the BFSI space."
Multi-site DR is an evolving process and it will take some time before similar
set-ups are deployed across organisations. The factors driving this are
technologies such as Fibre Channel over Internet Protocol (FCIP) and the availability
of lower-cost dedicated WAN links between cities.
Telecom service providers have connected the big cities with
high-speed fibre and extended it for last mile connectivity. This is why multi-site
deployment can address the critical need for data and link recovery from regional disasters.
The Rationale
To have an effective DR setup, the sites or locations should
be separated by great distances. There is no point in copying data between sites
if they are close to each other since both sites may be wiped out by the same
disaster. The other driving force is stringent government regulations. More
organisations today are coming under the purview of some regulation or the other.
This makes it necessary for them to retain data records for a longer duration,
and puts the onus on them to make the data records available as and when required.

As Sumit Mukhija, Business Development Manager, Cisco Systems
India & SAARC, states, "The BFSI vertical and service providers are
most affected by regulations. For these organisations, data is their
bread and butter. Therefore, the first ones to adopt multi-city DR (primary,
business continuity and secondary sites) have been banks and service providers."
Gopal of Brocade has a similar view: "Indeed, regulatory requirements are
mandating this category of solutions for many industries. All solutions in this
category must be able to move large amounts of data in a reliable and repeatable
manner. As a result, they are generally built on top of a Fibre Channel SAN
infrastructure."
This is mostly achieved by extending a SAN over hundreds
of kilometres, or across a continent. To be effective over such distances, it
is sometimes necessary to use an IP network to transport the FC SAN data. The
standard for doing this is a protocol called FCIP, which allows transparent
tunnelling of FC switch-to-switch links across IP networks.
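FCIP's job is to carry FC frames intact across a TCP connection, and TCP delivers a byte stream with no frame boundaries. The sketch below illustrates only that tunnelling idea, under a loudly stated assumption: it uses a hypothetical 4-byte length prefix per frame, not the actual FCIP encapsulation header defined by the IETF (RFC 3821 and RFC 3643). The names `encapsulate` and `Decapsulator` are illustrative.

```python
import struct

# HYPOTHETICAL framing for illustration: a 4-byte big-endian length prefix per
# FC frame. Real FCIP uses the encapsulation header from RFC 3643 / RFC 3821.
LEN_PREFIX = struct.Struct(">I")


def encapsulate(fc_frames):
    """Wrap each FC frame (bytes) with a length prefix and join them into one stream."""
    return b"".join(LEN_PREFIX.pack(len(f)) + f for f in fc_frames)


class Decapsulator:
    """Rebuild whole FC frames from TCP segments of arbitrary size."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, segment):
        """Add newly received bytes; return every frame that is now complete."""
        self._buf.extend(segment)
        frames = []
        while len(self._buf) >= LEN_PREFIX.size:
            (length,) = LEN_PREFIX.unpack_from(self._buf, 0)
            end = LEN_PREFIX.size + length
            if len(self._buf) < end:
                break  # the rest of this frame has not arrived yet
            frames.append(bytes(self._buf[LEN_PREFIX.size:end]))
            del self._buf[:end]
        return frames


# Usage: send three frames, then deliver the stream in awkward 5-byte segments.
frames = [b"FRAME-A", b"FRAME-BB", b"C" * 2112]  # 2112 bytes: the largest FC data field
stream = encapsulate(frames)
receiver = Decapsulator()
rebuilt = []
for i in range(0, len(stream), 5):
    rebuilt.extend(receiver.feed(stream[i:i + 5]))
```

Because the receiver buffers partial input, frames come out whole however TCP happens to segment the stream. That is what lets the two FC switches at either end treat the tunnel like an ordinary switch-to-switch link.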
Multi-Site DR Routes
The current multi-site DR approach is to have a primary site, a near-line site
(also called a business continuity site) and a secondary or DR site. The primary
and the near-line sites are usually in the same city or even in the same campus.
The secondary site or the DR site is in a different region. In extreme cases
it could even be in a different continent.
Data is replicated from the primary data centre to the BC site at regular
intervals. This cycle time largely determines the Recovery Point Objective (RPO)
that can be achieved.
RPO represents the point in time to which data must be restored in order to
resume processing transactions. Recovery Time Objective (RTO) is the period
of time after an outage in which the systems and data must be restored to a
predetermined RPO.
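To make the RPO and RTO definitions concrete, here is a minimal sketch, assuming interval-based replication to the BC site. In the worst case a disaster strikes just before the next copy lands, so up to one full cycle plus the transfer time of data is lost; the cycle time is usually the dominant term. The `ReplicationPlan` fields and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    # Hypothetical planning inputs, all in minutes.
    cycle_minutes: float     # interval between copies sent to the BC site
    transfer_minutes: float  # time for one copy to land at the BC site
    restore_minutes: float   # time to bring systems and data up at the BC site


def worst_case_rpo(plan):
    """Worst-case data loss: a full cycle plus the transfer still in flight."""
    return plan.cycle_minutes + plan.transfer_minutes


def meets_objectives(plan, rpo_target, rto_target):
    """True if both the recovery point and recovery time objectives are met."""
    return worst_case_rpo(plan) <= rpo_target and plan.restore_minutes <= rto_target


plan = ReplicationPlan(cycle_minutes=15, transfer_minutes=5, restore_minutes=120)
print(worst_case_rpo(plan), meets_objectives(plan, rpo_target=30, rto_target=240))
```

With a 15-minute cycle and 5-minute transfer, the plan can lose up to 20 minutes of transactions; shortening the cycle is the main lever for a tighter RPO.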
Internet Fibre Channel Protocol (iFCP) is a TCP/IP-based
protocol for interconnecting FC storage devices or FC SANs using an IP infrastructure
in conjunction with, or in place of, FC switching and routing elements.
iFCP was initially targeted at replacing the core of an
FC SAN with IP, but it didn't catch on as expected because iSCSI emerged
as an end-to-end IP SAN technology. Although iFCP has been around for
quite some time now, it has not seen significant adoption.
iFCP-based products need a large amount of hardware-level
buffering because a new TCP connection is created for each device pair
that communicates across the service. The TCP processing overhead grows
rapidly as more devices use the iFCP gateway service, and management and
troubleshooting get complex with so many connections.
In comparison, FCIP is a widely accepted SAN extension
technology. FC was designed for local SANs, but FCIP extends the distance
to remote locations via any IP network.
VSANs, IVR and FCIP all use the same set of FC routing,
alert and response mechanisms. This is why FCIP, with its wide acceptance
across vendors and enterprises, its large installed base, and its ability
to use FC fabric routing and load balancing with support for FC security
protocols, is here to stay.
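The scaling difference between the two approaches comes down to what owns a TCP connection. A minimal sketch, assuming as a worst case that every initiator talks to every target under iFCP, and one TCP connection per FCIP tunnel (a tunnel may carry more than one; the `tcp_per_link` parameter is there for that). The function names are illustrative.

```python
def ifcp_tcp_connections(initiators, targets):
    """iFCP as described above: one TCP connection per communicating device pair.
    Worst-case assumption: every initiator talks to every target."""
    return initiators * targets


def fcip_tcp_connections(switch_links, tcp_per_link=1):
    """FCIP: TCP connections belong to the switch-to-switch tunnel, so the count
    depends on the number of tunnels, not on the number of devices behind them."""
    return switch_links * tcp_per_link


# Ten storage targets, a growing number of hosts, one FCIP tunnel between sites.
for hosts in (10, 50, 100):
    print(hosts, ifcp_tcp_connections(hosts, 10), fcip_tcp_connections(1))
```

Each extra host adds ten connections (and their buffers) to the iFCP gateway, while the FCIP tunnel count stays flat.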
Criticality Determines Mode of Replication

Sudhakar Rao
Is remote replication for recovery and business continuity
all about shipping data over an IP network? If the organisation cannot tolerate
any data loss and its operations must resume instantly following an outage or
unforeseen event, then synchronous replication is the ideal choice. The
decision must also factor in how far data has to be replicated and the extent
to which application performance degradation can be tolerated.
On the flip side, if the organisation can tolerate some degree of downtime while
the last few transactions are reconstructed or cannot tolerate the performance
impact of synchronous propagation delays then asynchronous replication may be
a cost-effective option.
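The trade-off above can be sketched as a small decision helper. This is a minimal sketch under stated assumptions: light in optical fibre covers roughly 200 km per millisecond, each synchronous write waits for one round trip, and switching and array latency are ignored. The function names and the "conflict" outcome are illustrative, not a vendor sizing tool.

```python
FIBRE_KM_PER_MS = 200.0  # assumption: light in fibre travels about 200 km per ms


def sync_write_penalty_ms(distance_km):
    """Propagation delay added to every synchronous write: one round trip."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def choose_replication(distance_km, max_data_loss_s, max_added_latency_ms):
    """Pick a replication mode from data-loss tolerance and latency budget."""
    fits_latency = sync_write_penalty_ms(distance_km) <= max_added_latency_ms
    if max_data_loss_s == 0:
        # Zero data loss needs synchronous replication; if the distance makes it
        # too slow, the objectives conflict and must be revisited.
        return "synchronous" if fits_latency else "conflict"
    # Some loss is tolerable, so asynchronous replication avoids the write penalty.
    return "asynchronous"


print(choose_replication(50, 0, 1.0))      # metro distance, zero data loss
print(choose_replication(1000, 300, 1.0))  # inter-city, five minutes of loss allowed
```

A "conflict" result is the case where the business wants zero data loss over a distance the application cannot absorb, which is one reason multi-tier designs pair a nearby synchronous site with a distant asynchronous one.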
Remote tape backup has not been very popular in the past. However, fast and
inexpensive WAN links and features such as FCIP tape acceleration have
significantly improved throughput over WAN links, making remote tape backup
feasible. Economical long-distance connectivity has come as a bonus for enterprises.
There are no shortcuts to DR since it requires extensive
planning. Here are a few pointers when starting out on the multi-site DR
route.
Disaster recovery should be workable and well documented,
so that when you need it, you are sure it will work. All unforeseen
natural and man-made events that may hinder business flow and customer
confidence should be taken into consideration while designing and implementing
BCP and DR centres. The service level agreement should clearly state the
vendor's role in support and services, including in crisis situations.
Testing the IT infrastructure set-up is as important
as any other aspect of business continuity and disaster recovery planning.
The IT department should undertake scheduled practical tests and drills,
which the CIO or IT head of the organisation should monitor and review.
Drills should be repeated and inspected for loopholes. Such testing may
also be required by the government regulations that organisations have
to adhere to.
The primary data centre and business continuity centre
should ideally be mirrored over a distance of less than 100 km. This mirroring
is mostly synchronous and hence adds about one millisecond of latency
to the application, apart from any switching latency within the loop.
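The one-millisecond figure can be checked with simple arithmetic, assuming light in optical fibre travels at roughly two-thirds of its speed in a vacuum, about 2 x 10^5 km/s, and that each synchronous write waits for one round trip over distance d before it is acknowledged:

```latex
t_{\text{rt}} = \frac{2d}{v}
  \approx \frac{2 \times 100\ \text{km}}{2 \times 10^{5}\ \text{km/s}}
  = 10^{-3}\ \text{s} = 1\ \text{ms}
```

The penalty scales linearly with distance, so the same arithmetic gives roughly 10 ms per write for a 1,000 km synchronous link, which is why distant DR sites are usually fed asynchronously.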
DR Outsourced
The other major trend is the evolution of remote data recovery services. Remote
infrastructure solutions can align seamlessly with business continuity processes.
They are available irrespective of geographical location or time, apart from
offering cost benefits.
Service providers can play a key role in effectively managing DR solutions that
organisations outsource to them. However, organisations do not yet take them as
seriously as they should, which is why service providers could, and in fact
should, have a greater say. For now, their role is largely confined to providing
bandwidth and connectivity, and in some cases data centre hosting. Overall, the
planning and implementation of DR and the framing of DR policies are still done
by the organisation itself and not by service providers.
DR service providers are a good alternative for SMBs and enterprises seeking
a cost-effective solution for quick deployment of their DR site, particularly
where they lack technical expertise or face a manpower crunch.
Avijit Basu, Country Manager Marketing, Enterprise Servers and Storage, HP Technology
Solutions Group, states, "We have observed that outsourced DR implementation
is getting more popular in the SMB segment."
Outsourcing Disadvantages

There are probable downsides to the multi-tier DR approach.
As Pronish P Jain, Manager, Business Resiliency and Continuity Services, Global
Technology Services, IBM India, points out, "Multi-tier, multi-site DR does
have a lot of advantages for enterprises large and small. Although these set-ups
add complexity to the corporate IT infrastructure, they can't be neglected on
account of regulatory compliance pressures and disasters that have been coming
all too often of late."
Issues on the DR front are no longer related to the availability of skill sets
or bandwidth or even cost. Some mid-sized organisations are still in the process
of consolidating their IT resources. Consolidation is a pre-requisite for an
effective DR plan.
Typically, a DR service provider has already invested in the infrastructure and
can provide storage and connectivity to customer data centres. There are other
challenges, though, such as data security and whether the storage offered will
be shared with the service provider's other clients.
Wringing the Last Drop
There are several ways organisations strategise and deploy disaster recovery
architecture. In an FCIP-based solution the deployment can be baseline point-to-point,
multipoint, or with three sites.
Baseline point-to-point is the simplest FCIP topology. It is a point-to-point
connection between two sites. This is also the most common deployment scenario
for DR. In a point-to-point SAN distance extension scenario for DR, one site
usually has the primary or active data centre, and the
other is a secondary or recovery site.
Multipoint connectivity is for enterprises that are already utilising SAN distance
extension to connect two sites together and want to connect more sites. For
example, two active data centres may share a single recovery site, or three
active sites may use each other as recovery sites in an active, peer-to-peer
relationship.
The figure accompanying this article shows two ways in which three sites can
be connected together in a ring or mesh, and one example of a star topology
WAN in which eight remote sites connect to a central site. Similar configurations
can be applied to larger numbers of sites.
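The topologies described above can be written down as lists of site-to-site FCIP links, which makes the link count (and hence WAN circuits to provision) easy to compare. A minimal sketch; the site names are hypothetical and each tuple stands for one FCIP tunnel.

```python
from itertools import combinations


def point_to_point(a, b):
    """Baseline: one link between a primary and a recovery site."""
    return [(a, b)]


def ring(sites):
    """Each site links to the next; the last link closes the loop to the first."""
    if len(sites) < 3:
        raise ValueError("a ring needs at least three sites")
    return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]


def full_mesh(sites):
    """Every site links directly to every other site: n(n-1)/2 links."""
    return list(combinations(sites, 2))


def star(hub, remotes):
    """Every remote site links only to the central site."""
    return [(hub, r) for r in remotes]


remote_sites = [f"R{i}" for i in range(1, 9)]  # eight remote sites, as in the figure
print(len(ring(["A", "B", "C"])), len(full_mesh(["A", "B", "C", "D"])), len(star("HQ", remote_sites)))
```

With three sites a ring and a full mesh need the same three links; the difference appears as sites are added, since a mesh grows quadratically while a star grows by one link per site.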
Gopal of Brocade feels, "Three-site connectivity is
ideal for enterprises that have three sites located within a 20 kilometre radius
of each other, with an existing 10 Gigabit metro area network (MAN) between
them. The requirement is not limited to DR: such an organisation will also need
to migrate volumes between the sites on a recurring basis, especially if it
holds frequent departmental reorganisations and shuffles personnel around.
If a disaster brings down all three sites in the region, the organisation
will still have a recovery site located about 1,000 miles away. The three-site
set-up will be in addition to the existing process of shipping backup tapes
to a fourth site as a final line of defence."
- Automates processes and procedures to help reduce the duration of planned events such as application testing and development, data backup and system maintenance.
- Allows secondary sites to take over primary processes, eliminating scheduled downtime.
- Enables non-disruptive disaster recovery testing with an online copy of current and accurate production data.
- Remote mirroring allows recovery of data from a remote location. This requires provisioning the necessary systems, networking tools and equipment at the remote site.
Multi-Site DR Tweaks
Load balancing between data centres is yet another area of focus. Today's load
balancing equipment allows enterprises to deploy global Internet and intranet
applications: if a primary data centre suffers an outage or overload, web
application users are quickly rerouted to a standby data centre.
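The rerouting rule above reduces to a simple selection: send users to the most preferred data centre that is both up and not overloaded. A minimal sketch; the site names, the 0.9 load threshold and the function name are hypothetical, and real global load balancers typically act on this decision through DNS responses or health-checked virtual IPs, which are not modelled here.

```python
def pick_data_centre(priority, healthy, load, max_load=0.9):
    """Return the first data centre, in priority order, that is up and not overloaded.

    priority: data centre names, most preferred first
    healthy:  name -> result of the latest health check (missing means down)
    load:     name -> current utilisation from 0.0 to 1.0 (missing means full)
    """
    for dc in priority:
        if healthy.get(dc, False) and load.get(dc, 1.0) < max_load:
            return dc
    return None  # no data centre can take the traffic


sites = ["mumbai", "chennai"]
print(pick_data_centre(sites, {"mumbai": False, "chennai": True}, {"mumbai": 0.4, "chennai": 0.1}))
```

Treating a missing health result as "down" and a missing load figure as "full" errs on the side of not sending users to a site the balancer knows nothing about.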
It is also worth preventing local disruptions in one data centre from propagating
to other data centres. At the SAN level, this can be achieved with technologies
such as Virtual SAN (VSAN) and Inter-VSAN Routing (IVR).
VSAN technology partitions a single physical SAN into multiple VSANs. VSAN capabilities
allow a large physical fabric to be logically divided into separate, isolated
environments to improve FC SAN scalability, manageability and most prominently
network security.
Mukhija of Cisco explains, "Each VSAN is a logically and functionally separate
SAN with its own set of FC fabric services. This partitioning of fabric services
greatly reduces network instability by containing fabric reconfigurations and
error conditions within an individual VSAN. The strict traffic segregation provided
by VSANs helps ensure that the control and data traffic of a given VSAN is confined
to its own domain, increasing SAN security."
VSAN benefits include reduced costs by facilitating consolidation of isolated
SAN islands into a common infrastructure without compromising on availability.
VSANs are supported across FCIP links between SANs, so a VSAN can also include
devices at remote locations.
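The containment Mukhija describes can be pictured with a toy model: every port belongs to one VSAN, a fabric event in a VSAN is seen only by that VSAN's ports, and traffic crosses VSANs only where an inter-VSAN routing rule allows it. This is a conceptual sketch, not Cisco's configuration model or API; the `Fabric` class, port names and VSAN numbers are all hypothetical.

```python
class Fabric:
    """Toy model of one physical fabric partitioned into VSANs."""

    def __init__(self):
        self.vsan_of = {}        # port name -> VSAN number
        self.ivr_pairs = set()   # hypothetical IVR rules: VSAN pairs allowed to talk

    def assign(self, port, vsan):
        self.vsan_of[port] = vsan

    def allow_ivr(self, vsan_a, vsan_b):
        self.ivr_pairs.add(frozenset((vsan_a, vsan_b)))

    def ports_notified(self, vsan):
        """A reconfiguration or error in one VSAN is seen only by that VSAN's ports."""
        return sorted(p for p, v in self.vsan_of.items() if v == vsan)

    def can_forward(self, src, dst):
        """Traffic stays inside its VSAN unless an IVR rule links the two VSANs."""
        a, b = self.vsan_of[src], self.vsan_of[dst]
        return a == b or frozenset((a, b)) in self.ivr_pairs


fabric = Fabric()
fabric.assign("host1", 10)
fabric.assign("array1", 10)
fabric.assign("host2", 20)
fabric.assign("remote_tape", 30)  # e.g. a device reached over an FCIP link
print(fabric.ports_notified(10), fabric.can_forward("host2", "remote_tape"))
```

Calling `fabric.allow_ivr(20, 30)` would then let `host2` reach `remote_tape` while a reconfiguration in VSAN 10 still leaves both untouched.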
The other technology worth mentioning is Internet Small Computer Systems Interface
(iSCSI), a protocol that carries storage traffic over TCP/IP networks. It enables
the transport of block-level data over an IP network by running on top of TCP
and encapsulating SCSI commands in a TCP/IP stream. IP SANs are getting popular,
helped by iSCSI routers that offer multiple ports.
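To show what "encapsulating SCSI commands" means at the bottom layer, the sketch below builds a standard 10-byte SCSI READ(10) command descriptor block (CDB) and pads it into the 16-byte CDB field that an iSCSI SCSI Command PDU carries. It deliberately stops there: the rest of the 48-byte iSCSI header and the TCP connection (iSCSI's registered port is 3260) are not built, and the function names are illustrative.

```python
import struct

ISCSI_PORT = 3260  # IANA-registered iSCSI port (not used below; the TCP layer is not built)
READ_10 = 0x28     # SCSI READ(10) operation code


def read10_cdb(lba, blocks):
    """Build a 10-byte SCSI READ(10) command descriptor block."""
    # Layout: opcode, flags, 32-bit logical block address, group number,
    # 16-bit transfer length (in blocks), control byte. Big-endian, no padding.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)


def cdb_field(cdb):
    """Pad the CDB into the 16-byte field of an iSCSI SCSI Command PDU."""
    if len(cdb) > 16:
        raise ValueError("longer CDBs need an extended-CDB additional header segment")
    return cdb.ljust(16, b"\x00")


cdb = read10_cdb(lba=0x12345678, blocks=8)  # read 8 blocks starting at that address
print(cdb_field(cdb).hex())
```

The target never sees anything IP-specific in the command itself; the same CDB would travel over a parallel SCSI bus or FC, which is what makes iSCSI a transport for block storage rather than a new storage command set.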
In addition to remote mirroring, remote clustering can also be used to keep
critical applications running when a disaster takes down a primary data centre.
Failover can be automated or, in some cases over long distances, triggered
manually.
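Both failover styles usually start from the same signal: heartbeats from the primary stop arriving. A minimal sketch, assuming the remote cluster node counts consecutive missed heartbeats and either fails over itself or alerts an operator. One reason manual failover is often chosen over long distances is to avoid failing over on a transient WAN glitch. The class, threshold and messages are hypothetical.

```python
class FailoverMonitor:
    """Declare the primary lost after a run of consecutive missed heartbeats."""

    def __init__(self, missed_limit=3, automatic=True):
        self.missed_limit = missed_limit  # consecutive misses tolerated before acting
        self.automatic = automatic        # False: a human confirms the failover
        self.missed = 0

    def record(self, heartbeat_received):
        """Feed one heartbeat interval's result; return the action to take."""
        self.missed = 0 if heartbeat_received else self.missed + 1
        if self.missed < self.missed_limit:
            return "primary active"
        if self.automatic:
            return "fail over to remote site"
        return "alert operator for manual failover"


monitor = FailoverMonitor(missed_limit=3, automatic=True)
for beat in (True, False, False, False):
    print(monitor.record(beat))
```

A single received heartbeat resets the count, so short, isolated losses on the link never trigger a failover.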
Enterprises employing long-haul WAN links will want to maximise the usable
bandwidth and overall utilisation of those links. Compression is one way to
maximise throughput on WAN links.
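How much compression helps depends entirely on the data, so it is worth measuring on a sample before counting on it. A minimal sketch using Python's standard `zlib` as a stand-in for the hardware compression in SAN extension gear; the sample records and the 34 Mbps link figure are illustrative assumptions.

```python
import zlib


def compression_ratio(payload, level=6):
    """Original size divided by compressed size (above 1.0 means it shrank)."""
    return len(payload) / len(zlib.compress(payload, level))


def effective_throughput_mbps(link_mbps, ratio):
    """Application data moved per second if the link is kept full of compressed data."""
    return link_mbps * ratio


# Repetitive, record-like data compresses well; already-compressed or encrypted
# data barely shrinks at all, so measure on representative samples.
sample = b"ACCOUNT=000123;BALANCE=0000500.00\n" * 1000
ratio = compression_ratio(sample)
print(round(ratio, 1), effective_throughput_mbps(34, ratio))
```

The ratio multiplies the usable bandwidth of the long-haul link only when the compression engine keeps up with line rate, which is why the article's outlook points to hardware-based compression.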
DR's Future
SAN and shared storage have had a huge impact. FCIP integration in SAN switches,
for cost-effective SAN extension over existing MAN/WAN infrastructure up to
intercontinental distances, is likely to be the next big thing.
Hardware-based compression, write acceleration and tape acceleration for improved
back-up/replication performance, together with integrated encryption of all data
leaving the storage network, will secure the last mile and further complement
enterprise business process continuance.
On the connectivity front, upcoming technologies like integrated Coarse
Wavelength Division Multiplexing (CWDM) and Dense Wavelength Division Multiplexing
(DWDM) support metro optical WAN/MAN solutions. These technologies, in conjunction
with the next generation of intelligence in these networks, will have a significant
impact on long-distance DR.