Issue of September 2006 

Disaster Recovery

Damage control, the multi-site way

3DR matters apart, India Inc uses multi-site DR in myriad ways to ensure business continuity. Dominic K explores in detail what makes this concept work

Continuous Data Protection, Information Lifecycle Management, and other assorted technologies and methodologies which are meant for data security will not suffice unless backed by a redundant IP-based multi-tier multi-site disaster recovery architecture. That is why multi-tier DR architectures similar to the one at Indian Oil Corporation Limited (IOCL) are popular among enterprises.

In the case of IOCL, the first site is the primary site, the second is the business continuity centre, and the third tier, usually based in a different city or state, is the disaster recovery centre.

Multi-Site, Multi-Tier Soars High

During the last year (2005-06) as well as the current year, most Indian organisations across verticals have made it a point to deploy DR while consolidating their data centres. This has been achieved using multi-site DR, a trend which is noticeable in the BFSI and manufacturing sectors.

Today DR deployment is multi-site and multi-city with Mumbai-Bangalore, Mumbai-Chennai and Delhi-Chennai being some popular combinations. Shyam Gopal, Country Head, Brocade elaborates, “It can also be within the same region like Delhi–Noida or Mumbai–Navi Mumbai. The trend currently among enterprises across verticals and sectors is to opt for multi-site and multi-tier DR. No organisation can afford to lose out on data, especially in the BFSI space.”

Multi-site DR is an evolving process and it will take some time before similar set-ups are deployed across organisations. The factors influencing these are technologies such as Fibre Channel over Internet Protocol (FCIP) and availability of lower cost dedicated WAN links between cities.

Telecom service providers have connected the big cities with high speed fibre for last mile connectivity. This is why multi-site deployment addresses the critical needs for data and link recovery from regional disasters.

The Rationale

To have an effective DR setup, the sites or locations should be separated by great distances. There is no point in copying data between sites if they are close to each other since both sites may be wiped out by the same disaster. The other driving force is stringent government regulations. More organisations today are coming under the purview of some regulation or the other. This makes it necessary for them to retain data records for a longer duration but puts the onus on them to make the data records available as and when required.


As Sumit Mukhija, Business Development Manager, Cisco Systems India & SAARC states, “The BFSI vertical and service providers are most affected by regulations. For these organisations, data is their bread and butter. Therefore, the first ones to adopt multi-city DR (primary, business continuity and secondary sites) have been banks and service providers.”

Gopal of Brocade has a similar view, “Indeed, regulatory requirements are mandating this category of solutions for many industries. All solutions in this category must be able to move large amounts of data in a reliable and repeatable manner. As a result, they are generally built on top of a Fibre Channel SAN infrastructure.”

This is mostly achieved by extending a SAN over hundreds of kilometres, or across a continent. To be effective over such distances, it is sometimes necessary to use an IP network to transport the FC SAN data. The standard for doing this is a protocol called FCIP, which allows transparent tunnelling of FC switch-to-switch links across IP networks.

Multi-Site DR Routes

The current multi-site DR approach is to have a primary site, a near-line site (also called a business continuity site) and a secondary or DR site. The primary and the near-line sites are usually in the same city or even in the same campus. The secondary site or the DR site is in a different region. In extreme cases it could even be in a different continent.

Copies are replicated at the BC site from the primary data centre at regular intervals. This cycle time represents the Recovery Point Objective (RPO) that could be achieved.

RPO represents the point in time to which data must be restored in order to resume processing transactions. Recovery Time Objective (RTO) is the period of time after an outage in which the systems and data must be restored to a predetermined RPO.
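As a rough illustration of how the replication cycle bounds the RPO and how an RTO is checked, here is a minimal Python sketch. The function names and the sample times are assumptions made for illustration; they are not figures from any deployment discussed in this article.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(replication_interval: timedelta) -> timedelta:
    """With periodic copies, a failure just before the next copy loses
    up to one full interval of updates, so the interval bounds the RPO."""
    return replication_interval


def actual_data_loss(last_copy: datetime, failure: datetime) -> timedelta:
    """Updates written after the last completed copy are lost."""
    if failure < last_copy:
        raise ValueError("failure time precedes the last completed copy")
    return failure - last_copy


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if service was restored within the agreed recovery time."""
    return restored - failure <= rto
```

For example, a BC site refreshed every 15 minutes can never promise an RPO tighter than 15 minutes, however quickly the systems themselves are brought back.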

The iFCP vs FCIP debate
Internet Fibre Channel Protocol (iFCP) is a TCP/IP-based protocol for interconnecting FC storage devices or FC SANs using an IP infrastructure in conjunction with, or in place of FC switching and routing elements.

iFCP was initially targeted to replace the core of an FC SAN with IP, but it did not catch on as expected due to the introduction of iSCSI as an end-to-end IP SAN technology. Although iFCP has been around for quite some time now, it has not witnessed significant adoption.

iFCP-based products need a large amount of hardware level buffering. This is because new TCP connections are created between each device pair that communicates across the service. The TCP processing overhead grows rapidly as more devices use the iFCP gateway service. Also, management and troubleshooting get complex with multiple connections.

In comparison, FCIP is a widely accepted SAN extension technology. FC was designed for local SANs, but FCIP extends the distance to remote locations via any IP network.
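The scaling difference can be sketched with a little arithmetic. The Python below is a simplified model, not a description of any vendor's gateway: it assumes the worst case in which every local device talks to every remote device over iFCP, and it assumes one FCIP tunnel per switch-to-switch link regardless of how many devices sit behind it. Both function names are hypothetical.

```python
def ifcp_sessions(local_devices: int, remote_devices: int) -> int:
    """Worst case: each communicating device pair gets its own TCP
    connection, so the count grows with the product of both sides."""
    return local_devices * remote_devices


def fcip_tunnels(inter_site_links: int) -> int:
    """Assumption: one tunnel per switch-to-switch link, independent
    of the number of devices using that link."""
    return inter_site_links
```

Under these assumptions, 20 local devices talking to 10 remote devices would need 200 iFCP connections, while a single inter-site link still needs one FCIP tunnel.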

VSANs, IVR, and FCIP all use the same set of FC routing as well as alert and response mechanisms. This is why FCIP, with its wider acceptance across vendors and enterprises, its large installed base, and its ability to use FC fabric routing and load balancing with support for FC security protocols, is here to stay.

Criticality Determines Mode of Replication


Is remote replication for recovery and business continuity all about shipping data over an IP network? If the organisation cannot tolerate any data loss and its operations must resume instantly following an outage or unforeseen event, then synchronous replication will be an ideal choice. The decision must also factor in how far data has to be replicated and the extent to which application performance degradation can be tolerated.

On the flip side, if the organisation can tolerate some degree of downtime while the last few transactions are reconstructed or cannot tolerate the performance impact of synchronous propagation delays then asynchronous replication may be a cost-effective option.
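The trade-off described above can be written down as a small decision helper. This is only a sketch of the reasoning in the text: the function name, the parameters and the idea of a fixed "latency budget" per write are assumptions for illustration, and a real decision would weigh cost and application behaviour as well.

```python
def choose_replication_mode(tolerates_data_loss: bool,
                            round_trip_ms: float,
                            latency_budget_ms: float) -> str:
    """Pick a replication mode from two inputs named in the text:
    tolerance for lost transactions and tolerance for added write delay."""
    if tolerates_data_loss:
        # Some loss is acceptable, so avoid the synchronous write penalty.
        return "asynchronous"
    if round_trip_ms > latency_budget_ms:
        # Zero loss is wanted but the distance makes synchronous writes
        # too slow; asynchronous is the fallback and accepts some loss.
        return "asynchronous"
    return "synchronous"
```

An organisation that cannot lose a single transaction and whose sites are close enough to stay within the latency budget lands on synchronous replication; every other combination falls back to asynchronous.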

Remote tape backups have not been very popular in the past. However, the availability of fast and inexpensive WAN links and features such as FCIP tape acceleration has significantly improved throughput over WAN links, making remote tape backup feasible. Economical long distance connectivity has come as a bonus for enterprises.

The Multi-Site Lookout Points
There are no shortcuts to DR since it requires extensive planning. Here are a few pointers when starting out on the multi-site DR route.

Disaster recovery should be workable and well documented so that when you need it, you are sure that it will work. All the unforeseen natural and man-made events that may hinder business flow and customer confidence should be taken into consideration while designing and implementing BCP and DR centres. The service level agreement should state clearly the vendor role in support and services, including crisis situations.

Testing the IT infrastructure set-up is as important as any other aspect of business continuity and disaster recovery planning. Scheduled practical tests and drills by the IT department should be undertaken, and these should be monitored and reviewed by the CIO or the IT head of the organisation. Each drill should be repeated and inspected for loopholes. Such testing may also be required by the government regulations that organisations have to adhere to.

The primary data centre and business continuity centre should ideally be mirrored over a distance of less than 100 km. This operation is mostly synchronous and hence adds latency of about one millisecond to the application, apart from any switching latency within the loop.
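The one-millisecond figure can be checked with simple arithmetic. The sketch below assumes that light travels through fibre at roughly 200,000 km per second and that a synchronous write waits for one full round trip; both are simplifying assumptions, and the constant and function names are illustrative only.

```python
# Approximation: light in glass fibre covers about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay for a signal to reach the remote site and for
    the acknowledgement to return, ignoring switching latency."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2 * distance_km / FIBRE_KM_PER_MS
```

At 100 km this gives about one millisecond per round trip, while a site 1,000 km away would add about ten milliseconds to every synchronous write, which is why long-distance links usually carry asynchronous replication.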

DR Outsourced

The other major trend coming up is the evolution of remote data recovery. Remote infrastructure solutions can align seamlessly with business continuity processes. They are available irrespective of location and time, and offer cost benefits as well.

Service providers can play a key role in effectively managing DR solutions outsourced to them by organisations. However, organisations do not take them as seriously as they should. Service providers could, and in fact should, have a greater say.

However, for now it is largely confined to providing the bandwidth and connectivity and in some cases data centre hosting. Overall, planning and implementation of DR and framing of DR policies is still being done by the organisation itself and not by service providers.

DR service providers are a good alternative for SMBs and enterprises seeking a cost-effective solution for quick deployment of their DR site. The appeal may also stem from a lack of technical expertise and a manpower crunch on their part.

Avijit Basu, Country Manager Marketing, Enterprise Servers and Storage, HP Technology Solutions Group states, “We have observed that outsourced DR implementation is getting more popular in the SMB segment.”

Outsourcing Disadvantages


There are probable downsides to the multi-tier DR approach. As Pronish P Jain, Manager, Business Resiliency and Continuity Services, Global Technology Services, IBM India points out, “Multi-tier, multi-site DR does have a lot of advantages for enterprises large and small. Although these setups add complexity to the corporate IT infrastructure, they can't be neglected on account of regulatory compliance pressures and disasters that have been coming all too often of late.”

Issues on the DR front are no longer related to the availability of skill sets or bandwidth or even cost. Some mid-sized organisations are still in the process of consolidating their IT resources. Consolidation is a pre-requisite for an effective DR plan.

Typically a DR service provider has already invested in the infrastructure and is able to provide storage and connectivity to customer data centres. There are other challenges, however, such as data security and whether the storage offered will be shared with the service provider’s other clients.

Wringing the last drop

There are several ways organisations strategise and deploy disaster recovery architecture. In an FCIP-based solution the deployment can be baseline point-to-point, multipoint, or with three sites.

Baseline point-to-point is the simplest FCIP topology. It is a point-to-point connection between two sites. This is also the most common deployment scenario for DR. In a point-to-point SAN distance extension scenario for DR, one site usually has the “primary” or “active” data centre, and the other is a “secondary” or “recovery” site.

Multipoint connectivity is for enterprises that are already utilising SAN distance extension to connect two sites together and want to connect more sites. For example, two active data centres may share a single recovery site, or three active sites may use each other as recovery sites in an active, peer-to-peer relationship.

The figure accompanying this article shows two ways in which three sites can be connected together in a ring or mesh, and one example of a star topology WAN in which eight remote sites connect to a central site. Similar configurations can be applied to larger numbers of sites.
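The cost of each topology is largely a matter of how many inter-site links it needs. The following Python sketch counts links for the ring, mesh and star layouts mentioned above; the function names are hypothetical and the model ignores redundant or parallel links.

```python
def ring_links(sites: int) -> int:
    """Each site connects to two neighbours, closing a loop."""
    if sites < 3:
        raise ValueError("a ring needs at least three sites")
    return sites


def mesh_links(sites: int) -> int:
    """Full mesh: one link between every pair of sites."""
    if sites < 2:
        raise ValueError("a mesh needs at least two sites")
    return sites * (sites - 1) // 2


def star_links(remote_sites: int) -> int:
    """Star: every remote site has one link to the central site."""
    if remote_sites < 1:
        raise ValueError("a star needs at least one remote site")
    return remote_sites
```

With three sites a ring and a full mesh both need three links, so the two layouts coincide. With eight remote sites a star needs eight links, whereas fully meshing the same nine sites would need 36.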

Gopal of Brocade feels, “Three-site connectivity is ideal for enterprises that have three sites located within a 20 kilometre radius of each other, with an existing 10 Gigabit metro area network (MAN) between them. Though the requirement is not limited to a DR solution, the requirement here will also be to migrate volumes between the sites on a recurring basis; a factor augmented by the fact that the organisation holds frequent departmental reorganisation and shuffles personnel around.”

If a disaster brings down all the three sites in the region, the organisation will still have a recovery site located about 1,000 miles away. The three-site set-up will be in addition to the existing process of shipping backup tapes to a fourth site as a final line of defence.

Remote Data Replication: Benefits
  • Automates processes and procedures to help reduce the duration of planned events such as application testing and development, data backup and system maintenance.
  • Allows secondary sites to take over primary processes and eliminates scheduled downtime.
  • Non-disruptive disaster recovery testing with an online copy of current and accurate production data.
  • Remote mirroring allows recovery of data from a remote location. This can be achieved by provisioning various systems and networking tools and equipment at the remote site.

Multi-Site DR Tweaks

Load balancing between data centres is yet another area of focus. Today load balancing equipment allows enterprises to deploy global Internet and intranet applications. Web application users will be quickly rerouted to a standby data centre if a primary data centre outage or overload occurs.

It is worth preventing local disruptions in one data centre from propagating to other data centres. At the SAN level, this can be achieved with technologies such as Virtual SAN (VSAN) and Inter-VSAN Routing (IVR).

VSAN technology partitions a single physical SAN into multiple VSANs. VSAN capabilities allow a large physical fabric to be logically divided into separate, isolated environments to improve FC SAN scalability, manageability and most prominently network security.

Mukhija of Cisco explains, “Each VSAN is a logically and functionally separate SAN with its own set of FC fabric services. This partitioning of fabric services greatly reduces network instability by containing fabric reconfigurations and error conditions within an individual VSAN. The strict traffic segregation provided by VSANs helps ensure that the control and data traffic of a given VSAN is confined to its own domain, increasing SAN security.”

VSAN benefits include reduced costs by facilitating consolidation of isolated SAN islands into a common infrastructure without compromising on availability. VSANs are supported across FCIP links between SANs, thus including devices at remote locations.
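The containment idea can be shown with a toy model. The Python below is not how any switch implements VSANs; it is a minimal sketch in which each port belongs to exactly one VSAN and a fabric reconfiguration touches only the ports in that VSAN. The class name, method names and port labels are all hypothetical.

```python
from collections import defaultdict


class Fabric:
    """Toy model of one physical fabric split into isolated VSANs."""

    def __init__(self) -> None:
        # Maps a VSAN id to the set of ports assigned to it.
        self._members = defaultdict(set)

    def assign(self, port: str, vsan_id: int) -> None:
        """Place a port in a VSAN, removing it from any previous one."""
        for ports in self._members.values():
            ports.discard(port)
        self._members[vsan_id].add(port)

    def affected_by_reconfiguration(self, vsan_id: int) -> set:
        """Ports disturbed when the given VSAN reconfigures: only its
        own members, never ports in other VSANs."""
        return set(self._members.get(vsan_id, set()))
```

In this model, an error in one VSAN returns only that VSAN's ports as affected, which mirrors the isolation described in the quote above.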

The other technology worth mentioning is Internet Small Computer Systems Interface (iSCSI), which carries SCSI traffic over TCP/IP. The protocol enables the transport of data blocks over an IP network. It operates on top of TCP by encapsulating SCSI commands in a TCP/IP stream. IP SAN is getting popular as it offers multiple ports through iSCSI routers.

In addition to remote mirroring, remote clustering can also be used to keep critical applications running when a disaster takes down a primary data centre. This can be automated or in some cases manual failover can be utilised for long distances.

Enterprises employing long haul WAN links can look to maximise the usable bandwidth and overall use of those links. Compression is one way to maximise throughput on WAN links.
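How much compression helps depends entirely on the data. The sketch below uses Python's standard zlib module as a software stand-in for the compression a SAN extension device might perform; the function names are assumptions, and the simple multiplication ignores protocol overhead and compression delay.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Original size divided by compressed size for one block of data."""
    compressed = zlib.compress(payload, level)
    return len(payload) / len(compressed)


def effective_throughput_mbps(link_mbps: float, ratio: float) -> float:
    """Idealised application throughput on a link carrying data that
    compresses by the given ratio."""
    return link_mbps * ratio
```

Repetitive data such as database records tends to compress well, while data that is already compressed or encrypted gains little, so the ratio should be measured on a sample of real replication traffic before sizing a link around it.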

DR’s future

SAN and shared storage have had a huge impact. FCIP integration in SAN switches for cost-effective SAN extension over existing MAN/WAN infrastructure up to intercontinental distances is likely to be the next big thing.

Hardware-based compression, write acceleration and tape acceleration for improved back-up/replication performance with integrated security and integrated encryption of all data leaving the storage network will secure the last mile and further complement enterprise business process continuance.

On the connectivity front there are upcoming technologies like integrated Coarse Wavelength Division Multiplexing (CWDM) and Dense Wavelength Division Multiplexing (DWDM) to support metro optical WAN/MAN solutions. These technologies, in conjunction with the next generation of intelligence in these networks, are likely to have a significant impact on the long distance DR front.

 
     
 
Indian Express - Business Publications Division

Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.