Disaster Recovery/Business Continuity
Continuity in Digital Inc.
Even the slightest operational lapse at an IT/ITeS outfit due to a calamity can cause irreparable damage. This is why the segment does not like taking chances when it comes to disaster preparedness.
by Sneha Khanna
Indian IT/ITeS companies do not want to take any risk, so they are replicating their data at multiple locations. All critical applications are being covered under DR, and DR/BC policies are reviewed periodically through dry runs. Educating staff has become a regular feature, with mock tests keeping employees ready for any situation.
Wipro's on guard
Rajiv Gerela, General Manager, Technology, Wipro BPO
Solutions, on the DR practices and policies that keep the organisation fit to
fight all disasters
At Wipro BPO, core services such as the backbone LAN switch, IPLC links, routers and firewalls are planned separately. These consist mainly of the standard set-up of dual systems, redundant power supply inputs and multiple leased lines from different providers. Client-specific needs, such as seats and people, are planned according to each client's requirements.
DR Drills
Quarterly drills of simulated events are conducted to keep staff awareness high enough for people to act wisely during a disaster. Says Gerela, "We do quarterly drills where different events are announced to see how people respond. The time from the announcement to full evacuation is measured to evaluate the effectiveness of the staff.
"During evacuation, employees gather at common assembly points where awareness sessions are conducted. The use of fire-fighting equipment (like fire extinguishers) is demonstrated to all with practical lessons. Different members are made to handle the equipment to ensure that there is adequate learning," he adds.
Redundancy Checked
DR planning levels are set according to different requirements. While planning, total redundancy is considered for equipment such as routers, firewalls, backbone switches, servers, UPS systems and diesel generator (DG) sets. Says Gerela, "Testing is done at least twice a year to ensure that such redundancy is actually functioning as designed. DR planning for the customers' requirements undergoes the same testing."
According to Gerela, "The plans could include shifting people from one location to another and transferring all new calls and workload to the new location. Thus, full site-level redundancy gets checked. The response in communication, people movement, system readiness and reporting is effectively tested at the alternate site."
Customer Requirements First
DR infrastructure planning is based on internal and customer requirements. Every site has standard equipment such as dual firewalls, backbone switches, servers, UPS and DG sets. "We work closely with vendors to try and get as many redundant set-ups as possible in place," informs Gerela.
Some of the areas for redundancy planning are:
- Desktop-level standby arrangement.
- Server standby and dual supply, RAID levels.
- UPS (parallel mode and 150 percent over-capacity planning vis-a-vis the actual requirement).
- Power comes from dual sources to each critical area. An additional 60/80 kVA is provided to server rooms, along with earthing and grounding.
- Gensets are given 100 percent over-capacity planning.
- Chillers (centralised and additional AC) are provisioned from alternate sources.
- Passive networking (dual-homed network with no single point of failure for both copper and fibre).
- Active components: campus network providing building-level redundancies.
- Routers/firewalls: dual planning.
- Voice infrastructure: high-availability configuration spread across sites.
- Telecom infrastructure: multiple service providers and redundant muxes.
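The "no single point of failure" goal in the list above can be checked mechanically. The following is a minimal Python sketch, not Wipro BPO's actual tooling: it models a hypothetical dual-homed site as an undirected graph (every device name, such as `core_1` or `fw_2`, is an illustrative assumption) and flags any device whose loss would cut a floor off from the Internet. It does this by failing each device in turn and running a breadth-first search from the Internet side.

```python
from collections import deque

# Hypothetical site topology: every device name here is illustrative only.
DUAL_HOMED = [
    ("internet", "isp_1"), ("internet", "isp_2"),   # two service providers
    ("isp_1", "router_1"), ("isp_2", "router_2"),
    ("router_1", "fw_1"), ("router_2", "fw_2"),     # dual routers/firewalls
    ("fw_1", "core_1"), ("fw_2", "core_2"),
    ("core_1", "core_2"),                           # backbone switch pair
    ("floor_a", "core_1"), ("floor_a", "core_2"),   # each floor dual-homed
    ("floor_b", "core_1"), ("floor_b", "core_2"),
]
# The same site with floor_b's second uplink missing.
SINGLE_HOMED = [link for link in DUAL_HOMED if link != ("floor_b", "core_2")]
# Points that must stay connected to one another.
ENDPOINTS = ["internet", "floor_a", "floor_b"]


def build_graph(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, removed):
    """Breadth-first search from start, treating `removed` as failed."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(graph, endpoints):
    """Return devices whose individual failure disconnects the endpoints."""
    spofs = []
    for device in graph:
        if device in endpoints:
            continue  # endpoints are what we protect, not infrastructure
        seen = reachable(graph, endpoints[0], removed=device)
        if not all(e in seen for e in endpoints):
            spofs.append(device)
    return sorted(spofs)


if __name__ == "__main__":
    print(single_points_of_failure(build_graph(DUAL_HOMED), ENDPOINTS))    # -> []
    print(single_points_of_failure(build_graph(SINGLE_HOMED), ENDPOINTS))  # -> ['core_1']
```

Dropping floor_b's link to `core_2` turns `core_1` into a single point of failure, which is exactly what dual-homing is meant to prevent. The sketch runs one search per device, which is ample for a site-sized topology; cable failures could be checked the same way by removing links instead of devices.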
Strict Reviews
Wipro BPO has a documented DR plan in place, with periodic reviews held with all stakeholders. All components are tested regularly to ensure hassle-free functioning. Documents, including internal/external contact lists, are updated on a regular basis. "Some of the regular initiatives include facing a crisis as a team that is truly ready for the worst storm. The tests are performed on a regular basis," Gerela says.
"DG/UPSs are tested every month, and equipment/DR gets tested twice a year. Internal and external audits, carried out as part of the BS 7799 audit requirements, validate these tests. Apart from these, we have regular client team audits which also check our readiness," says Gerela.
Potential Risks/Threats
"Assessments at various levels are done to ensure that all likely scenarios of potential threats and risks are covered. As a baseline, the DR document covers most cases such as floods, strikes, fire and earthquakes," continues Gerela.
Quarterly tabletop exercises are carried out, and any new lessons or newly identified areas of risk are incorporated into the DR document.
"In the end, it is the response of the people involved in any incident that is essential, as was the case in the unprecedented Mumbai floods. But our regular tests ensured that people adapted very well to unknown situations, thanks to the lessons they carry with them," comments Gerela.
Putting Customer Requirements First
Customer requirements based on business objectives determine the planning for all types of DR, including process-specific DR. "Any set-up requires investment both from our end and the client's end, so the set-up put into operation is based on mutual discussions. We could provide hot seats at secondary sites where people work in the normal set-up; during an emergency, additional staff are deployed at the other site to take on the extra load. Or we could have a cold set-up, where people are moved to a new site that can be made ready for use based on requirements," says Gerela.
A well-documented DR plan, periodic reviews with all stakeholders, regular testing of all components, periodic updates of all documents, and up-to-date internal/external contact lists paid off during the Mumbai floods.
"During the floods, the whole plan underwent an effective test, as things worked in a synchronised fashion across all departments. From 4 pm on that day, regular checks were conducted, meetings with various teams were organised, people were advised not to venture out, customers were updated regularly, and communication via phone or e-mail was carried out every two hours. Food, toiletries and the like were arranged (a large stock of these is always kept on site for such emergencies), and thus the crisis was handled very effectively," recalls Gerela.
In troubled times
Michael Martin, Associate Vice-President, Technologies, MphasiS, talks about how staying up to date helps the company recover from disasters
At MphasiS, DR is used to protect all critical applications. These include ERP, messaging and directory services, file servers, critical business applications, Verint, Blue Pumpkin, VSS servers, SharePoint Portal Servers and critical financial servers.
MphasiS uses a 45 Mbps IPLC for its call centre operations to ensure high performance. There is a centralised set-up for DR management. "We implement solutions and DR functions for all critical applications. These are controlled and managed by the team, which monitors and performs assessments on a regular basis," says Martin.
"Replication is done right from the development site to the core site, and then from the core site to offsite on a daily basis, using IP SANs (Intransa IP 5000). This is done across MphasiS' seven sites: four in Bangalore and one each in Mangalore, Pune and Mumbai, while one more site is being set up in China. All the data is replicated first from the development centres and from there to the core centres," says Martin.
MphasiS is able to switch its data over from one site to another within 30 minutes.
Regular Testing
The organisation has a common core team for DR and BC, which works from MphasiS' global network operating centre. Weekly visits are made to each site, and backup efficiency is tested (random and monthly checks). Ten dry runs are conducted every month by migrating data from one centre to another; once the migration is done, operations are run from there. This is done once a month for all of MphasiS' sites in India. "We use SAN management, and reports are sent to the management on a regular basis," says Martin.
Quarterly training sessions are held for the GNOC (Global Network Operation Centres). Operations teams are trained in managing the core infrastructure. A half-yearly assessment is also done with the stakeholders.
Reports are sent monthly. "Documented DR policies involve compliance, business excellence, information security and technology," says Martin.
Fully Prepared
"Risks can come from anywhere," Martin points out. "For all the products, there is 100 percent redundancy with no single point of failure. Even if you pull out a network cable, there will be 100 percent redundancy."
Come Rain Or Storm
Its ability to recover from every disaster that has struck it so far has kept MphasiS ahead in the race, even in the worst of times. "Whenever disaster strikes, recovery starts within 30 minutes," informs Martin.
During the 26/7 floods, MphasiS switched over from its Mumbai centre within 30 minutes and began operating from Pune. Operations continued even during the heaviest rains that Mumbai has ever seen.
khannasneha@networkmagazineindia.com