Disaster Recovery/Business Continuity
The factory as a fortress
Manufacturing has been one of the first verticals to adopt
enterprise applications such as ERP and SCM. Naturally, disaster recovery and
business continuity are top priorities on the sector's agenda. That is
why most manufacturers have started to invest heavily in DR/BC plans, says Sneha
If business continuity planning is a buzzword in the manufacturing sector, it's
largely because many of the forerunners in this field have burnt their fingers
in earlier run-ins with disaster.
It is interesting to note that many manufacturers have learned from their bitter
experiences and used those lessons effectively. Here we take a look at some
of the companies like BPCL, Godrej Industries and Hindustan Lever (HLL) which
have put comprehensive DR/BC strategies in place to protect their operations.
In the first case, Mani Mulki of Godrej Industries
discusses the DR/BC strategies which helped the organisation survive even as
many of its locations in Mumbai were flooded on July 26, 2005.
BC: More Than IT
The organisation's business continuity plan is divided into two parts.
The first part looks at the non-technical side of business continuity, such as
suppliers and distributors. For example, Godrej's business depends on suppliers
who provide vital ingredients. The BCP here is to find alternatives in case a
supplier no longer wants to work with the company, or goes out of business.
In that case, other suppliers are identified and materials are sourced from
them after checking that they meet the company's quality specifications.
The other part of BCP is technology-related. Here, all the major processes or
areas critical to the company's day-to-day operations (such as logistics,
supply of key items, IT and despatches) are included in the DR set-up. This
is critical since their disruption can bring the business to its knees.
Reliance Data Centre
Billing takes place all over the country at Godrej. So that billing does not
stop when a particular Carry and Forwarding Agent (CFA) is affected, backups
are done on a daily basis. An authorised person takes the call that billing is
affected and that it must be done from another location. The alternative
procedure entails an elaborate recovery plan.
This plan first identifies the disaster levels and then guides the emergency
alternative procedures. Godrej uses the Reliance data centre in Navi Mumbai
A joint team consisting of personnel from corporate audit, assurance and IT
looks after DR. The team visits locations, creates awareness and conducts educational
sessions to inform staff of the risks involved. Topics covered include vulnerable
areas, how certain processes run in times of disaster, their criticality, and
their impact on the business.
The team disseminates the Standard Operating Procedures to be implemented (in
case of a disaster) as laid down by the company. Some of the areas covered include
how to take a call and whether there is an alternative emergency procedure which
should be adopted. There is an elaborate recovery plan which firstly identifies
the disaster levels. Emergency personnel are also trained in how to return to
normalcy when the disaster abates.
"So the moment a disaster strikes, there is a champion who is nominated
by the SOP to take a call and identify the disaster, and put emergency plans
into action as per the intensity of the disaster," explains Mulki.
Mock tests are conducted on a yearly basis. There are two types of mock test.
The first is the informed mock test where a site is informed that there are
going to be DRP drills on a specified date. This is done by a team consisting
of the corporate audit and IT departments. The team goes to a particular location
and brings the required infrastructure to a standstill. This helps them see
the emergency procedures in place to ensure that business operation does not
come to a halt. All this is reviewed and a report is prepared.
The other is a surprise mock test where the team goes unannounced to a particular
location and conducts drills to see how they are managing. There is a periodical
summary report which goes to the business head.
For identifying potential threats, an evolved and detailed risk management assessment
is done regularly. The processes and stringency depend on the risk involved.
Mulki says that risk assessment was performed before the DR plan was prepared.
At present Godrej has a warm site. It has operations at 45 locations. These
locations send back a database instance to the servers hosted at the Reliance
data centre at the end of each day; this ensures that data is replicated. Backups
of the ERP database for a particular location are preserved at another location
on a weekly basis.
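
An end-of-day replication routine like this is only useful if someone notices when a location stops sending its copy. The following is a minimal sketch of such a check, not Godrej's actual tooling: the function name, the location labels and the 24-hour threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: a location is "stale" if its last copy is more than 24 hours old.
MAX_AGE = timedelta(hours=24)


def stale_locations(
    last_received: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> List[str]:
    """Return, in alphabetical order, the locations whose last copy is older than max_age."""
    return sorted(
        location
        for location, received_at in last_received.items()
        if now - received_at > max_age
    )


if __name__ == "__main__":
    # Hypothetical timestamps for two of the locations.
    now = datetime(2024, 1, 2, 9, 0)
    last_received = {
        "location_a": datetime(2024, 1, 1, 23, 30),   # received last night
        "location_b": datetime(2023, 12, 31, 23, 30),  # missed last night's run
    }
    print(stale_locations(last_received, now))
```

In practice the threshold would be tuned to the replication window, and the stale list would feed whatever alerting the DR team already uses.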
The most important part of Godrej's business is goods
shipment. This was not affected as Godrej's key processes were hosted at
the Reliance data centre, which was safe from flooding. So even when the land
was flooded, the data was accessible to users.
Since the Internet was working, people could still take stock of the situation
and give instructions. Even though Godrej's warehouses and CFAs located
in Bombay were flooded, it was still possible to make decisions as the data
was stored in the central server which was accessible from anywhere. Business
operations continued without a hitch while many other organisations stopped
BPCL's 2+1 plan
D Agrawal, Chief Manager, IS, Refinery System, BPCL, explains the DR/BC
strategies followed at their Mahul refinery
"On the technology front, we are using a two-plus-one strategy, i.e. having
a data centre within the refinery itself, a secondary data centre, and storing
backups at a remote site."
BPCL has divided its servers into two categories: critical and non-critical.
There are 47 servers, of which 20 are critical. If a critical server goes
down, it is switched over to the backup.
Backup at BPCL is divided into two parts: at the site and at a remote site.
So if a primary server is down at the site, users are handed over to a secondary
server. If this one is also down, data is accessed from a remote site.
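
The fallback order described here (primary, then secondary, then remote site) can be sketched in a few lines. This is an illustration only, not BPCL's implementation: the tier labels and the `choose_tier` function are hypothetical, and the health status is assumed to come from some monitoring system.

```python
from typing import Dict, Optional

# Order taken from the text: on-site primary, on-site secondary, remote site.
# The tier names themselves are hypothetical labels.
FAILOVER_ORDER = ("primary", "secondary", "remote")


def choose_tier(is_up: Dict[str, bool]) -> Optional[str]:
    """Return the first tier reported as up, or None if every tier is down."""
    for tier in FAILOVER_ORDER:
        # A tier missing from the status report is treated as down.
        if is_up.get(tier, False):
            return tier
    return None


if __name__ == "__main__":
    # Primary is down, so users are handed over to the secondary server.
    print(choose_tier({"primary": False, "secondary": True, "remote": True}))
```

Treating an unreported tier as down is a deliberately cautious choice: it avoids routing users to a server whose state is unknown.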
While non-critical applications are given varying levels of priority, critical
ones are hosted at the primary data centre. BPCL has its corporate data centre
at the company's Ballard Estate (in Mumbai) site. "At BPCL we have
a centralised backup system, and daily backups are taken, incremental as
well as full," says Agrawal. BPCL is in the process of implementing COBIT
practices to improve its operational efficiencies.
At the corporate level there is a DR site. For their mission-critical ERP, BPCL
has a corporate data site at Noida. "BPCL has 100 percent redundancy in
the campus network through the use of VLAN. Base network segment technology
is used along with Layer 3 switching for the terrestrial network," informs
Power Of The Mesh
The oil major relies on mesh networks for higher redundancy. For example, if
the terrestrial network is down, the wireless network (VSAT and radio links) takes over.
The organisation relies on leased lines for primary connectivity. In areas where
there are no leased lines, the primary link is VSAT with a radio link as the
backup.
BPCL's Mahul refinery at Mumbai has four-way connections for link redundancy.
One is through leased lines (4 Mbps). When these lines are down, BPCL switches
over to radio links (1 Mbps). If the radio links are also down, ISDN steps in.
Microsoft stretch cluster and replication software are utilised. Storage consolidation
is achieved by means of SANs located at two data centres. Each SAN is connected
to a server cluster.
The company has laid down procedures for various disaster scenarios which decide
the plan of action. For example, there are clearly laid down procedures for
situations ranging from one server going down to the entire set-up going down.
The refinery's departments take care of electrical and AC maintenance.
At the data centre there is a backup electricity supply. For AC and electricity,
BPCL conducts annual checks. Reviews are also regularly undertaken with well-defined
procedures on a departmental basis. BPCL conducts dry runs (for testing individual
servers) every three to six months.
There are clearly defined documents that detail handling of a disaster. BPCL
conducts mock drills once a year. In this, the site's electric supply is
brought down and operations are run from the secondary site. Departmental meetings
with the staff are held once in two months. A risk management study is conducted
every three years.
The 26/7 Story
BPCL found itself lucky during the Mumbai floods as there
was not much flooding at its corporate office in Mumbai. The organisation's
network and IT systems were not affected in any manner.
The HLL saga
S Narayanan, Corporate Information Security Manager, Hindustan Lever, on the
DR/BC strategies that keep the FMCG giant on the move, always
Hindustan Lever Limited (HLL) is a good example of how technical DR mechanisms
and effective BC policies can power an organisation in tough times. According
to S Narayanan, the company's Corporate Information Security Manager, this
has been a result of years of effort and of leveraging the lessons learnt from
its many experiences with disasters over the years.
It has three outsourced data centres at Mumbai, Bangalore, and Gurgaon. The
Mumbai and Bangalore data centres are hosted with Reliance Infocomm.
The organisation used to have a decentralised DRP before a series of disasters
led it to consider the need for a centralised architecture. Until then, each
of the units did its own DRP.
Earlier, HLL had hot sites at four metros with each unit doing its own DRP testing.
Even at that time, it conducted yearly IT operational risk assessments. After
the disasters that befell it, the company moved to a centralised set-up.
Classification of servers is done based on the business criticality. Support
tools are also in place for DRP monitoring.
Earlier, HLL had a decentralised DRP architecture. It has since shifted to a
centralised approach. DRP is done from the unit level to the three data centres
at Bangalore, Gurgaon and Mumbai. "We can respond to any disaster situation
within 15 minutes," states Narayanan.
He says that the use of centralised communication links has made DRP more reliable.
These consist of the VSAT network from HECL with Gurgaon as the first hub connecting
around 180 locations. The network also consists of terrestrial links (about
90) across the country backed by ISDN links to cover Indian offices. Network
redundancy is achieved through triangulation.
HLL's application-level DRP strategy is to have each application hosted
in not less than two cities. Some critical servers are hosted at HECL, Gurgaon,
since the transaction speed is faster. There is one live location and one DR
location. DR strategies for all critical applications are in place. Incremental
backups are performed at specified frequencies. The risk management is done
internally at Bangalore.
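
The rule of hosting every application in not less than two cities is easy to audit automatically. Here is a minimal sketch of such a check, assuming a simple inventory that maps each application to the cities hosting it. The application names and the function are hypothetical; only the city names and the two-city minimum come from the account above.

```python
from typing import Dict, List, Set


def hosted_in_too_few_cities(
    hosting: Dict[str, Set[str]],
    min_cities: int = 2,
) -> List[str]:
    """Return, in alphabetical order, the applications hosted in fewer than min_cities cities."""
    return sorted(
        app for app, cities in hosting.items() if len(cities) < min_cities
    )


if __name__ == "__main__":
    # Hypothetical inventory: application name -> cities where it is hosted.
    inventory = {
        "order_processing": {"Mumbai", "Bangalore"},
        "distributor_portal": {"Gurgaon"},
    }
    print(hosted_in_too_few_cities(inventory))
```

Using a set of cities rather than a count of servers matters here: two servers in the same city would not satisfy the rule.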
Emergency response procedures for DR are in place. Procedures are defined for
different categories of disasters. The organisation decides on suitable DR/BC
policies based on the risks involved.
This is followed by periodic education and audits. Auditing, monitoring and
compliance reviews are done every quarter, as is ongoing compliance-monitoring.
Reports are generated for DRP monitoring.
User training is also done on a periodic basis. "We educate everybody in
the organisation. Periodic tests are held, feedback taken, and corrections done,"