Tape & backup: Speeding up archival
Enterprises across verticals are desperately seeking a shorter back-up window so that their critical production applications can return to service as quickly as possible. Meanwhile, security continues to be a major concern. A virtual tape library, consisting of a virtual tape appliance and software, is a potential solution to the back-up window problem. Encryption during archival, executed through an archiving appliance, allows enterprises to retain access to their data while providing full search and discovery capabilities. Dominic K explores the facets of these technologies.
A virtual tape library (VTL) combines tape back-up emulation software with a hard disc-based architecture, and is considered among the superior archival solutions currently available. A VTL is faster, more flexible and more robust than tape back-up. Because it employs disc-to-disc (D2D) back-up, it is also referred to as VTL D2D.
A VTL generally consists of a virtual tape appliance or server, and its software emulates traditional tape devices and formats. This lets it fit transparently into an existing tape back-up set-up.
The benefits of virtual tape systems include better back-up and recovery times and lower operating costs. They also support the existing back-up and archival software that's already deployed at an enterprise data centre. A virtual tape system abstracts the tape drives or libraries for the convenience of back-up applications. Devices and software on either side of this middleware need to be supported, and there are currently no firm standards beyond the tape device command sets and SCSI. One attraction of the virtual tape concept is that the enterprise need not change or deploy anything in its existing back-up environment. Other advantages include:
- No library licensing costs, which lowers the acquisition cost of storage technology
- Elimination of media and robotic errors that prevent successful back-ups and restores
- Additional storage can be bolted on to most VTL products as easily as with a modern SAN
- The ability to leverage the high transfer rates of disc drives to accelerate back-up and restoration
Too much data
The amount of data being archived is increasing exponentially. As a result, many companies are pursuing a tiered storage architecture to resolve issues related to cost, performance and administration. Here, the first tier consists of primary storage: its performance is high, but the disc arrays that host current active production data are expensive.
Disc-based back-up forms the second tier; this is usually a VTL. The primary objective here is to provide speedy back-up and recovery at a lower cost than that of primary storage. Tape-based back-up forms the third and final tier. Usually, a tape library acts as the final repository, providing cost-effective long-term storage.
As Sunny John, Quantum's Country Manager for India, says, "From the perspective of archival, performance bottlenecks are eradicated in a tiered storage architecture with the use of disc-based back-up in the second tier, which acts as a buffer. The disc-based back-up system, with its fast throughput, shortens back-up and recovery windows."
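The age-based placement behind this three-tier scheme can be sketched as a simple policy function. The 30- and 180-day thresholds and the tier labels are illustrative assumptions, not vendor defaults.

```python
from datetime import datetime, timedelta


def choose_tier(last_accessed, now):
    """Map data to a storage tier by age: hot data on primary disc,
    recent back-ups on the VTL, older data on the tape library."""
    age = now - last_accessed
    if age <= timedelta(days=30):
        return "tier1-primary-disc"
    if age <= timedelta(days=180):
        return "tier2-vtl"
    return "tier3-tape-library"
```

In practice such rules are set in the archival software's policy engine rather than in code, but the decision logic is of this shape.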
Performance bottlenecks apart, data archival depends on the archival software
used. Good archival software permits easy management of data that is to be archived,
and mitigates human and workflow bottlenecks.
Archived data is also sometimes referred to as fixed-content data, i.e. data that does not change over time. Examples of fixed content include satellite images, medical records, weather statistics, e-mail archives and annual or quarterly financial results. These have long-term preservation value, and can be retrieved in the future as and when the need arises.
Informs Tan Kok Peng, Technical Development Manager at Tandberg, "The major bottleneck in the archival process is determining what can be archived. Most archival software uses an access-based policy to archive data that has not been accessed for a period of time. There are also some software applications which support scheduled archival. The problem gets amplified when archived data has to be transferred to an offsite storage repository, which hinders the retrieval process as it now depends on a network link."
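The access-based policy described here can be sketched as a scan over last-access times: anything untouched for longer than a threshold becomes an archival candidate. The function name, the threshold and the reliance on `st_atime` are assumptions for illustration; real archival software tracks access far more robustly.

```python
import os
import time


def archival_candidates(root, max_idle_days, now=None):
    """Return paths under `root` not accessed in `max_idle_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    candidates = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Last access earlier than the cutoff -> candidate for archival.
            if os.stat(path).st_atime < cutoff:
                candidates.append(path)
    return candidates
```

Note that many filesystems update access times lazily (e.g. `relatime` on Linux), which is one practical reason commercial products keep their own access metadata instead.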
The two biggest sources of bottlenecks are the performance of the archival target and of the network that carries the data, along with the speed at which the source device can pump out data. According to Shailesh Agarwal, Country Manager, Storage, IBM India, "Tape is still considered to provide the best price-performance for archival media. The network that is used to archive data can potentially become a bottleneck if large amounts of data move through it while other mission-critical applications are utilising network bandwidth."
Proper scheduling of the archival process can help enterprises
avoid bottlenecks and save time. More often than not, the speed of the archival
source where the data is read from will not be the bottleneck. That said, it
is essential for CIOs to consider this factor as well.
Here are some of the best practices CIOs can follow to optimise the performance of the archival system. One of the best ways to begin is to decide what should be left out and not archived at all. The mechanism that the enterprise uses to manage its encryption and retention policies must also be decided.
- Enterprise-friendly encryption model. User-based encryption models should be avoided as far as possible. A user-based encryption model is easy to adopt to begin with, but it has its own drawbacks. Adopting an enterprise-friendly encryption model for archival is a better bet. The most streamlined way to manage archived data is to encrypt it right on the secondary storage.
- Rationalise. If the same item is archived from two locations,
then only one instance should be stored. Your solution should offer
single-instance storage. Every item stored should also be compressed.
- Categorise. If any additional meta data is available to describe
the document, then the deployed solution should ensure that this meta
data is also archived. For example, if an item has been classified as
'spam' with a meta data tag, then the 'spam' tag should also be retained
along with the item.
- Retention. Retention categories can be set for specific time
parameters, for example, to retain certain items for seven or eight
years based on defined enterprise policy, needs and parameters. Entire
sets of information, users or even specific mailbox folders should have
a specific category assigned to them. This will provide both broad control
and detailed granularity in defining how long information will be retained
by an enterprise.
- Indexing. One of the most critical activities is the full-text indexing of content. The deployed solution should offer indexing of different document types. This will allow rapid document access during search and discovery.
- Auditing. Audit logging and reporting capabilities can help
enterprises when faced with litigation and in complying with regulations.
- Future roadmap should be chalked out. CIOs should have clear long-term plans covering various technical and business parameters and the requirements of their enterprise vertical. If the enterprise is planning to retain its content for 30 years, will vendors still support their own products at that date? Decisions with regard to archiving processes and technology operate on a different time-scale from most other IT decisions.
- Administration. Access to archived information is controlled by access to specific archives. In the event that other individuals need access to an archive, they can be granted permission to 'share' it.
- Expiration. Enterprises should be able to define the expiration
policies just as retention categories are defined.
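Several of the practices above, rationalisation through single-instance storage combined with compression in particular, can be sketched with a content-addressed store: identical items archived from two locations are kept once, keyed by a hash of their contents. The class below is a toy stand-in for real archival software; its names are illustrative.

```python
import hashlib
import zlib


class SingleInstanceStore:
    """Toy single-instance (deduplicating) store with compression."""

    def __init__(self):
        self._blobs = {}   # content hash -> compressed bytes
        self._index = {}   # logical location -> content hash

    def put(self, location, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            # Same content from a second location: stored only once.
            self._blobs[digest] = zlib.compress(data)
        self._index[location] = digest

    def get(self, location):
        return zlib.decompress(self._blobs[self._index[location]])

    def instance_count(self):
        return len(self._blobs)
```

Retention and expiration policies would then hang off the index entries, so an item's single stored instance survives until the last reference to it expires.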
A VTL can help meet the needs of a tiered storage environment. Data archival depends on access-based or date-stamp-based policies, and the volumes of data to be backed up, along with their relative importance, can be previewed through any commercial archival software.
Alternatively, adding storage resources such as disc arrays
can also help handle burst requirements. Usually, good planning is the key,
more so when it comes to data archival and security. Deploying a mid-size tape
library of approximately 40 to 100 slots will be adequate for most burst scenarios.
Enterprises may also automate policy-based categorisation and archival of e-mail
data. Defining appropriate management of content coupled with appropriate tiers
of the storage sub-system will further address the increase in data volumes
and assist them during data retrieval.
Comments V S Manikkam, AGM, Information Technology, Henkel Technologies, "We had a bad experience with tape libraries, hence we switched to USB hard discs. Post-migration, our data retrieval is faster and more reliable. Tapes are unreliable, and their performance credibility is too low. Tape is also costlier, and the cost involved is recurring. We have deployed automated processes through a written SQL script. The script sends messages indicating that the back-up process has been executed successfully."
When data is being archived, enterprises should look beyond just plain back-up. The focus should be on continuous availability, with parameters defined around objectives such as the recovery point objective and the recovery time objective.
The difficulty while archiving is still the back-up window. To solve this, enterprises are looking to solutions such as replication and continuous data protection.
Replication is implemented using a fail-over server. Back-ups of data from the critical server should be taken on the fail-over server; in case the critical server fails, the fail-over server should take over automatically.
Courtesy: Rajendra Dhavale, Consulting
Shrinking the back-up window
Assuming that the back-up window entails downtime for applications, one of the
easiest ways to reduce it is to take a point-in-time copy-based back-up. Using
multiple tape drives to transport data from source discs at a faster pace will
certainly help reduce the back-up window. If the back-up is performed on a SAN,
moving data via the SAN as against moving it over the LAN will also contribute
to shortening the back-up window.
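The multiple-drive idea can be sketched as a simple scheduler that spreads source volumes across N tape drives so they stream in parallel. The largest-first ordering and the drive names are illustrative assumptions; real back-up software balances streams dynamically.

```python
def assign_to_drives(volumes, drive_count):
    """Spread (name, size) volumes round-robin across tape drives.

    Sorting largest-first before the round-robin gives a rough
    balance, so no single drive dominates the back-up window.
    """
    schedules = {f"drive{i}": [] for i in range(drive_count)}
    ordered = sorted(volumes, key=lambda v: -v[1])
    for i, (name, _size) in enumerate(ordered):
        schedules[f"drive{i % drive_count}"].append(name)
    return schedules
```

With two drives, the wall-clock window is bounded by the busier drive rather than by the sum of all volumes, which is the whole point of adding spindles or drives.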
Depending on the recovery point objective (RPO) and recovery time objective (RTO) requirements, a snapshot-assisted back-up, an open-file back-up or one using database APIs can be employed. One of the most important things to ascertain is the consistency of backed-up data, so as to assure recovery in any case.
For effective data protection, enterprises should look at a single framework and tool for managing both structured and unstructured data, with an emphasis not just on data recovery but on total system recovery as well. This includes the operating system and various enterprise-wide applications.
Informs Sunil Mehta, Senior Vice-president and Area Systems Director of JWT, "We have two audits every year. Back-up and restoration forms one of the most important parts of the audit process. The physical media has to be in workable condition since it is stored for years. In India, the tax regulators want most corporate finance-related data to be archived for eight years, hence I strongly advise enterprises across verticals to keep themselves updated on new and emerging technologies in this regard."
Optimisation of the storage resources required for back-ups can be achieved by backing up transient data more frequently than non-transient data. Advises Anand Naik, Director, System Engineering, Symantec India and Saarc, "Organisations can also deploy a block-level incremental back-up mechanism which backs up only the changed data blocks instead of the entire file; this helps reduce the back-up window and also saves on space. Archival and retrieval solutions should be able to offer tight integration with an organisation's back-up and data protection infrastructure."
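The block-level incremental mechanism Naik describes can be sketched with per-block hashes: the back-up run keeps a hash per fixed-size block, and the next run ships only blocks whose hash has changed. The 4 KB block size and helper names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size


def block_hashes(data):
    """Hash each fixed-size block of a file's contents."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old_hashes, new_data):
    """Return (index, block) pairs that differ from the previous run,
    so only those blocks need to travel to the back-up target."""
    changed = []
    for i in range(0, len(new_data), BLOCK_SIZE):
        idx = i // BLOCK_SIZE
        block = new_data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            changed.append((idx, block))
    return changed
```

If one block of a large file changes between runs, only that block is transferred, which is where the shorter window and the space savings come from.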
One encryption and decryption service is based on the Advanced Encryption Standard (AES), which is published by the National Institute of Standards and Technology, an agency of the US government. It allows enterprises with VTLs to encrypt virtual tapes and export them to physical tapes. This guards and secures the data against unauthorised access and information theft in case of misplacement or loss during transportation.
Security: a necessary overhead
The overhead imposed by a security framework is due to the additional level
of checking involved. By properly isolating applications, network and data that
require stringent security measures from those that do not, one can reduce the
overall overhead imposed on the infrastructure. In addition, extra resources
such as CPU cycles and memory may be allocated to compensate for the overhead
imposed by encryption / decryption routines.
All data owners, such as those who have been given access rights by administrators, will still have access to the data after it has been archived. Archival software does not manipulate the access rights of the original data; it just manages the data. In addition, the removable storage media containing archived data, which needs to be transported around, can be encrypted using archival software or an encrypting device. This further fortifies the security measures if the media ever lands up in the wrong hands.
- Add security to the data-sharing process without interrupting existing processes
- Encryption, key management, authentication and key sharing among business partners
- Secure record retention for regulatory compliance and intellectual property management
- Long-term key management and the ability to accommodate technical changes while maintaining management
Encrypting archived data
Most available encryption technology used in archival allows enterprises to
retain access to their data while providing full search and discovery capabilities.
Encryption during back-up is executed by means of an archiving appliance that
has the encryption keys. This appliance does not retain the data that it encrypts;
it merely stores the encryption keys and encrypts information before it is sent
to secondary storage. The data is typically maintained in an encrypted form
on the network.
Messages are decrypted when an authorised user conducts search and discovery using a Web-based user interface on the archiving appliance. The deployed solution and its functions vary from vendor to vendor.
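The division of labour described above, keys retained on the appliance while only ciphertext reaches secondary storage, can be sketched as follows. The SHA-256 counter keystream is a toy stand-in for the AES used by real appliances, and the class and method names are purely illustrative.

```python
import hashlib
import os


class ArchivingAppliance:
    """Toy sketch of an archiving appliance: it keeps the encryption
    keys, but not the data it encrypts on the way to storage."""

    def __init__(self):
        self._keys = {}  # item id -> key; the only state kept here

    @staticmethod
    def _keystream_xor(key, data):
        # Counter-mode keystream from SHA-256: a stand-in for AES.
        out = bytearray()
        counter = 0
        while len(out) < len(data):
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(b ^ k for b, k in zip(data, out))

    def encrypt_for_storage(self, item_id, plaintext):
        key = os.urandom(32)
        self._keys[item_id] = key  # key stays on the appliance
        # Only the ciphertext is handed to secondary storage.
        return self._keystream_xor(key, plaintext)

    def decrypt_on_discovery(self, item_id, ciphertext):
        # Invoked when an authorised user runs search and discovery.
        return self._keystream_xor(self._keys[item_id], ciphertext)
```

Because the appliance holds only keys, losing a storage volume exposes nothing readable, while losing the appliance's key store makes the archive unrecoverable, which is why long-term key management appears among the requirements above.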
The archival system deployed at a data centre is a huge repository. Data stored
here has to be preserved for a long time in a cost-effective manner to serve
multiple needs starting with business needs and ending with regulatory compliance
(think HIPAA and SOX).
with inputs from Aishwarya Ramani