Issue of December 2006 

Telescope 2007

Tape & backup: Speeding up archival

Enterprises across verticals are desperately seeking a shorter back-up window so that their critical production applications are back on stream at the earliest. Meanwhile, security continues to be a major concern. A virtual tape library, which consists of a virtual tape appliance and software, is a potential solution to the back-up window problem. Encryption during archival, executed through an archiving appliance, secures data while letting enterprises retain access to it with full search and discovery capabilities. Dominic K explores the facets of these technologies

A virtual tape library (VTL) combines tape back-up emulation software with a hard disc-based architecture, and is currently among the most capable archival solutions available. A VTL is faster, more flexible and more robust than tape back-up. Because it employs disc-to-disc (D2D) back-up, it is also referred to as VTL D2D.

Virtualising tape

A VTL generally consists of a virtual tape appliance or server, plus software that emulates traditional tape devices and formats. This compatibility lets it fit transparently into an existing tape back-up set-up. The benefits of virtual tape systems include better back-up and recovery times and lower operating costs, along with support for the back-up and archival software already deployed at an enterprise data centre. A virtual tape system abstracts the tape drives or libraries for the convenience of back-up applications; devices and software on either side of this middleware need to be supported, and there are currently no firm standards beyond SCSI and the tape device command sets. One attraction of the virtual tape concept is that the enterprise need not change or deploy anything in its existing infrastructure.
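The emulation idea can be illustrated with a minimal sketch: tape is a sequential-access medium, so a virtual tape only has to preserve that model while the data actually lands on disc. The class below is a toy illustration under that assumption (all names are hypothetical; a real VTL presents SCSI tape command sets, which this does not attempt):

```python
import io

class VirtualTape:
    """Toy sketch of tape emulation backed by disc storage.
    An in-memory buffer stands in for the disc file; only the
    sequential-access tape model (write, rewind, read) is mimicked."""

    def __init__(self, backing=None):
        self.backing = backing or io.BytesIO()  # a disc file in practice
        self.position = 0                       # emulated head position
        self.records = []                       # (offset, length) per record

    def write_record(self, data: bytes):
        # Tape semantics: writing truncates everything past the head
        self.records = self.records[:self.position]
        offset = sum(length for _, length in self.records)
        self.backing.seek(offset)
        self.backing.write(data)
        self.records.append((offset, len(data)))
        self.position += 1

    def rewind(self):
        self.position = 0

    def read_record(self) -> bytes:
        offset, length = self.records[self.position]
        self.position += 1
        self.backing.seek(offset)
        return self.backing.read(length)
```

Because the back-up application only sees tape-like behaviour, it needs no changes — which is precisely the transparency argument made above.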

Advantages of Virtual Tape Libraries
  • Cost savings: no library licensing costs; lowers the acquisition cost of storage technology
  • Reliability: eliminates the media and robotic errors that prevent successful back-ups and restores
  • Expandability: additional storage can be bolted on to most VTL products as easily as with a modern SAN
  • Performance: leverages the high transfer rates of disc drives to accelerate back-up and restoration

Too much data

The amount of data archived is increasing exponentially. As a result, many companies are pursuing a tiered storage architecture to resolve issues of cost, performance and administration. The first tier consists of primary storage: a high-performance but expensive disc array hosting current, active production data.

Disc-based back-up forms the second tier; this is usually a VTL. The primary objective here is to provide speedy back-up and recovery speed at a lower cost than that of primary storage. Tape-based back-up forms the third and final tier. Usually, a tape library acts as the final repository, providing a cost-effective, long-term solution.

As Sunny John, Quantum’s Country Manager for India says, “From the perspective of archival, performance bottlenecks are eradicated in a tiered storage architecture with the use of disc-based back-up in the second tier, which acts as a buffer. The disc-based back-up system with its fast throughput shortens back-up and recovery windows.”

Performance bottlenecks apart, data archival depends on the archival software used. Good archival software permits easy management of data that is to be archived, and mitigates human and workflow bottlenecks.

Archived data is also sometimes referred to as fixed-content data, i.e. data that does not change over time. Examples of fixed content include satellite images, medical records, weather statistics, e-mail archives and annual or quarterly financial results. These have long-term preservation value and can be retrieved whenever the need arises.

Informs Tan Kok Peng, Technical Development Manager at Tandberg, “The major bottleneck in the archival process is determining what can be archived. Most archival software uses an access-based policy to archive data that has not been accessed for a period of time. Some applications also support scheduled archival. The problem gets amplified when archived data has to be transferred to an offsite storage repository, which hinders retrieval since it then depends on a network link.”
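The access-based policy Tan describes can be sketched in a few lines: walk the file tree and select files whose last-access time is older than a threshold. This is a simplified illustration — the 90-day default, the reliance on the filesystem's atime, and the function name are all assumptions, not any product's behaviour:

```python
import os
import time

def select_for_archive(root: str, days_idle: int = 90):
    """Sketch of an access-based archival policy: pick files whose
    last access time (atime) is older than `days_idle` days.
    Real products also offer scheduled and date-stamp-based policies."""
    cutoff = time.time() - days_idle * 86400
    candidates = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                candidates.append(path)
    return candidates
```

Note that atime is only a proxy for business value, which is why determining *what* to archive remains the hard part of the process.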

The biggest sources of bottlenecks are the performance of the archival target, the network that carries the data, and the speed with which the source device can pump out data. According to Shailesh Agarwal, Country Manager, Storage, IBM India, “Tape is still considered to provide the best price-performance for archival media. The network used to archive data can become a bottleneck if large amounts of data move through it while other mission-critical applications are utilising network bandwidth.”

Proper scheduling of the archival process can help enterprises avoid bottlenecks and save time. More often than not, the speed of the archival source where the data is read from will not be the bottleneck. That said, it is essential for CIOs to consider this factor as well.

Best Practices

Here are some best practices CIOs can follow to optimise the performance of an archival system. One of the best ways to begin is to decide what should be left out and not archived at all. The enterprise must also settle on the mechanism it will use to manage its encryption and retention policies.

  • Enterprise-friendly encryption model. User-based encryption models should be avoided as far as possible: they are easy to adopt to begin with, but have their drawbacks. Adopting an enterprise-friendly encryption model for archival is a better bet. The most streamlined way to manage archived data is to encrypt it right on the secondary storage.
  • Rationalise. If the same item is archived from two locations, then only one instance should be stored. Your solution should offer single-instance storage. Every item stored should also be compressed.
  • Categorise. If any additional metadata is available to describe a document, the deployed solution should ensure that this metadata is archived too. For example, if an item has been classified as 'spam' with a metadata tag, the 'spam' tag should be retained along with the item.
  • Retention. Retention categories can be set for specific time parameters, for example, to retain certain items for seven or eight years based on defined enterprise policy, needs and parameters. Entire sets of information, users or even specific mailbox folders should have a specific category assigned to them. This will provide both broad control and detailed granularity in defining how long information will be retained by an enterprise.
  • Indexing. One of the most critical activities is the full text indexing of content. The deployed solution should offer indexing of different document types. This will allow rapid document access during searches.
  • Auditing. Audit logging and reporting capabilities can help enterprises when faced with litigation and in complying with regulations.
  • Roadmap. CIOs should chalk out clear long-term plans covering the technical and business parameters and requirements of their enterprise vertical. If the enterprise plans to retain its content for 30 years, will vendors still support their products at that date? Decisions about archiving processes and technology operate on a longer time-scale than most other IT decisions.
  • Administration. Access to the archived information is controlled by access to specific archives. In the event that other individuals need access to an archive, they can be granted permission to 'share' access.
  • Expiration. Enterprises should be able to define the expiration policies just as retention categories are defined.
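Several of the practices above — single-instance storage, compression, metadata retention and retention categories — can be combined in one small sketch. The class and field names below are purely illustrative, assumed for the example rather than drawn from any vendor's API:

```python
import hashlib
import zlib

class ArchiveStore:
    """Toy sketch of single-instance (deduplicated), compressed
    archival with per-item metadata and a retention category."""

    def __init__(self):
        self.blobs = {}   # content hash -> compressed bytes (one copy)
        self.items = {}   # item id -> (content hash, metadata)

    def archive(self, item_id: str, content: bytes, metadata: dict,
                retention_years: int):
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:            # single-instance storage:
            self.blobs[digest] = zlib.compress(content)  # store once only
        metadata = dict(metadata, retention_years=retention_years)
        self.items[item_id] = (digest, metadata)

    def retrieve(self, item_id: str) -> bytes:
        digest, _ = self.items[item_id]
        return zlib.decompress(self.blobs[digest])
```

Archiving the same attachment from two mailboxes stores the content once, while each item keeps its own metadata and retention period — the rationalise, categorise and retention points in miniature.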

Resolving bottlenecks


A VTL can help meet the needs of a tiered storage environment. Data archival depends on access-based or date-stamp-based policies, and the volume of data to be backed up, along with its relative importance, can be previewed through any commercial archival software.

Alternatively, adding storage resources such as disc arrays can also help handle burst requirements. Usually, good planning is the key, more so when it comes to data archival and security. Deploying a mid-size tape library of approximately 40 to 100 slots will be adequate for most burst scenarios.

Enterprises may also automate policy-based categorisation and archival of e-mail data. Defining appropriate management of content, coupled with appropriate tiers of the storage sub-system, will further address the increase in data volumes and assist enterprises during data retrieval.

Comments V S Manikkam, AGM, Information Technology, Henkel Technologies, “We had a bad experience with tape libraries, hence we switched to USB hard discs. Post-migration, our data retrieval is faster and reliable. Tapes are unreliable, and the performance credibility is too low. Tape is also costlier and the cost involved is recurring. We have deployed automated processes through a written SQL script. The script is transmitted via messages. It indicates that the back-up process has been successfully executed.”

Vendor Speak
When data is being archived, enterprises should look beyond plain back-up. The focus should be on continuous availability, with parameters defined in terms of objectives such as the recovery point objective and the recovery time objective.

The difficulty while archiving is still the back-up window. To solve this, enterprises are looking to solutions such as replication and continuous data protection.

Replication is implemented using a fail-over server: back-ups of data from the critical server are taken on the fail-over server, and if the critical server fails, the fail-over server takes over automatically.

Courtesy: Rajendra Dhavale, Consulting Director, CA

Shrinking the back-up window

Assuming that the back-up window entails downtime for applications, one of the easiest ways to reduce it is to take a point-in-time copy-based back-up. Using multiple tape drives to transport data from source discs at a faster pace will certainly help reduce the back-up window. If the back-up is performed on a SAN, moving data via the SAN as against moving it over the LAN will also contribute to shortening the back-up window.
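The point-in-time copy idea can be illustrated with a toy copy-on-write snapshot: taking the snapshot is effectively instant, so the application keeps writing while the back-up reads a frozen image. The block-level model and class names below are illustrative assumptions, not any vendor's implementation (real systems such as LVM or array-based snapshots work far below this level):

```python
class SnapshotVolume:
    """Toy copy-on-write snapshot. After take_snapshot(), the first
    write to a block saves the original, so read_snapshot() always
    sees the point-in-time image while the live volume moves on."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live volume contents
        self.snapshot = None         # original blocks saved on first write

    def take_snapshot(self):
        self.snapshot = {}           # instant: nothing is copied yet

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]  # copy on first write
        self.blocks[index] = data

    def read_snapshot(self, index):
        if self.snapshot is not None and index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]    # unchanged block: read live copy
```

Since only changed blocks are ever copied, the application's downtime shrinks to the moment the snapshot is taken, which is how a point-in-time copy reduces the back-up window.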

Depending on the recovery point objective (RPO) and recovery time objective (RTO) requirements, a snapshot-assisted back-up, an open-file back-up or one using database APIs can be employed. One of the most important facts to ascertain is the consistency of the backed-up data, so that recovery is assured in every case.

For effective data protection, enterprises should look at a single framework and tool for managing both structured and unstructured data, with an emphasis not just on data recovery but on total system recovery, including the operating system and enterprise-wide applications.

Informs Sunil Mehta, Senior Vice-president and Area Systems Director of JWT, “We have two audits every year. Back-up and restoration forms one of the most important parts of the audit process. The physical media has to be in workable condition since it is stored for years. In India, the tax regulators want most of the corporate finance-related data to be archived for eight years, hence I strongly advise enterprises across verticals to keep themselves updated on new and emerging technologies in this regard.”

Storage resources required for back-ups can be optimised by backing up transient data more frequently than non-transient data. Advises Anand Naik, Director, System Engineering, Symantec India and Saarc, “Organisations can also deploy a block-level incremental back-up mechanism which backs up only the changed data blocks instead of the entire file; this reduces the back-up window and also saves space. Archival and retrieval solutions should offer tight integration with an organisation’s back-up and data retention policies.”
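The block-level incremental mechanism Naik mentions can be sketched by hashing fixed-size blocks and storing only those whose hash differs from the previous run. The 4 KB block size and function names are assumptions for illustration:

```python
import hashlib

BLOCK = 4096  # assumed block size for the sketch

def block_hashes(data: bytes):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def incremental_backup(current: bytes, previous_hashes):
    """Sketch of block-level incremental back-up: only blocks whose
    hashes differ from the previous run are stored. Returns the
    changed blocks and the new hash list for the next run."""
    changed = {}
    hashes = block_hashes(current)
    for idx, digest in enumerate(hashes):
        if idx >= len(previous_hashes) or previous_hashes[idx] != digest:
            changed[idx] = current[idx * BLOCK:(idx + 1) * BLOCK]
    return changed, hashes
```

If one block of a large file changes, only that block travels to the back-up target — which is why the approach shortens the window and saves space.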

VTL Encryption Standards
Encryption and decryption services for VTLs are typically based on the Advanced Encryption Standard (AES), published by the National Institute of Standards and Technology (NIST) of the US government. AES support allows VTL customers to encrypt virtual tapes and export them to physical tapes, guarding the data against unauthorised access and information theft if the media is misplaced or lost during transportation.

Security: a necessary overhead

The overhead imposed by a security framework is due to the additional level of checking involved. By properly isolating applications, network and data that require stringent security measures from those that do not, one can reduce the overall overhead imposed on the infrastructure. In addition, extra resources such as CPU cycles and memory may be allocated to compensate for the overhead imposed by encryption / decryption routines.

All data owners, as well as those who have been granted access rights by administrators, will still have access to the data after it has been archived. Archival software does not manipulate the access rights of the original data; it just manages its whereabouts.

In addition, data can be encrypted, using either archival software or an encrypting device, on the removable storage media that contains the archived data and has to be transported around. This further fortifies security if the media ever lands up in the wrong hands.

The Benefits Of Encrypting Archival Data
Business benefits:
  • Adds security to the data-sharing process without interrupting existing processes
  • Secures record retention for regulatory compliance and intellectual property management
Technical benefits:
  • Encryption, key management, authentication and key sharing among business partners
  • Long-term key management, and the ability to accommodate technical changes while maintaining management and security

Encrypting archived data

Most available encryption technology used in archival allows enterprises to retain access to their data while providing full search and discovery capabilities.

Encryption during back-up is executed by means of an archiving appliance that has the encryption keys. This appliance does not retain the data that it encrypts; it merely stores the encryption keys and encrypts information before it is sent to secondary storage. The data is typically maintained in an encrypted form on the network.

Messages are decrypted when an authorised user conducts search and discovery through a Web-based user interface on the archiving appliance. Deployed solutions and their functions vary from vendor to vendor.
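The appliance model described above — keys live on the appliance, data is encrypted before it reaches secondary storage, and only the appliance can decrypt for search — can be sketched as follows. This is a toy: the SHA-256-based keystream below merely stands in for AES (the standard named earlier) so the example stays self-contained, and every name is an assumption rather than a real product API. Do not use this construction for real encryption:

```python
import hashlib
import secrets

class ArchivingAppliance:
    """Toy sketch of an archiving appliance that holds encryption keys
    and encrypts items before they go to secondary storage. The
    appliance never retains the data itself, only the keys."""

    def __init__(self):
        self.keys = {}  # item id -> key; keys never leave the appliance

    def _keystream(self, key: bytes, nbytes: int) -> bytes:
        # Counter-mode keystream from SHA-256 (a stand-in for AES).
        out = b""
        counter = 0
        while len(out) < nbytes:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:nbytes]

    def encrypt_for_storage(self, item_id: str, plaintext: bytes) -> bytes:
        key = secrets.token_bytes(32)
        self.keys[item_id] = key
        ks = self._keystream(key, len(plaintext))
        return bytes(p ^ k for p, k in zip(plaintext, ks))

    def decrypt_for_search(self, item_id: str, ciphertext: bytes) -> bytes:
        ks = self._keystream(self.keys[item_id], len(ciphertext))
        return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

The design point is the separation of duties: secondary storage only ever holds ciphertext, so losing a tape or disc does not expose data, while authorised search still works because decryption happens on the appliance.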

The archival system deployed at a data centre is a huge repository. Data stored here has to be preserved for a long time in a cost-effective manner to serve multiple needs starting with business needs and ending with regulatory compliance (think HIPAA and SOX).

— with inputs from Aishwarya Ramani

 
     
 
Indian Express - Business Publications Division

Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.