Issue of January 2005 

Near-line storage

Optimizing storage using ILM

In this knowledge-driven age, data seems to be overflowing. It is important to classify information by date and store it so that the right information can be retrieved at the right time and place.

By Sajeevan P.K.

Effective management of information is the key to success for any organisation. Information should be available whenever and wherever it is required. For instance, when a payment is made using a credit card, a number of back-end processes, such as fraud detection, credit checks and transaction capture, are triggered automatically to validate the transaction before it is authorised. All of these back-end processes need to start and complete within a few seconds. An interruption to any of them can prevent the customer from using his or her credit card.

While most transaction-oriented business processes require high-performance databases, certain other data types may not require such levels of performance. Most organisations hold a mix of different types of data, each requiring different levels of availability, performance and protection. The table below explains this. It compares rich media data, such as audio and video, with databases belonging to an online transaction-processing environment.

Classification of Data

Different data types require different levels of service, which makes it possible to classify data and store each class on suitable media such as fibre channel disks, ATA disks, optical media or tape media. For example, rich media content may be stored on tape instead of expensive disk media.

Another way to classify data is by applying the "information life cycle management model", or ILM. ILM is a recent buzzword in the industry and has a multitude of definitions. In simple terms, ILM helps organisations classify information based on age. Typically, the number of references made to a specific piece of information falls as it gets older. A large part of the information stored in an organisation is old, rarely accessed reference information that must be preserved due to legal requirements.

ILM’s natural

Ironically, we have been following the ILM concept instinctively for a long time. The most recent information was stored on disk systems, and old information was taken off disk and archived on tape media as off-line content. Whenever the archived information had to be referred to, an operator loaded the tape and restored the information to disk. This was practical when data sizes were small and references to the archived information were very few.

Changing with time

With the information explosion, automation of information management became very important. Information has also evolved into many distinct classes, and the two established categories, on-line and off-line, became insufficient to deliver the required service levels and automation. There is now a distinct class of information that falls between on-line and off-line information, called near-online or near-line information. Near-line information is data that is referenced and modified less often than on-line information, but that needs better availability, through automated retrieval, than off-line storage can offer.

Near-line storage solutions involve a disk subsystem, a tape subsystem and a software application that manages information through automatic, policy-based migration across the tape and disk subsystems. The software application provides a consolidated view of storage that combines the capacities of disk and tape and masks the complexity behind it. A user on the network accessing the near-line storage sees what appears to be one large on-line storage device. Based on user-defined policies, the software application migrates data between disk and tape. For instance, a policy can be to "migrate data older than one week and not accessed during the last four days from disk to tape". The migration is internal to the system and transparent to the user, who will still see an entry for the file in the directory listing after it has been migrated.
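The example policy quoted above can be sketched in a few lines of Python. This is only an illustration of the rule, not the way any particular near-line product implements it: the names FileRecord, should_migrate and apply_policy are hypothetical, and the "migration" here just relabels the storage tier instead of moving any data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical catalogue entry kept by the near-line software."""
    path: str
    created: datetime
    last_accessed: datetime
    tier: str = "disk"  # "disk" or "tape"


def should_migrate(rec, now,
                   min_age=timedelta(weeks=1),
                   min_idle=timedelta(days=4)):
    """True if the file is older than one week AND idle for over four days."""
    is_old = (now - rec.created) > min_age
    is_idle = (now - rec.last_accessed) > min_idle
    return rec.tier == "disk" and is_old and is_idle


def apply_policy(catalog, now):
    """Mark qualifying files as migrated to tape and return their paths.

    The catalogue entry is kept, so the file would still appear in a
    directory listing even though its data now lives on tape.
    """
    migrated = []
    for rec in catalog:
        if should_migrate(rec, now):
            rec.tier = "tape"
            migrated.append(rec.path)
    return migrated
```

Both conditions must hold before a file moves: a file that is a month old but was opened yesterday stays on disk, and so does a file created three days ago.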

Considerations

The following points need to be evaluated while considering a near-line storage system.

  • The system’s ability to observe the access pattern of the data based on parameters such as the creation date of the data. This is required to set policies that migrate data automatically and maintain a performance balance between frequently and rarely accessed data.
  • The minimum storage capacity requirement should be at least 5 TB with the need to scale to 50 - 100 TB and above to justify the cost. For smaller capacities, use either ATA based disk storage or FC based disk storage without bothering to classify data.
  • In near-line storage systems, 80 to 90% of data will reside on tape storage to make the solution cost effective. This means there is every possibility that the required data resides on tape media, especially where classifying data by access pattern is difficult. Reading data from tape media is time consuming, as it requires picking up the appropriate tape, loading it in the tape drive and getting the media ready to be read. The typical delay before data can be read from tape is between 3 and 6 minutes, depending on the tape technology. Hence, if applications interface with near-line storage systems, careful planning is needed to avoid application failures caused by time-outs.
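The time-out planning mentioned in the last point can be roughed out with simple arithmetic. The sketch below takes the 3 to 6 minute recall delay from the text; the safety margin, the share of requests that end up on tape (tape_hit_ratio) and the function names are assumptions made for illustration and should be replaced with measured values.

```python
# Recall delay before data on tape can be read, from the text: 3 to 6 minutes.
TAPE_RECALL_MIN_S = 3 * 60
TAPE_RECALL_MAX_S = 6 * 60


def required_timeout_s(safety_margin=1.5):
    """Application time-out that tolerates a worst-case tape recall.

    safety_margin is an assumed multiplier, not a vendor recommendation.
    """
    return TAPE_RECALL_MAX_S * safety_margin


def expected_delay_s(tape_hit_ratio, disk_delay_s=0.0, tape_recall_s=None):
    """Average wait per request, given the share of requests served from tape.

    tape_hit_ratio is an assumed fraction of requests (0.0 to 1.0) whose
    data has already been migrated to tape.
    """
    if not 0.0 <= tape_hit_ratio <= 1.0:
        raise ValueError("tape_hit_ratio must be between 0 and 1")
    if tape_recall_s is None:
        # Use the midpoint of the 3 to 6 minute range as a default.
        tape_recall_s = (TAPE_RECALL_MIN_S + TAPE_RECALL_MAX_S) / 2
    return tape_hit_ratio * tape_recall_s + (1 - tape_hit_ratio) * disk_delay_s


print(required_timeout_s())   # 540.0 seconds, i.e. 9 minutes
print(expected_delay_s(0.1))  # average wait if 1 request in 10 hits tape
```

An application with a default time-out of 30 or 60 seconds would fail on every tape recall, so the time-out has to be sized against the worst case, not against the average wait.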

Application Areas

Near-line storage solutions are ideal in the following areas.

  • Media archival requirements where huge amounts of analog video/audio are digitized and stored.
  • Storing patient information in hospitals such as X-rays and other scanned images.
  • Insurance companies and banks storing digitized documents.
  • Storing publications in digitized form by newspaper companies and other publishing houses.
  • Content designers and advertisement agencies generating high-resolution images using applications such as Adobe Photoshop.
  • Computer aided design and manufacturing organizations storing huge drawings.

The author is Principal Consultant, Storage Solutions, Datacraft India Ltd.

 
     

Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by The Business Publications Division of the Indian Express Group of Newspapers. Site managed by BPD