Near-line storage
Optimizing storage using ILM
In this knowledge-driven age, data seems to be overflowing.
It is important to classify information by date and store it so that the right information
can be retrieved at the right time and place.
by Sajeevan P.K.
Effective management of information is the key to success for any organisation.
Information should be available whenever and wherever it is required. For instance,
when a payment is made using a credit card, a lot of back-end processes, such as
fraud detection, credit checks and capturing of transactions, are triggered automatically
to validate the transaction before it is authorised. All the back-end processes
need to start and complete within a few seconds. An interruption to the back-end
processes can prevent the customer from using his or her credit card.
While most transaction-oriented business processes require high-performance
databases, certain other data types may not require such levels of performance.
Most organisations hold a mix of different types of data,
each requiring different availability, performance and protection levels. The
table below explains this. It compares rich media data, such as
audio/video, with databases belonging to an online transaction-processing environment.
Classification of Data
Because different data types require different levels of service, data can be
classified and stored on different storage media such as fibre channel
disks, ATA disks, optical media or tape. For example, rich media content
may be stored on tape instead of expensive disk media.
Another way to classify data is by applying the "information life cycle
management model", or ILM. ILM is a recent industry buzzword with
a multitude of definitions. In simple terms, ILM helps organisations classify
information based on age. Typically, the number of references made to a specific
piece of information falls as it gets older. A large part of the information stored
in an organisation is old, rarely accessed reference information that must be
preserved due to legal requirements.
ILM is natural
Ironically, we have long followed the ILM concept instinctively when storing
information. For example, the most recent information
was stored on disk systems, while old information was taken off disk and archived
on tape media as off-line content. Whenever archived information was needed,
an operator loaded the tape and restored the information
to disk. This was practical when data sizes were small and references
to the archived information were few.
Changing with time
With the information explosion, automation of information management became
very important. Information has also evolved into many distinct classes, and
the two established categories of classification, namely on-line and off-line,
became insufficient to obtain the required service levels and automation. There
is now a new, distinct class of information that falls between on-line
and off-line information, called near-online or near-line information.
Near-line information is characterised as data that requires fewer references
and modifications than on-line information, but better availability, through
automated retrieval, than off-line storage can offer.
A near-line storage solution involves a disk subsystem, a tape subsystem and
a software application that manages information through automatic, policy-based
migration across the tape and disk subsystems. The software application provides
a consolidated view of storage that combines the capacities of disk and tape, masking
the complexity behind it. The user on the network accessing the
near-line storage gets the view of a large storage device that is on-line. Based
on user-defined policies, the software application migrates data between disk
and tape. For instance, a policy can be to "migrate data older than one
week and not accessed during the last four days from disk to tape". After
the migration of data, which is internal to the system and transparent to the
user, the user will still be able to see an entry of the file in his directory
listing.
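
The example policy quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the logic of any particular near-line product: the `FileRecord` type, the `should_migrate` and `apply_policy` functions and the in-memory `tier` field are hypothetical names introduced here. A real system would move the data itself and leave the directory entry behind, rather than flip a field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical catalogue entry for one file in the near-line store."""
    path: str
    created: datetime
    last_accessed: datetime
    tier: str = "disk"  # "disk" or "tape"; the entry stays visible either way


def should_migrate(rec, now,
                   min_age=timedelta(days=7),
                   min_idle=timedelta(days=4)):
    """Policy: older than one week AND not accessed during the last four days."""
    return (rec.tier == "disk"
            and now - rec.created > min_age
            and now - rec.last_accessed > min_idle)


def apply_policy(records, now):
    """Mark every qualifying record as migrated and return the affected paths."""
    migrated = []
    for rec in records:
        if should_migrate(rec, now):
            rec.tier = "tape"  # stand-in for the actual disk-to-tape copy
            migrated.append(rec.path)
    return migrated
```

Both thresholds are parameters, so the same check can express other user-defined policies simply by passing different `min_age` and `min_idle` values.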
Considerations
The following points need to be evaluated while considering a near-line storage
system.
- The system's ability to observe the access pattern
of the data based on parameters such as the creation date. This is required
to set policies that migrate data automatically and maintain a
performance balance across frequently and rarely accessed data.
- The minimum storage capacity requirement should
be at least 5 TB, with the need to scale to 50-100 TB and above, to justify
the cost. For smaller capacities, use either ATA-based or FC-based
disk storage without bothering to classify data.
- In near-line storage systems, 80
to 90% of data will reside on tape storage to make the solution cost effective.
This means that there is every possibility that the required data resides
on tape media especially in cases where data classification based on access
patterns is difficult. Reading data from tape media is time consuming, as
it requires picking up the appropriate tape media, loading it in the tape
drive and getting the media ready to be read. The typical delay before the data
can be read from tape is between 3 and 6 minutes, depending on the
tape technology. Hence, if there are applications interfacing with near-line
storage systems, careful planning is needed to avoid application failures caused
by timeouts and similar problems.
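
One way an application might allow for the tape delay described in the last point is to wait for the recall with a generous deadline instead of failing on a short default timeout. The sketch below is a minimal illustration under stated assumptions: `wait_for_recall` and the `is_on_disk` callback are hypothetical names, the way to detect that a file is back on disk depends on the product in use, and the doubling of the six-minute upper bound is an arbitrary safety margin.

```python
import time

# Upper end of the 3 to 6 minute tape delay mentioned above, in seconds.
TAPE_RECALL_WORST_CASE_S = 6 * 60


def wait_for_recall(is_on_disk,
                    timeout_s=2 * TAPE_RECALL_WORST_CASE_S,
                    poll_s=5.0,
                    clock=time.monotonic,
                    sleep=time.sleep):
    """Poll until the data is back on disk or the deadline passes.

    is_on_disk is a caller-supplied (hypothetical) function that returns
    True once the file can be read without a tape mount. Returns True if
    the data became available in time, False if the wait timed out.
    """
    deadline = clock() + timeout_s
    while True:
        if is_on_disk():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The `clock` and `sleep` parameters exist only so the wait can be exercised in tests without actually pausing for minutes.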
Application Areas
Near-line storage solutions are ideal in the following areas.
- Media archival requirements where huge amounts of
analog video/audio are digitized and stored.
- Storing patient information in hospitals such as
X-rays and other scanned images.
- Insurance companies and banks storing digitized documents.
- Storing publications in digitized form by newspaper
companies and other publishing houses.
- Content designers and advertisement agencies generating
high-resolution images using applications such as Adobe Photoshop.
- Computer aided design and manufacturing organizations
storing huge drawings.
The author is the Principal consultant, storage solutions, Datacraft India Ltd.