Information Lifecycle Management
Why your organization needs ILM
Much has been written about ILM and many have dismissed it
as just another buzzword. But here's how you can implement ILM in your organization.
by Pramod S
As data in the organization grows, there will come a time when the cost of the
existing infrastructure is no longer justified by the value of the data it stores. Data will have to be classified based
on its value, and appropriate storage infrastructure will need to be selected
for it. The need is to identify data based on its value and walk it through
its life from one stage to another by matching its value to its associated infrastructure.
And this is what Information Lifecycle Management (ILM) can provide.
ILM is not a product or a process alone; it is a combination of processes and
technologies that determine how information lives in an environment. Lifecycle
means data creation, modification, retirement and (sometimes) deletion, while
matching changing data value to appropriate storage targets. ILM is not a point
solution that, once implemented, will solve all storage problems, and it might
not make sense for every environment.
Building different classes of storage for different needs is justified only
when the volume of data in the enterprise is large enough. Specifically, the
incremental cost of keeping non-critical data on high-performance storage (the
additional space, the associated infrastructure, software and management costs)
should be much higher than the cost of building a separate infrastructure for
that data. That is when ILM is justified.
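The break-even reasoning above can be written out as a small calculation. The
sketch below is illustrative only: the function name, the cost categories and
every figure are assumptions, not numbers from a real deployment. It uses a
plain "greater than" test; an organization that wants the "much higher" margin
could multiply the separate-tier cost by a safety factor.

```python
def separate_tier_is_justified(noncritical_tb, premium_cost_per_tb,
                               lower_tier_cost_per_tb, lower_tier_fixed_cost):
    """Return True when keeping non-critical data on premium storage
    costs more than building a separate lower tier for it."""
    # Incremental cost of growing the high-performance tier
    # (capacity, associated infrastructure, software, management).
    cost_on_premium = noncritical_tb * premium_cost_per_tb
    # Cost of a separate tier: one-time build cost plus per-TB cost.
    cost_on_separate = (lower_tier_fixed_cost
                        + noncritical_tb * lower_tier_cost_per_tb)
    return cost_on_premium > cost_on_separate


# Hypothetical figures, in arbitrary currency units.
small_site = separate_tier_is_justified(5, 1000, 200, 20000)
large_site = separate_tier_is_justified(100, 1000, 200, 20000)
print(small_site, large_site)  # False True
```

With these assumed figures, 5 TB of non-critical data does not justify a second
tier, while 100 TB does.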
ILM should not be confused with HSM (Hierarchical Storage Management). Like
HSM, ILM uses a tiered approach. But HSM moves
data based on age alone while ILM determines the tier based on business value.
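The difference between the two approaches can be shown with a minimal sketch.
The record fields, tier names and the 90-day limit are all hypothetical; real
HSM and ILM products use their own criteria and configuration.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    age_days: int
    business_value: str  # "high", "medium" or "low", assigned by the business


def hsm_tier(record, age_limit_days=90):
    """HSM-style placement: age is the only input."""
    return "primary" if record.age_days <= age_limit_days else "archive"


def ilm_tier(record):
    """ILM-style placement: business value decides the tier."""
    tiers = {"high": "primary", "medium": "secondary", "low": "archive"}
    return tiers[record.business_value]


# An old file that the business still considers valuable.
old_contract = FileRecord("contract.pdf", age_days=400, business_value="high")
print(hsm_tier(old_contract))  # archive
print(ilm_tier(old_contract))  # primary
```

The same file lands on different tiers: age alone sends it to the archive,
while its business value keeps it on primary storage.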
The end objectives of ILM are to:
- Improve Information Availability
- Reduce Infrastructure costs
- Meet Regulatory Compliance
Though ILM as a concept offers a lot of hope, the technology behind it is not
fully baked. There are plenty of individual products that focus on certain parts
of this newest trend in open-systems storage, but the industry is some time
away from delivering a fully integrated ILM product set. At present, the idea
is to treat ILM as a series of building blocks that the enterprise can implement
to improve its existing storage environment right away, while working toward
full and successful ILM as the technology matures.
Stages of implementation
The core of ILM is classification of data. The different stages of an ILM implementation are:
- Discovery of data
- Classification of data based on usage and value
- Infrastructure to support data of different value
- Match data value to protection options
- Match data values to retention options
- Match data values to usage options
- Automatic Transparent Data Movement Tools
- Global Name space
- Data Lifecycle policy
Discovery and classification of data: This provides details about the location
of files, the file owners and how administrators can best group and manage them.
By grouping files based on their value, data can be moved to the "right"
infrastructure, with appropriate protection and retention policies applied to it.
Transparent data movement: ILM needs transparent, automated data movement that
happens without annoying users or interrupting applications.
Global name space: ILM also needs a global name space, which presents each file
to users at a fixed logical location regardless of where the file physically
resides. This abstraction layer lets administrators change the location of files
and directories without having to update the end user's access or notify them
of the new location.
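As a minimal sketch of this abstraction layer, the class below keeps a mapping
from the logical path users see to a physical location. The class, method
names, paths and storage labels are all hypothetical and do not correspond to
any particular product.

```python
class GlobalNamespace:
    """Maps the logical path users see to a physical location."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_path, physical_location):
        self._locations[logical_path] = physical_location

    def resolve(self, logical_path):
        return self._locations[logical_path]

    def relocate(self, logical_path, new_physical_location):
        # Only the mapping changes; the logical path stays the same.
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = new_physical_location


ns = GlobalNamespace()
ns.register("/finance/q1-report.xls", "tier1-array:/vol3/q1-report.xls")
ns.relocate("/finance/q1-report.xls", "archive-nas:/vol9/q1-report.xls")
print(ns.resolve("/finance/q1-report.xls"))  # archive-nas:/vol9/q1-report.xls
```

Users keep opening /finance/q1-report.xls; only the administrator-side mapping
has changed.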
Policy engine: Once the infrastructure is in place and the data is classified,
the lifecycle of data from its creation to its deletion can be automated and
made policy driven.
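A policy-driven lifecycle can be pictured as a table of stages per data class.
The sketch below is an assumption-laden illustration: the class names, age
thresholds and actions are invented, and a real policy engine would act on the
result rather than just return it.

```python
# Hypothetical lifecycle policy: for each data class, the stages it passes
# through, as (minimum age in days, action) pairs in ascending order.
LIFECYCLE_POLICY = {
    "high": [(0, "keep on primary"), (365, "move to secondary"), (2555, "delete")],
    "low": [(0, "keep on secondary"), (90, "move to archive"), (1095, "delete")],
}


def next_action(data_class, age_days):
    """Return the action of the latest stage whose age threshold is reached."""
    action = None
    for min_age, stage_action in LIFECYCLE_POLICY[data_class]:
        if age_days >= min_age:
            action = stage_action
    return action


print(next_action("high", 400))  # move to secondary
print(next_action("low", 400))   # move to archive
```

Changing how a class of data is handled then means editing the policy table,
not the code that moves the data.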
The core of ILM is classification of data based on value, and an automated tool
to do this is one of the weakest links. Though there are multiple products in
the market, there still isn't a tool that caters to all the requirements. Tools
can classify data based on type of data, frequency of access of data, owner
of data and other parameters. The difficulty lies in associating value to a
particular type of data and instructing the policy engine on how to handle such
data. Data of a particular type or a particular usage pattern may not always
have the highest value.
At present, one way to do this is to run an interview process to define resource
classifications, drawing on the reports generated by resource discovery
tools during these interviews. Organizations differ in which data they regard
as high value and which as low value, so business requirements determine data
value, and business users need to be involved in
determining it. This requires interviews across departments in order
to align data valuation across the business.
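To illustrate how tool output and interview results fit together, the sketch
below joins metadata from a discovery report with values agreed by the
business. The records, department names and value labels are hypothetical.

```python
# Output of a hypothetical discovery tool: metadata only, no value.
discovered = [
    {"path": "/hr/payroll.db", "owner": "hr", "type": "database"},
    {"path": "/hr/party.jpg", "owner": "hr", "type": "image"},
    {"path": "/radiology/scan.jpg", "owner": "radiology", "type": "image"},
    {"path": "/tmp/export.csv", "owner": "unknown", "type": "flat file"},
]

# Values agreed in interviews with each department (hypothetical).
value_by_owner_and_type = {
    ("hr", "database"): "high",
    ("hr", "image"): "low",
    ("radiology", "image"): "high",
}


def assign_value(record):
    key = (record["owner"], record["type"])
    # Anything the interviews did not cover is flagged for review.
    return value_by_owner_and_type.get(key, "unclassified")


classified = {r["path"]: assign_value(r) for r in discovered}
for path, value in classified.items():
    print(path, value)
```

Both .jpg files have the same type, yet one is low value and the other high,
which is why type or usage pattern alone cannot drive the policy engine.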
Transparent movement of data
The second area where a lot of work is still being done is transparent
movement of data. Though there are solutions in the market for unstructured
data (flat files, images, etc.) and structured data (such as databases), there
is no single solution that can manage data movement completely across the
enterprise. Also, the solutions available are mostly still point solutions with
limited multi-platform support.
Hence the best way to get the most out of this new trend is to
treat ILM as a series of building blocks. There are solutions available for
many applications such as ERP and messaging, and for standard flat files too.
To make the most of what is available today one must:
- Classify data based on its value using tools if applicable, or classify
- Work application by application.
- Build infrastructure to match data of different values.
- Automate transparent movement of data from one stage to another where possible.
- Build the right processes for manual movement of data where necessary.
- The approach has to be tactical now. It can be strategic later.
Some of the drivers for ILM are:
Regulatory Compliance: Government regulations will drive the need for ILM in
a big way in every organization. The need to preserve data for long periods
and the ability to retrieve that data will be the key.
Data warehousing and Data Mining: Data is the most critical asset of an organization.
Technology has helped organizations use this data in various ways to improve
the way business is done and to understand where the organization stands.
The same data will be required in different forms for different reasons, and
keeping multiple copies of the same data on the same infrastructure as all
other data will not provide economic justification in many cases.
Online copies for Instant recovery: Infrastructure cannot be unavailable even
for a very short time, especially in an organization like a bank. Maintaining
online copies is one of the simplest and fastest ways to recover from a
major failure. Again, maintaining these copies on infrastructure similar to
that of the actual data does not provide economic justification.
Test/Development setup: Every production environment will have a test/development
setup, which works on the same data as the production setup but uses
an offline copy. This setup is a must, as any change that needs to be made to
the production copy first has to be tested on the test setup.
Reference Data: Most organizations have static data, or data which can be regenerated
if lost, but which requires faster access than what is available using tapes.
ILM manages and streamlines complete data life cycles by matching the data's
point-in-time value to prioritized resources. It improves information availability
and reduces infrastructure and associated management costs. The key is to identify
where, when and how to apply it to achieve the
maximum benefit. It has to be built piece by piece to complete the final jigsaw.
The writer is senior architect-Storage Solutions, Apara