Taking the first steps
Translating raw data into usable information is not as easy
as it seems. As anybody who has implemented Business Intelligence software will
tell you, the first steps in any BI deployment are the most important. For it
is these steps that will determine whether your BI solution delivers or not.
by Anil Patrick R
For most organisations, information means little more than data about the business
that has been captured by transactional (OLTP in most cases) and legacy systems
such as COBOL programs. This is the most basic element that business managers
and other employees will want to analyse and use for their decision making.
For example, a manufacturing company will have an ERP and/or other transaction
systems that continuously capture data (mass volumes, partners, sales, customers,
and so on). A manager might need to extract reports from these systems for tracking
performance over the past five years, say, of a single gauge the firm makes.
These kinds of analyses are crucial and frequent in most companies in helping
to decide whether to invest more in a particular product or diversify. Now,
the problem with getting this information is multi-fold in nature, leading to
the need for a centralised system such as the data warehouse and data mining
technologies that form the basis of business intelligence (BI) systems.
WHERE ERP COMES UP SHORT
"The first problem with traditional systems is that transactional
systems such as an ERP or homegrown ones are inherently designed more for the
capture of data than for data retrieval in the form of usable reports or graphs.
If you ask an ERP system to churn out a report or a query, it will probably
take a few minutes or even hours to answer the question. This is because an
ERP system's design is not optimised to answer a question," says Sanjay
Deshmukh, Business Development Director, India/SAARC, Business Objects.
Second, these systems do not store data for more than a couple
of years (in most cases). Next is the fact that information is spread across
multiple systems (ERP, SCM, CRM, databases, Excel spreadsheets, legacy systems,
and competitive business information) and it is impossible to get the big picture
from just the information in a single system or as multiple distributed reports.
"Budgeting, planning, and so on are not part of ERP. Often
you find this information in Excel files. There are islands of information in
such forms in addition to the ERP system. The biggest challenge for anybody
is getting data from these islands and consolidating that information, which
is where BI provides the best bang for the buck," says Deepak Ramanathan,
Solutions Architect, Business Intelligence, SAS India. These are the reasons
why business executives lead when it comes to taking the call to implement a
- Are current systems adequate to handle
increased reporting requirements?
- Is the volume of data you store on your
OLTP system raising your costs significantly?
- Is your storage architecture capable of
adapting to new data loads and increased access demands?
- How difficult is it to maintain existing
- Does the process of creating reports
affect the production cycle?
- Are increased user demands affecting your
performance measures and associated costs?
- Do you currently limit the data on your
- Are you building more summary tables to
satisfy user performance requirements?
- Do users need to access multiple databases
to get answers to business questions?
- Do you anticipate rapid growth in the
amount of data that you will be keeping online for regulatory performance?
- Are your data retention periods increasing?
- How often is the data on your primary
OLTP systems required for reporting and modification?
- Do you have established service levels
for all applications and data types?
- Are data types segregated by access requirements
and storage options?
- Have you automated as many manual and
repetitive tasks as possible?
- Do you have a plan to rationalise and
- Can you reduce or eliminate customisation
requirements from applications?
- Are costs a consideration when evaluating
- Does your IT team have the right skill
sets to manage the various applications and systems?
- Do you use standard interfaces and common
- Does your architecture share commonality
across the enterprise?
- Do you have standardised processes for
application controls and data management?
- Can you break down monolithic structures?
- Do you employ reusable components?
- Are logical architectures implemented across your enterprise?
Business goal check-in
- Do IT goals map to current business imperatives?
- Does your IT team offer innovative and effective solutions to business
- Are your IT outreach efforts proactive?
DRILLING DOWN TO THE BRASS TACKS
Before going into the actual steps of how to harness organisational data, it
is important to understand that it takes more than a CIO and his team to get
BI to work the way it is supposed to. To start with, business intelligence implementations
are typically not proposed by IT departments, but by the business function or
This is why the first step needs active ownership and involvement of the business.
"BI is not an application which can be driven by the individual manager
of a department. There needs to be senior management buy-in and support, because
BI as an application will take care of the information needs of every single
user, whether it is a senior executive, middle management user or an operational
user," explains Deshmukh.
Once this is guaranteed, it is time to conduct BI readiness checks. Different
organisations and vendors follow different approaches for assessing BI preparedness.
These include user requirement studies or even gap analysis. The most common
objective behind all these studies is to find the driving factors for BI and
the technical preparedness in the firm.
According to Zoeb Adenwala, Chief, IT, Pidilite Industries, there are a few things
to be kept in mind before an organisation deploys a BI solution, such as data
integrity, data warehouse and compatibility, and integration with back-end transaction
systems, along with suitable BI software to meet the requirements. "But
the most important aspect for a successful implementation of BI is the organisational
readiness along with the commitment from the management," he adds.
The rollout of BI is a huge task
in itself. Keeping some of these aspects in mind will be a starting point
in streamlining design, implementation, and rollout.
- BI is of no use if it is not used across the
organisation (across top management, middle management and operational
users). Restricting access to just a select few defeats the purpose
of such a huge investment.
- Large organisations need to have a unified BI
vision. This can help avoid the situation of many large organisations
faced with challenges such as distributed BI silos from multiple vendors.
Having a dedicated BI team will go a long way in getting this single
- Initially, BI is leveraged mostly by analytical
business users. It is always beneficial to identify and get such users
actively involved in the initial stages of BI to gain wider user acceptance
of the project.
- Information needs of top management, middle
management and operational users are different. User studies will define
the extent of information access that each of these users will require.
- Business rules may have to be evaluated and
refined on a continuous basis to ensure that the raw data remains clean.
Legacy data will need setting of global business rules as well as manual
- It is not always possible to clean data completely,
especially in legacy systems. A call might have to be taken at times
not to include certain data that is beyond repair, to avoid erroneous
- If you choose an RDBMS-based solution, it is best
to go in for the same RDBMS as the one that the organisation uses to
leverage existing skill sets.
- Training as well as user feedback will be required
during the initial days of the project.
CO-OPTING BUSINESS USERS
First, it is necessary to find out if the organisation really needs to have
a BI system in place. This is where inputs from users are of paramount importance.
The results of this study will help determine if reporting capabilities of existing
systems can be tweaked to meet requirements.
Once you have determined that BI is the answer, this study will help identify
the key driving factors for the required BI implementation. Time has to be spent
with the business users at this stage to ensure that requirements such as the
need, kind of analysis, and so on are understood. Otherwise, it is very difficult
to get the required results from a BI initiative.
The technology side of BI preparedness checks on whether the basic infrastructure
is in place. This will first consider the different transaction systems and
platforms that the organisation has. This stage will also evaluate your systems
for parameters such as data management and availability. One of the main things
checked during this stage is the reporting performed on existing systems. Next
is to determine whether these systems have sufficient headroom to handle increased
Once these have been achieved, it is time to define the scope of the work document
that is required for the implementation.
SCRUBBING DATA CLEAN
Before even contemplating BI, it is essential to check the state of
organisational data. This is because having data spread
across multiple sources is not the same as having accurate, analysable data.
The first stage of any BI implementation, called extract, transform, load (ETL)
or data integration (DI) depending on the vendor, calls for accurate
and standardised data, without which the reports generated will be highly inaccurate.
At this stage, the biggest challenge for an organisation is the availability
of data, followed by the transformation or cleaning up of data. While extraction
is pretty much standard across products (for example using means such as ODBC
and JDBC), ETL as a process is the most important one in a BI application.
"It is very important to understand in this context that data sources
are not cleaned. It is in reality the data from sources that is cleaned before
being put in the data warehouse, as cleaning or modification of data sources
would break the applications," says Vaibhav Phadnis, Director, Server Business
Group, Microsoft India.
"ETL is the most underestimated area in a BI application in terms of effort
required, cost, and importance. If this stage goes wrong, the quality of the
data that goes into the system will be poor, leading to wrong decisions by users.
In terms of effort required, ETL will cover 40 to 50 percent of the efforts
required for the entire BI initiative," says Deshmukh.
"Tools used in the transformation element vary. Some data validation and
data accuracy checking can be accomplished with straightforward Transact-SQL
code," adds Phadnis.
Cleaning up data is not a simple task. "To take a common case, 'Mumbai' is
entered as 'Mumbai 21' despite a separate box being provided for the pin code.
Data quality is challenging in India, unlike in Western countries, where addresses
are standardised. Here it is common to find addresses like 'behind Lakshminarayan
temple', 'under the flyover' and things like that, which are
the most challenging," explains Ramanathan.
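
The "Mumbai 21" case lends itself to a simple transformation rule. The sketch below is only an illustration: the function name split_city_and_zone and the assumption that a trailing one- to three-digit number is a postal zone are hypothetical, not taken from any vendor's tool. Free-text landmark addresses are not handled and would still need manual review.

```python
import re

# Assumed rule: a city name followed by a short number (e.g. "Mumbai 21")
# carries a postal zone that belongs in its own field.
_CITY_WITH_ZONE = re.compile(
    r"^\s*(?P<city>[A-Za-z][A-Za-z .'-]*?)\s*[- ]\s*(?P<zone>\d{1,3})\s*$"
)


def split_city_and_zone(raw_city):
    """Return (city, zone); zone is None when no trailing number is found."""
    match = _CITY_WITH_ZONE.match(raw_city)
    if match:
        return match.group("city").strip().title(), match.group("zone")
    return raw_city.strip().title(), None
```

Calling split_city_and_zone("Mumbai 21") yields the city and the zone as separate values, so the warehouse can store one spelling of the city while the zone moves to the field meant for it.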
Data quality is of primary importance even in ERP systems where wrong naming
can occur. For example, the system might designate a valve as part number 35,
and elsewhere the same part may be represented as valve 30. If a report has
to be done on the demand for valve 30, the analysis might miss the wrongly named
part. This is a mismatch from the reporting perspective.
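
One way to picture the fix for this kind of naming mismatch is an alias table applied during transformation. This is a minimal sketch under stated assumptions: the table PART_ALIASES, the helper names, and the (label, quantity) input shape are all hypothetical, and in practice business users would maintain the mapping.

```python
from collections import defaultdict

# Hypothetical alias table: every label seen in a source system maps to
# one canonical part name used for reporting.
PART_ALIASES = {
    "valve 30": "valve 30",
    "part number 35": "valve 30",
}


def canonical_part(label):
    """Normalise case and spacing, then look up the canonical name.

    Labels that are not in the alias table are returned normalised but
    otherwise unchanged.
    """
    key = " ".join(label.lower().split())
    return PART_ALIASES.get(key, key)


def demand_by_part(order_lines):
    """Sum ordered quantity per canonical part from (label, quantity) pairs."""
    totals = defaultdict(int)
    for label, quantity in order_lines:
        totals[canonical_part(label)] += quantity
    return dict(totals)
```

With the alias in place, orders recorded against "part number 35" are counted in the demand for valve 30 instead of being missed.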
If the data quality itself is not addressed during ETL, reporting down the line
will be ambiguous. It will not be up to date on the required objectives. This
is why it is crucial to clean up your data sources, and set policies and processes
in place to ensure that it remains clean.
A data quality audit in terms of looking at how much data can be retrieved is
a good way to begin. This should look at the state of existing data, and then
the data quality. A report on the data quality should then be presented to the
management, business heads or the IT team to inform them of issues with data
and what will have to be done to resolve them. Many of the issues might also
require process changes.
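
A first-pass audit can be as simple as counting how often each required field is empty. The sketch below assumes records arrive as dictionaries; the function name audit_quality and the field names in the usage note are illustrative, not part of any product.

```python
def audit_quality(records, required_fields):
    """Report, per field, how many records have a missing or blank value."""
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1

    total = len(records)
    report = {}
    for field, count in missing.items():
        # Percentage of records where the field is filled in.
        complete_pct = round(100 * (total - count) / total, 1) if total else 0.0
        report[field] = {"missing": count, "complete_pct": complete_pct}
    return report
```

Run against, say, customer and pin fields, the resulting figures give management a concrete starting point for deciding which issues need process changes.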
In addition to these, there are also many validation tools available in ETL
to clean up data. When data is being extracted, the workflow of how data is
going to be extracted also has to be defined. "No technology on its own
can ensure the quality of existing data. Once the data model is defined, we
define data accuracy rules for every element in the model. Then the changes
are effected whenever something is detected," says Arun Ramachandran, Presales
Head, India and SAARC, Sybase.
The organisation needs to define business rules to ensure data validity. These
rules have to be defined by business users. Only this will ensure data accuracy
for the present and future. This is why top management has to take ownership
of a BI project. "Data quality is a cyclical task. You can't do it
once and forget about it. But the ownership of data quality resides with the
company. They have to clean the data, polish it and, more importantly, make
sure that data is captured correctly down the line. Sanctity of the data is
a corporate ownership," says Ramanathan.
Global business rules will have to be set for legacy data. A cut-off date for
cleaning up the data will also have to be defined to ensure timely completion
of this cleanup. A call may also have to be taken at times to cut off irreparably
inaccurate data from the BI system to avoid faulty reporting.
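
Business rules of this kind can be pictured as named checks applied to every record, with failures set aside instead of being loaded. This is a hedged sketch: apply_business_rules and the two example rules are hypothetical, and real rules would be defined by business users.

```python
def apply_business_rules(records, rules):
    """Split records into (loadable, rejected).

    Each rejected entry pairs the record with the names of the rules it
    failed, so someone can decide whether to repair or exclude it.
    """
    loadable, rejected = [], []
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            loadable.append(record)
    return loadable, rejected


# Illustrative rules only.
EXAMPLE_RULES = {
    "part_present": lambda r: bool(str(r.get("part", "")).strip()),
    "quantity_non_negative": lambda r: isinstance(r.get("quantity"), int)
    and r["quantity"] >= 0,
}
```

Keeping the rejected records together with the reasons makes the later call on irreparable data an explicit, reviewable decision.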
Once the ETL phase is sorted out, it is time to create the central repository
or the data warehouse. These are usually RDBMS solutions from vendors such as
Oracle, Sybase, IBM and Microsoft or specialised RDBMS such as those from SAS
and NCR Teradata hosted on a physically separate server.
On the technical side, data warehouses are typically built on an enterprise
framework, but with a difference: they are made up of small data marts. Usage
of these data marts ensures that the data warehouse can be scaled up easily
The choice of the data warehouse is based on different factors including data
volumes, performance, future scalability potential, and available skill sets.
Many organisations prefer to have their data warehouses on RDBMS platforms that
they already use to leverage existing skill sets. It is also very important
to define the objectives required from the data warehouse. For example, if the
goal is to optimise delivery and analysis, the data warehouse has to be built
keeping those objectives foremost.
Data mining is then conducted to exploit data in the warehouse. Different tools
available in the market have different capabilities and strengths when it comes
to reporting, as we shall examine soon.
When it comes to evaluating the reporting features of a BI solution, it is
important to remember that the system is meant for use across the organisation.
This means that it has to cater to not just senior executives but also the middle
management as well as operational users.
The information requirements of these users vary considerably. For operational
users, the information needs are going to be very basic in nature. All they
need is data in the form of reports for particular functions. This data can
come from the warehouse or any transactional system.
Next is the middle management user who is typically an analyst. These users
look for features such as OLAP, or drill and dice capabilities to analyse data
As opposed to this, the senior management will be looking for quick data snapshots.
These users are interested in keeping an eye on key performance indicators (KPI),
dashboards and balanced scorecards.
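
The three kinds of users can be thought of as looking at the same facts at different levels of aggregation. The sketch below is illustrative only: the helper total_by and the row layout (dictionaries with a units key) are assumptions, not a description of any BI product.

```python
from collections import defaultdict


def total_by(rows, *dims):
    """Sum 'units' grouped by the given dimension names.

    With no dimensions the result is a single grand total, keyed by the
    empty tuple.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in dims)
        totals[key] += row["units"]
    return dict(totals)
```

An operational report would list the rows themselves, an analyst might call total_by(rows, "region", "product") and then drill into one region, and a senior manager's snapshot corresponds to total_by(rows).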
Says Phadnis, "The value of BI is in its pervasive usage. Most organisations
assume that BI is expensive and limit its usage to top executives or selected
business roles. To derive maximum returns from a BI investment, it is essential
that every employee in the organisation should have access to the intelligence
generated by the Business Intelligence solution deployed by the company."
All these varying requirements have to be identified and translated to achievable
formats before charting out reporting. It will also have to be kept in mind
that it will take a fair amount of work, rework and training (during the initial
phase) before the users are satisfied and empowered to make the right decisions
using the BI solution.
with inputs from Shivani Shinde