The database evolution continues
S. B. Navathe
PROFILE
S. B. Navathe is Professor, College of Computing, Georgia
Institute of Technology. His present research interests
include database modeling, design and integration in
the context of emerging applications: engineering design,
biological (particularly human genome) databases, document
and text databases, and collaborative applications.
He has authored over 90 books, mostly related to databases.
The changes to database technology in 2002 will be evolutionary
and not revolutionary. While it is unreasonable to expect
any quantum leaps in the data handling functions of
popular DBMS, data warehousing and OLAP products in the span
of a year, we can expect incremental improvements in a number
of areas.
Your expectations about the future outcome of database research
may be high. Since you can't take a trip into the future to
find out, the logical thing would be to ask someone who has
been dabbling with databases for many years. I, for one, have
been associated with database research for over thirty years,
and my answer to the question of what the coming year holds is,
"Some incremental versions, like going from 8.0.1 to 8.0.2, but
nothing drastically new!" All these years I have been observing
how database vendors and products react to changing demands, and
the reaction has always been evolutionary rather than revolutionary.
State-of-the-art database technology has not changed drastically
over the last several years, and hence it is unlikely to make
any giant strides in the coming year. Relational data management
and the allied data warehousing, ROLAP and other products
probably account for over 90 percent of revenues in the current
database market, a mix that will not change significantly in
a year. This technology is firmly established because of its
extremely robust research foundation: the work done at IBM Research,
at several universities including the University of California,
and at Computer Corporation of America in the 1970s and early 1980s.
The rest of the market is taken up by products related to
multidimensional data management (MOLAP), object-oriented
database systems (such as ObjectStore, Versant, Jasmine, Poet
and O2), and a whole variety of tools for data extraction,
transformation and integration, as well as utility software.
Interestingly, while new developments in database products are
primarily geared to managing relational and so-called
object-relational data, legacy data remains dominant everywhere
in the western world: in hierarchical databases (IBM's IMS),
network databases (predominantly IDMS, IMAGE and DMS 1100) and
ISAM, VSAM or just plain sequential files. The challenge for
database products has always been to draw upon the valuable
information in existing legacy systems while forging ahead with
new concepts of data management.
Furthermore, data is growing by leaps and bounds on the Web,
which currently boasts more than 400 million users, who are
creating their own unrestricted and unstructured content
that conforms to no standard other than HTML (which is not
a data model at all). It is estimated that data held under the
control of legacy database management systems and in other
non-standard forms accounts for over 50 percent of overall
database content worldwide.
Before we look at what is in store for 2002, let us summarize
the current state of affairs. The overall data management
problem is aggravated by:
Avalanche of data: Digitized information is increasing at
a phenomenal rate. With digital still and movie cameras, average
users will generate hundreds of gigabytes of personal data
annually, which is likely to clog networks even at speeds of
tens of Gbps. Digital archives of, say, every movie ever produced
are conceivable: the 100,000 or so movies would fit into a
petabyte of storage at about 10 GB per movie. We can expect
very rapid digitization of all forms of multimedia data, which
will be stored on servers for mass consumption.
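The petabyte estimate above follows from simple multiplication. The worked line below assumes decimal (SI) units, where 1 PB = 10^6 GB, and uses the article's round figures of 100,000 movies at about 10 GB each.

```latex
100{,}000\ \text{movies} \times 10\ \tfrac{\text{GB}}{\text{movie}}
  = 10^{6}\ \text{GB} = 10^{3}\ \text{TB} = 1\ \text{PB}
```

With binary units (1 PiB = 2^50 bytes) the same total is slightly under 0.9 PiB, so "about a petabyte" holds either way.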
Variety of data: Early databases contained only alphanumeric
data. Today, databases contain digitized information coming
from a variety of sources: sensors, audio, video and still
images. The type of processing or operation required changes
with the nature of this information. Informix pioneered the
concept of DataBlades to handle such data types (e.g. time series,
text and images); Oracle has followed with cartridges. This
trend of adding data types with special functionality is going
to continue.
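Informix DataBlades and Oracle cartridges each have their own proprietary extension interfaces, which the sketch below does not attempt to reproduce. As a loose analogy only, Python's standard `sqlite3` module can register an application-defined function that SQL queries then call like a built-in, which captures the core idea of teaching the engine a new, type-specific operation. The `documents` table and the `word_count` function are hypothetical.

```python
import sqlite3

def word_count(text):
    """Hypothetical text-type operation: count whitespace-separated words."""
    if text is None:
        return None
    return len(text.split())

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in (1 argument).
conn.create_function("word_count", 1, word_count)

conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (body) VALUES (?)",
    [("database evolution continues",), ("legacy data everywhere",), ("xml",)],
)

# The new operation is usable inside ordinary SQL, including the WHERE clause.
rows = conn.execute(
    "SELECT id, word_count(body) FROM documents "
    "WHERE word_count(body) > 1 ORDER BY id"
).fetchall()
print(rows)  # [(1, 3), (2, 3)]
```

Commercial extension modules typically go further than a callable function (for example, specialized storage or indexing for the new type); the sketch shows only the query-language side.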
Variety of users: Just as the variety of data affects the kinds
of operations that can meaningfully be performed on it, end users
dictate how they would like data to be manipulated or presented
to them for meaningful processing. More and more ordinary, naive
and computer-illiterate users, from young kindergarten children
to housewives, are becoming database users, and their number
keeps growing every day. While querying and transaction processing
were the two main functions of database systems, for the average
user the modality has shifted to browsing, searching and
exploring data.
Expansion along the above three dimensions is further aggravated
by the inherent complexity of data and of applications. While
database technology mainly dealt with applications in banks,
insurance companies, manufacturing and government record-keeping
units in its first two decades (1965-85), e-commerce applications
have made far more data available to a wide variety of transactions
involving consumers and businesses. Many databases, such as product
catalogs, travel and entertainment schedules, informative reports,
facility descriptions, and rules and policy manuals, are now at
the disposal of the general public. Opportunities in the
entertainment industry, education, healthcare, telemedicine and
biotechnology, coupled with the growing connectivity afforded by
broadband and wireless technologies, have opened up a virtually
unlimited range of possible applications.
Now let's take a look at database technology trends we can
expect in 2002.
Technology trends
While it is unreasonable to expect any quantum leaps in the
data handling functions of popular DBMS, data warehousing
and OLAP products in the span of a year, we can expect incremental
improvements in the following areas, all of which are being
actively researched by major database research groups worldwide.
Scalability of Mobile Applications: Currently, mobile versions
of RDBMSs such as Oracle or SQL Server allow mobile users with
PDAs, pagers and laptops or notebooks to retrieve data from
databases at central sites. Synchronization tools such as
iMobile Suite (from Synchrologic) synchronize client databases
with servers, so that clients can operate in a disconnected
fashion. This technology needs to be scaled up substantially
by grouping data at the server, clustering clients into client
clusters and possibly broadcasting or multicasting this
information periodically. We have had an active research project
in this area for the past four years, and the results of such
research should find their way into products in the coming years.
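The scaling idea described above (group data at the server, cluster clients, broadcast) can be illustrated with a toy planner. This is a minimal sketch under stated assumptions: each client subscribes to named data groups (all client and group names are hypothetical), and the server sends one broadcast or multicast per data group rather than one message per client per group. It is not the design of any particular product or of the research project mentioned.

```python
from collections import defaultdict

def plan_broadcasts(subscriptions):
    """Cluster clients by the (hypothetical) data groups they subscribe to.

    subscriptions: dict mapping client id -> set of data-group names.
    Returns a dict mapping each data group -> sorted list of client ids,
    i.e. one broadcast/multicast per group instead of per-client unicasts.
    """
    clusters = defaultdict(set)
    for client, groups in subscriptions.items():
        for group in groups:
            clusters[group].add(client)
    return {group: sorted(clients) for group, clients in clusters.items()}

subs = {
    "pda-1": {"price-list", "orders-east"},
    "pda-2": {"price-list"},
    "laptop-7": {"price-list", "orders-east"},
    "pager-3": {"orders-west"},
}
plan = plan_broadcasts(subs)
unicasts = sum(len(groups) for groups in subs.values())  # one per (client, group)
broadcasts = len(plan)                                    # one per data group
print(plan["price-list"], unicasts, broadcasts)  # ['laptop-7', 'pda-1', 'pda-2'] 6 3
```

With four clients the saving is small (six unicasts versus three broadcasts), but the unicast count grows with the number of clients while the broadcast count grows only with the number of data groups.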
Middleware Solutions: Existing middleware technologies such as
CORBA, SOAP, J2EE and Bluetooth have their limitations
in supporting truly distributed applications involving
device, data and platform heterogeneity. There are a large
number of university research projects in this area, and some
of that research is bound to have an impact on commercial
middleware standards, making them more generic than what Microsoft
and Sun will propose to suit their own XML/C# and Java-based
solutions. We are conducting a major collaborative research project
in this area, called SyD (System of Devices), at Georgia Tech and
other Georgia state universities.
Security and Directory Services: The security provisions,
especially in wireless protocols such as WAP, leave a lot
to be desired. The weakness of the WEP encryption used by both
802.11b and Apple's AirPort seems to indicate that the Bluetooth
protocol has an advantage over them in security. Similarly,
directory standards such as LDAP lack flexibility and scalability.
Development of directory services with backend database support
will result in a better overall infrastructure for mobile, wireless
applications. We are presently developing a prototype of a flexible
directory service called "Communities of Interest Directories" in
a research project.
User Interfaces: User interfaces have taken a back seat
in database technology developments. With iPAQ, Palm
or RIM-type devices, use of the (rather limited) screen real
estate and the design of icons and functions become extremely
important. We can expect many intelligent designs that will
mix visual displays with input modes based on drag-and-drop,
point-and-click or speech input/output. Multiple interaction
modes will be important for users with disabilities.
All of the above areas have a high potential in terms of technology
transfer from university research to actual products.
Industry trends
In the new year, as in the past, the industry is likely to
look to university R&D projects in the US, Canada,
Europe and Japan to provide the next generation of technology
solutions. But there are a number of other topics where industry
is likely to take the lead. The improved solutions in these areas
are the fruits of development efforts and improved standards
ratified by various consortia.
Integration and Interoperability: This will probably be the
key theme for the next three to five years in the IT products
industry. The competing efforts of the two giants, Microsoft and
Sun, and their proprietary developments, such as MSIL (Microsoft
Intermediate Language), NET.XML and ADO.NET from Microsoft,
and Sun's JCA (Java Connector Architecture), JDBC (Java Database
Connectivity), JNDI (Java Naming and Directory Interface) and JMS
(Java Message Service), will all undergo rapid revision.
The J2EE platform has an edge in terms of multi-vendor, large-scale
community support; also, Java is becoming the predominant
language of choice for IT students. For Microsoft-oriented
organizations, .NET will be more appealing and will provide
the flexibility of letting them use whatever language they
prefer.
XML Developments: While Sun is putting its entire horsepower
behind Java, Microsoft is promoting XML as a panacea for all
problems. 'Web services' is a new industry buzzword that refers
to an XML representation of objects, programs and messages
available over the Internet for application-to-application
communication. Web services technology will get a lot of attention
in the coming year because of its promise of a data-independent
means of coupling disparate systems to support e-services
for better productivity. However, in spite of domain- or
application-specific XML developments (e.g. FpML, Financial
Products Markup Language; ebXML, electronic business XML; and PMML,
Predictive Model Markup Language, in data mining), XML is plagued
by some basic difficulties. No efficient storage, indexing
and compression mechanisms are yet available, and the nature of
the data model and its navigational languages such as XQuery
are reminiscent of the hierarchical data model (of IMS), which
was comparatively much simpler, more efficient and easier to navigate.
Native XML systems such as Tamino (from Software AG) are yet
to make an impact in the market. Although Microsoft wants
to convince the world that XML is going to replace everything
as a standard for data modeling, querying and interoperability,
the fact is that it has a long way to go. The excessive complexity
and proliferation of concepts and namespaces associated with
XML as defined by the W3C (World Wide Web Consortium) do not
make it ready for adoption.
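To make the comparison with IMS concrete, the sketch below navigates a small XML document top-down, parent to child, much as a program walks segments in a hierarchical database. It uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, not XQuery, and the customer/order/item document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: customer -> order -> item, a parent/child hierarchy
# much like root, dependent and leaf segments in an IMS database.
doc = """
<customers>
  <customer id="c1">
    <order id="o1"><item sku="A" qty="2"/><item sku="B" qty="1"/></order>
  </customer>
  <customer id="c2">
    <order id="o2"><item sku="A" qty="5"/></order>
  </customer>
</customers>
"""
root = ET.fromstring(doc)

# Navigate down one path of the hierarchy: customer c1 -> order -> item.
items = root.findall("./customer[@id='c1']/order/item")
print([(i.get("sku"), int(i.get("qty"))) for i in items])  # [('A', 2), ('B', 1)]

# A question that cuts across the hierarchy ("who ordered SKU A?") still has
# to descend from every customer, as in hierarchical navigation.
buyers = [c.get("id") for c in root.findall("customer")
          if c.find("order/item[@sku='A']") is not None]
print(buyers)  # ['c1', 'c2']
```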
Embedded Functionality: Database technology is a late entrant
in the embedded systems arena. We can expect small-footprint
DBMSs, thinner variants of SQL Anywhere-type systems, to
become available on PDAs and similar devices. In order to
support dynamic distributed applications where both data and
code may reside at remote servers and be executed as needed,
it is necessary to embed some primitive database functionality
in these handheld devices. A similar capability is expected
for designs with embedded input sensors that monitor activities
such as vehicle traffic or the flow of goods on a conveyor and
collect the parametric data in databases. Sensors and actuators
will be coupled as input/output devices in future database
products. The entire field of "active databases" has produced
a large body of research on rules processing over the last two
decades. The current implementation of rules in database products
is too limited, taking the form of triggers only, while the
associated management of constraints is handled through stored
procedures. This concept needs to be generalized to handle many
complex activities with rules from each domain. A major development
is inevitable to support future monitoring and control operations
in hospital intensive care units, agricultural systems,
security and fire management applications and nuclear power
plants.
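A database trigger is essentially a single event-condition-action (ECA) rule attached to one table, which is the limited form of rules the paragraph above refers to. The sketch below uses SQLite through Python's standard `sqlite3` module; the table names, sensor identifiers and the threshold of 100 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (sensor TEXT, value REAL, ts TEXT);
CREATE TABLE alerts   (sensor TEXT, value REAL, ts TEXT);

-- Event: a row is inserted into readings.
-- Condition: the new value exceeds a hypothetical threshold of 100.
-- Action: copy the reading into alerts.
CREATE TRIGGER high_reading AFTER INSERT ON readings
FOR EACH ROW WHEN NEW.value > 100
BEGIN
    INSERT INTO alerts VALUES (NEW.sensor, NEW.value, NEW.ts);
END;
""")

conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("belt-1", 42.0, "2002-01-01T10:00"),
     ("belt-1", 130.5, "2002-01-01T10:01"),
     ("icu-bed-4", 101.0, "2002-01-01T10:02")],
)
print(conn.execute("SELECT sensor, value FROM alerts ORDER BY ts").fetchall())
# [('belt-1', 130.5), ('icu-bed-4', 101.0)]
```

Here the condition inspects only the row just inserted. Richer rules, for example over a sequence of readings within a time window, would have to be hand-coded with extra queries or application logic, which is the gap that generalized rules processing is meant to close.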
Extensibility/Added Functions to Database Languages: Current
developments such as SQL3 have added object-oriented functions,
including inheritance, to database systems. Their usefulness
for the application world is not yet confirmed, and the language
has become unwieldy. At the same time, more and more applications
are demanding temporal, spatial and data mining capabilities.
Temporal processing refers to incorporating time as an inherent
dimension of data in order to process histories or time-series
data (e.g. stock market fluctuations, patient medical
history, aircraft maintenance history). The spatial dimension
relates to attaching spatial coordinates to data to provide
location-based services. Data mining is the activity of discovering
new patterns and relationships that are not apparent from
trial-and-error querying of the data. All these areas
have been topics of extensive research for almost 15-20 years,
yet industry has been slow to react. A temporal SQL (TSQL)
task force has recommended adding the necessary temporal
functionality to SQL, but further enhancements will probably have
to come from industry as commonly agreed features of query languages.
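Without native temporal support, a common workaround is to store validity intervals explicitly and repeat the interval predicate in every query. The sketch below does this with SQLite via Python's `sqlite3` module; it is not TSQL syntax, and the `maintenance_status` table, its columns and the aircraft data are hypothetical. Intervals are half-open, [valid_from, valid_to), and ISO-8601 date strings are used because they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Valid-time table: each row holds the interval [valid_from, valid_to) during
# which a fact was true; valid_to IS NULL means "still current".
conn.execute("""
CREATE TABLE maintenance_status (
    aircraft   TEXT,
    status     TEXT,
    valid_from TEXT,  -- ISO-8601 dates compare correctly as strings
    valid_to   TEXT
)""")
conn.executemany(
    "INSERT INTO maintenance_status VALUES (?, ?, ?, ?)",
    [("N123", "in service",   "2001-01-01", "2001-06-15"),
     ("N123", "engine check", "2001-06-15", "2001-07-01"),
     ("N123", "in service",   "2001-07-01", None)],
)

def status_as_of(aircraft, day):
    """Time-slice query: the status that was valid on the given day, or None."""
    row = conn.execute(
        """SELECT status FROM maintenance_status
           WHERE aircraft = ? AND valid_from <= ?
             AND (valid_to IS NULL OR ? < valid_to)""",
        (aircraft, day, day),
    ).fetchone()
    return row[0] if row else None

print(status_as_of("N123", "2001-06-20"))  # engine check
print(status_as_of("N123", "2002-01-01"))  # in service
```

Temporal extensions to SQL aim to move this interval bookkeeping into the language itself, so that an "as of" question need not be spelled out by hand in every query.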
Service Applications: The database area has remained largely
transaction-oriented, with a focus on high-throughput transaction
performance for services such as banks, insurance companies,
airlines, the retail industry and manufacturing. But the emergence
of the Web and e-commerce applications, ERP systems and supply-chain
management systems has opened up a vast array of possible
applications. One major area that will see growth is location-based
services. They range from giving information about hospitals
or restaurants in the vicinity of one's location, to
helping in emergencies such as fires and traffic accidents, to
ambulance-type services, to automatic navigational devices for
the blind. GPS sensors will be installed in mobile devices such as
cell phones and will open up a vast array of possibilities
by combining wireless connectivity, location information
and geographically organized information on the Internet.
Domain-specific information brokering services, which provide
valuable information to consumers for a price to help them shop
selectively for products and services, will continue to grow in
variety. These will include financial or insurance products
and services, as well as a variety of payment and information
consolidation services, such as generating consolidated statements
for all of an individual's accounts or mutual funds.
This will not require any breakthrough in technology,
just clever packaging, integration and display of information.
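At the heart of a "what is near me" location-based service is a distance computation over geographic coordinates. The sketch below ranks places by great-circle (haversine) distance from the user with a brute-force scan; a production service would more likely rely on a spatial index inside the database. Place names, categories and coordinates are hypothetical, and a spherical Earth with a mean radius of 6371 km is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest(user_lat, user_lon, places, kind=None, k=1):
    """Return up to k (distance_km, name) pairs, closest first, optionally by kind."""
    ranked = sorted(
        (haversine_km(user_lat, user_lon, lat, lon), name)
        for name, place_kind, lat, lon in places
        if kind is None or place_kind == kind
    )
    return ranked[:k]

# Hypothetical places: (name, kind, latitude, longitude).
places = [
    ("hospital-A", "hospital", 33.780, -84.390),
    ("hospital-B", "hospital", 33.750, -84.320),
    ("restaurant-C", "restaurant", 33.770, -84.400),
]
user = (33.776, -84.396)  # hypothetical GPS fix reported by a cell phone

print(nearest(*user, places, kind="hospital"))    # hospital-A, about 0.7 km away
print(nearest(*user, places, kind="restaurant"))  # restaurant-C
```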
Long-term issues
In the above paragraphs we described the solutions the industry
expects from universities and R&D labs. We also discussed
the potential areas where industry can take the lead in
bringing out new functionality for new applications. We would
like to end by pointing out some long-term issues that need
to be addressed to improve the quality of applications
and the user experience. Although products and
techniques abound, a large number of problems remain open
in the research domain.
a. Web personalization: With the explosion of websites, product
catalogs and the like, there is a need for better techniques
to give users a truly personalized experience. Previous work
here includes collaborative filtering, which keeps databases
of user preferences and makes recommendations from them. More
work is needed on targeting the right dynamic content to the user
and making the experience much more worthwhile, both for the
user and for the product vendor.
b. Search engines: State-of-the-art search engines leave
a lot to be desired when a simple input word returns more than
100,000 results for a typical query. Work from machine learning
and from clustering in information retrieval systems has to be
applied to categorize the results or rank them more meaningfully.
c. Affordable OLAP, browsing and mining tools: Current
products are prohibitively expensive. More affordable tools with
core functionality for browsing, cubing, etc. are being designed
by smaller companies to give users a much better sense
of the aggregate data and the trends.
d. Easier system management tools: Eventually, as ordinary
people become the major creators as well as consumers of data
content, they will need to be able to deal with multiple
systems and platforms without relying on system administration
staff. Intelligent tool support is badly needed in this
area.
e. Data quality metrics and tools: This is a bigger void in
the database area than for any other type of product.
Very little attention has been given to the measurement and
systematic improvement of data quality. We hope that industry and
organizations will take this up seriously in the years ahead;
otherwise, no fancy technological solutions can really rescue
an organization from accumulated bad data.
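Item (a) above mentions collaborative filtering. Below is a minimal user-based sketch, assuming a hypothetical in-memory preference database of 1-5 ratings: user similarity is the cosine of rating vectors (unrated items count as 0), and items the target user has not rated are scored by a similarity-weighted sum over the k most similar users. This is one simple variant among many, not the algorithm of any specific system.

```python
import math

def cosine(u, v):
    """Cosine similarity of two rating dicts; items missing from a dict count as 0."""
    dot = sum(r * v[item] for item, r in u.items() if item in v)
    nu = math.sqrt(sum(r * r for r in u.values()))
    nv = math.sqrt(sum(r * r for r in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target, ratings, k=2):
    """Score the target's unrated items using the k most similar users."""
    mine = ratings[target]
    neighbours = sorted(
        ((cosine(mine, ratings[u]), u) for u in ratings if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # skip users with no overlap in taste
        for item, r in ratings[user].items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(((s, item) for item, s in scores.items()), reverse=True)

# Hypothetical preference database: user -> {item: rating}.
ratings = {
    "ann":  {"camera": 5, "tripod": 4, "lens": 5},
    "bob":  {"camera": 5, "tripod": 4, "flash": 4},
    "cara": {"camera": 1, "tripod": 1, "novel": 5},
    "dev":  {"novel": 4, "poetry": 5},
}
print([item for _, item in recommend("bob", ratings)])  # ['lens', 'novel']
```

Here `ann`, whose ratings most resemble `bob`'s, contributes the top recommendation, while `cara`'s weaker similarity pushes her suggestion further down the list.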