The database evolution continues
S. B. Navathe


S. B. Navathe is Professor, College of Computing, Georgia Institute of Technology. His present research interests include database modeling, design and integration in the context of emerging applications: engineering design, biological (particularly human genome) databases, document and text databases, and collaborative applications. He has authored over 90 books, mostly related to databases.

The changes to database technology in 2002 will be evolutionary and not revolutionary. While it is unreasonable to expect any quantum leaps in the data handling functions of popular DBMS, data warehousing and OLAP products in the span of a year, we can expect incremental improvements in other areas.

Your expectations about the future outcome of database research may be high. Since you can't take a trip into the future to find out, the logical thing would be to ask a person who has been dabbling with databases for many years. I for one have been associated with database research for over thirty years, and my answer to your question is, "Some incremental versions like going from 8.0.1 to 8.0.2, but nothing drastically new!" All these years I've been observing how database vendors and products react to changing demands. The reaction has always been one of an evolutionary nature rather than a revolutionary one.

State-of-the-art database technology has not changed drastically over the last several years, and hence it is unlikely to make any giant strides in the coming year. Relational data management and allied data warehousing, ROLAP, and other products probably account for over 90 percent of revenues in the current database market, a mix that will not change significantly in a year. This technology is firmly planted because of its extremely robust research foundation: the work done in the 1970s and early 1980s at IBM Research, at several universities including the University of California, and at Computer Corporation of America.

The rest of the market is taken up by products related to multidimensional data management (MOLAP), object-oriented database systems (like ObjectStore, Versant, Jasmine, Poet, and O2), and a whole variety of tools related to data extraction, transformation, integration and utility software. Interestingly, while new developments in database products are primarily geared to managing relational and so-called object-relational data, there is a dominant presence of legacy data everywhere in the western world in hierarchical databases (IBM's IMS), network databases (predominantly IDMS, IMAGE and DMS 1100) and ISAM, VSAM or just plain sequential files. The challenge for database products has always been to draw upon the existing valuable information in legacy systems while forging ahead with new concepts of data management.

Furthermore, data is growing by leaps and bounds on the Web, which currently has more than 400 million users creating their own unrestricted and unstructured content that conforms to no standard other than HTML (which is not a data model at all). It is estimated that data under the control of legacy database management systems and in other non-standard forms accounts for over 50 percent of overall database content worldwide.

Before we look at what is in store for 2002, let us summarize the current state of affairs. The overall data management problem is aggravated by:

Avalanche of data: Digitized information is increasing at a phenomenal rate. With digital still and movie cameras, average users will generate hundreds of gigabytes of personal data annually, which is likely to clog networks even at speeds of tens of Gbps. Digital archives of, say, all movies ever produced are now conceivable: the 100,000 or so movies would fit into a petabyte of storage at about 10 GB per movie. We can expect a very rapid digitization of all forms of multimedia data, which will be stored on servers for mass consumption.

Variety of data: Early databases contained only alphanumeric data. Today, databases contain digitized information coming from a variety of sources: sensors, audio, video, and still images. The type of processing or operation changes with the nature of this information. Informix pioneered the concept of data blades to handle new data types (e.g. time series, text, and images); Oracle has followed with cartridges. This trend of adding data types with special functionality is going to continue.

Variety of users: Just as the variety of data affects the kind of operations that can be meaningfully performed on it, end-users dictate the way they would like data manipulated or presented to them for meaningful processing. More ordinary, naive, and computer-illiterate users, including kindergarten children and housewives, are becoming database users, and their number keeps growing every day. While querying and transaction processing used to be the two main functions of database systems, for the average user the modality has changed to browsing, searching and exploration of data.
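The petabyte figure quoted under "Avalanche of data" above is simple arithmetic and can be checked in a few lines. The sketch below assumes decimal units (1 PB = 1,000,000 GB); the function and constant names are only illustrative.

```python
# Back-of-the-envelope check of the movie-archive estimate.
# Assumption: decimal units, i.e. 1 PB = 1,000,000 GB.
MOVIES = 100_000        # "100,000 or so movies" from the text
GB_PER_MOVIE = 10       # "about 10 GB per movie" from the text
GB_PER_PB = 1_000_000   # decimal petabyte


def archive_size_pb(movies: int, gb_per_movie: float) -> float:
    """Return the total archive size in petabytes."""
    return movies * gb_per_movie / GB_PER_PB


print(archive_size_pb(MOVIES, GB_PER_MOVIE))  # prints 1.0
```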

Expansion along the above three dimensions is further aggravated by the inherent complexity of data and of applications. While database technology mainly dealt with applications in banks, insurance companies, manufacturing and government record-keeping units in its first couple of decades of existence (1965-85), e-commerce applications have since made far more data available to a wide variety of transactions involving consumers and businesses. Many databases, such as product catalogs, travel and entertainment schedules, informative reports, facility descriptions, and rule and policy manuals, are at the disposal of the general public. Opportunities in the entertainment industry, education, healthcare, telemedicine and biotechnology, coupled with the growing connectivity afforded by broadband and wireless technologies, have opened up an unlimited span of possible applications.

Now let's take a look at database technology trends we can expect in 2002.

Technology trends
While it is unreasonable to expect any quantum leaps in the form of data handling functions of popular DBMS, Data Warehousing and OLAP products in the span of a year, we can expect incremental improvements in the following areas. All these areas are being actively researched by major database research groups worldwide.

Scalability of Mobile Applications: Currently, mobile versions of RDBMSs like Oracle or SQL Server allow mobile users with PDAs, pagers and laptops or notebooks to retrieve data from databases at control sites. There are synchronization tools, such as imobile Suite (from Synchrologic), that synchronize client databases with servers for clients that operate in a disconnected fashion. This technology needs to be scaled substantially by grouping data at the server, clustering clients into client clusters, and possibly broadcasting or multicasting this information periodically. We have had an active research project in this area for the past four years. The results of such research must find their way into products in the coming years.
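To make the idea of grouping data and clustering clients concrete, here is a minimal sketch. It is not taken from any product or from the research project mentioned above: it simply assumes that each client subscribes to named data groups, that clients with identical subscriptions form one cluster, and that the server sends one multicast message per affected cluster in each synchronization period. All names (`cluster_clients`, `plan_multicast`, the device and group names) are hypothetical.

```python
from collections import defaultdict


def cluster_clients(subscriptions):
    """Group clients whose sets of subscribed data groups are identical."""
    clusters = defaultdict(list)
    for client, groups in subscriptions.items():
        clusters[frozenset(groups)].append(client)
    return dict(clusters)


def plan_multicast(subscriptions, changed_groups):
    """Return one (recipients, groups_to_send) message per affected cluster."""
    changed = set(changed_groups)
    messages = []
    for groups, clients in cluster_clients(subscriptions).items():
        payload = groups & changed  # only the groups that actually changed
        if payload:
            messages.append((sorted(clients), sorted(payload)))
    return messages


# Hypothetical subscriptions: which data groups each device keeps locally.
subscriptions = {
    "pda-1": {"prices", "inventory"},
    "pda-2": {"prices", "inventory"},
    "laptop-1": {"prices"},
    "pager-1": {"alerts"},
}

for recipients, groups in plan_multicast(subscriptions, {"prices"}):
    print(recipients, groups)
```

With four clients, a change to the "prices" group yields two messages instead of three individual ones; the saving grows with the number of clients that share a subscription profile.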

Middleware Solutions: Existing middleware technologies like CORBA, SOAP, J2EE, Bluetooth, etc. have their limitations in terms of supporting truly distributed applications involving device, data and platform heterogeneity. There are a large number of university research projects in this area. Some of that research is bound to have an impact on commercial middleware standards that will be more generic than what Microsoft and Sun will propose to suit their own XML/C# and Java-based solutions. We are doing a major collaborative research project called SyD (System of Devices) at Georgia Tech and other Georgia State Universities in this area.

Security and Directory Services: The security provisions, especially in wireless protocols such as WAP, leave a lot to be desired. The weakness of the WEP encryption technology used by both 802.11b and Apple's AirPort seems to indicate that the Bluetooth protocol has an advantage over them in security. Similarly, directory standards like LDAP lack flexibility and scalability. Development of directory services with backend database support will result in a better overall infrastructure for mobile, wireless applications. We are presently developing a prototype of a flexible directory service called "Communities of Interest Directories" in a research project.

User Interfaces: User interfaces have taken a backseat in database technology development. With iPAQ, Palm or RIM type devices, the use of the rather limited screen real estate and the design of icons and functions become extremely important. We can expect many intelligent designs that mix visual displays with input modes based on drag-and-drop, point-and-click or speech input/output. Use of multiple interaction modes will be important for handicapped users.

All of the above areas have a high potential in terms of technology transfer from university research to actual products.

Industry trends
In the new year, as in the past, the industry is likely to look to university R&D projects in the US, Canada, Europe, and Japan to provide the next-generation technology solutions. But a number of other topics stand out where industry is likely to take the lead. These improved solutions are the fruits of developmental efforts and improved standards ratified by various consortia.

Integration and Interoperability: This will probably be the key theme for the next three to five years in the IT products industry. Competing efforts by the two giants, Microsoft and Sun, and their proprietary developments, such as MSIL (Microsoft Intermediate Language), NET.XML and ADO.NET for Microsoft, and Sun's JCA (Java Connector Architecture), JDBC (Java Database Connectivity), JNDI (Java Naming and Directory Interface) and JMS (Java Message Service), will all be undergoing rapid revisions. The J2EE platform has an edge in terms of multi-vendor, large-scale community support; also, Java is becoming the predominant language of choice for IT students. For Microsoft-oriented organizations, .NET will be more appealing and will give them the flexibility to use whatever language they prefer.

XML Developments: While Sun is putting its entire horsepower behind Java, Microsoft is promoting XML as a panacea for all problems. 'Web Services' is a new industry buzzword that refers to an XML representation of objects, programs and messages available over the Internet for application-to-application communication. Web services technology will get a lot of attention in the coming year because of its promise of a data-independent means of coupling disparate systems to support e-services for better productivity. However, in spite of the domain- or application-specific XML developments (e.g. FpML, the Financial Products Markup Language; ebXML, electronic business XML; and PMML, the Predictive Model Markup Language used in data mining), XML is plagued with some basic difficulties. No efficient storage, indexing and compression mechanisms are available yet, and the nature of the data model and of its navigational languages like XQuery is reminiscent of the hierarchical data model (of IMS), which was comparatively much simpler, more efficient, and easier to navigate. Native XML systems like Tamino (from Software AG) are yet to make an impact in the market. Although Microsoft wants to convince the world that XML is going to replace everything as a standard for data modeling, querying, and interoperability, the fact is that it has a long way to go. The excessive complexity and the proliferation of concepts and namespaces associated with XML as defined by the W3C (World Wide Web Consortium) do not make it ready for adoption.
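The navigational, hierarchical flavor of XML querying mentioned above can be seen in a small sketch. It uses Python's standard `xml.etree.ElementTree` module, which supports only a limited XPath subset and is not an XQuery engine; the catalog document, element names and the `titles_in_category` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalog; the nesting mirrors a parent-child hierarchy.
CATALOG_XML = """
<catalog>
  <category name="books">
    <product sku="b1"><title>Database Systems</title><price>60</price></product>
    <product sku="b2"><title>Data Mining</title><price>45</price></product>
  </category>
  <category name="music">
    <product sku="m1"><title>Greatest Hits</title><price>15</price></product>
  </category>
</catalog>
"""


def titles_in_category(xml_text, category):
    """Walk catalog -> category -> product -> title, like a hierarchical path."""
    root = ET.fromstring(xml_text)
    path = f"./category[@name='{category}']/product/title"
    return [title.text for title in root.findall(path)]


print(titles_in_category(CATALOG_XML, "books"))
```

The query is a path from the root down through each level, and reaching a product by any route other than its category means scanning the tree, which is the same trade-off the hierarchical model had.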

Embedded Functionality: Database technology is a late entrant in the embedded systems arena. We can expect small-footprint DBMSs, a thinner variant of SQL Anywhere type systems, to become available on PDAs and similar devices. In order to support dynamic distributed applications where both data and code may reside at remote servers and be executed as needed, it is necessary to embed some primitive database functionality in these handheld devices. A similar capability is expected in terms of creating designs with embedded input sensors that monitor activities such as vehicle traffic or the flow of goods on a conveyor and collect the parametric data in databases.

Sensors and actuators will come already coupled as input/output devices in future database products. The entire field of "active databases" has produced a large body of research on rule processing in the last two decades. The current implementation of rules in database products is too limited, existing only in the form of triggers, with the associated management of constraints handled through stored procedures. This concept needs to be generalized to handle many complex activities with rules from each domain. A major development is inevitable to support these future monitoring and control operations in intensive care units of hospitals, in agricultural systems, in security and fire management applications, or in nuclear power plants.
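For readers who have not used triggers, the sketch below shows the kind of event-condition-action rule that today's products do support. It uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; the table names, the trigger name and the threshold are assumptions made for the example, not a description of any monitoring product.

```python
import sqlite3


def build_monitor(threshold):
    """Create an in-memory database whose trigger records high sensor readings."""
    conn = sqlite3.connect(":memory:")
    # Event: a row is inserted into readings.
    # Condition: the new value exceeds the threshold.
    # Action: copy the reading into alerts.
    # The threshold is formatted into the DDL because trigger definitions
    # cannot take bound parameters; float() keeps it numeric.
    conn.executescript(f"""
        CREATE TABLE readings (sensor TEXT, value REAL);
        CREATE TABLE alerts (sensor TEXT, value REAL);
        CREATE TRIGGER high_reading
        AFTER INSERT ON readings
        WHEN NEW.value > {float(threshold)}
        BEGIN
            INSERT INTO alerts VALUES (NEW.sensor, NEW.value);
        END;
    """)
    return conn


conn = build_monitor(threshold=100)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("belt-1", 42.0), ("belt-1", 130.5)],
)
print(conn.execute("SELECT sensor, value FROM alerts").fetchall())
```

A single rule like this is easy to express. The limitation described above shows up when many interacting rules from one domain must be ordered, prioritized and reasoned about together, which triggers alone do not address.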

Extensibility/Added Functions to Database Languages: Current developments like SQL3 have added object-oriented functions, including inheritance, to database systems. Their usefulness for the application world is not yet confirmed, and the language has become unwieldy. At the same time, more and more applications are demanding temporal, spatial and data mining capabilities. Temporal processing refers to incorporating time as an inherent dimension of data in order to process histories or time series type data (e.g. stock market fluctuations, patient medical history, aircraft maintenance history). The spatial dimension relates to attaching spatial coordinates to data for providing location-based services. Data mining is the activity of discovering new patterns and relationships that are not apparent from trial-and-error querying of the data. All these areas have been topics of extensive research for almost 15-20 years, yet industry has been slow to react. A temporal SQL (TSQL) task force has recommended adding the necessary temporal functionality to SQL, but more enhancements will probably have to come from industry as commonly agreed features for query languages.
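Without temporal support in the language, applications have to emulate it by hand. The sketch below shows one common workaround, assuming each row carries a half-open validity interval in explicit `valid_from` and `valid_to` columns stored as ISO date strings; it is plain SQL run through Python's `sqlite3` module, not TSQL syntax, and the table, symbol and prices are invented.

```python
import sqlite3


def make_demo_db():
    """Build a tiny price history with explicit validity intervals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock_price "
        "(symbol TEXT, price REAL, valid_from TEXT, valid_to TEXT)"
    )
    conn.executemany(
        "INSERT INTO stock_price VALUES (?, ?, ?, ?)",
        [
            ("ACME", 10.0, "2001-01-01", "2001-06-01"),
            ("ACME", 12.5, "2001-06-01", "9999-12-31"),  # still current
        ],
    )
    return conn


def price_as_of(conn, symbol, day):
    """Return the price valid on the given day, or None if none is recorded."""
    row = conn.execute(
        "SELECT price FROM stock_price "
        "WHERE symbol = ? AND valid_from <= ? AND ? < valid_to",
        (symbol, day, day),
    ).fetchone()
    return row[0] if row else None


db = make_demo_db()
print(price_as_of(db, "ACME", "2001-03-15"))
print(price_as_of(db, "ACME", "2001-12-01"))
```

Every query has to repeat the interval predicate, and nothing stops overlapping intervals from being inserted. These are the chores that built-in temporal functionality would take over.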

Service Applications: The database area has remained largely transaction-oriented, with a focus on high-throughput transaction performance for services such as banks, insurance companies, airlines, the retail industry and manufacturing. But the emergence of the Web and e-commerce applications, ERP systems and supply-chain management systems has opened up a vast array of possible applications. One major area that will see growth is location-based services. They range from giving information about hospitals or restaurants in the vicinity based on one's location, to actually helping in emergency fire, traffic accident and ambulance type services, to automatic navigational devices for the blind. GPS sensors will be installed in mobile devices like cell phones and will open up a vast array of possibilities by combining wireless connectivity, location information and geographically organized information on the Internet. The variety of domain-specific information brokering services will continue to rise; these provide valuable information to the consumer, at some price, for selective shopping of products and services. This will include financial or insurance products and services as well as a variety of payment and information consolidation services, such as generating consolidated statements for all accounts or all mutual funds owned by an individual. This will not require any breakthrough in technology, just clever packaging, integration and display of information.
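The "hospitals or restaurants in the vicinity" query is a good example of clever packaging rather than new technology. A minimal sketch, assuming places are stored with latitude and longitude and that great-circle distance on a spherical Earth is accurate enough; the place names and coordinates are made up, and a real service would use a spatial index instead of scanning every row.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(places, lat, lon, radius_km):
    """Return names of places within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, p_lat, p_lon), name)
            for name, p_lat, p_lon in places]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Hypothetical points of interest: (name, latitude, longitude).
places = [
    ("Hospital A", 0.0, 0.01),
    ("Restaurant B", 0.0, 0.05),
    ("Hospital C", 1.0, 1.0),
]

print(nearby(places, 0.0, 0.0, radius_km=10))
```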

Long-term issues
In the above paragraphs we described what solutions the industry expects from universities and R&D labs. We also discussed the potential areas where the industry can take a lead in bringing out new functionality for new applications. We would like to end by pointing out some long-term issues that need to be addressed for improving the quality of applications and for improving the user experience. Although products and techniques abound, there are a large number of open problems in the research domain for now.

a. Web personalization: With the explosion of websites, product catalogs, etc., there is a need for better techniques to give users a truly personalized experience. Previous work here includes collaborative filtering systems, which keep databases of user preferences and make recommendations. More work is needed on targeting the right dynamic content to the user and making the experience much more worthwhile, both for the user and the product vendor.

b. Search engines: State-of-the-art search engines leave a lot to be desired when a simple one-word query typically returns more than 100,000 results. Work from machine learning and from clustering in information retrieval systems has to be applied to categorize the results or rank them more meaningfully.

c. Affordable OLAP, browsing and mining tools: The current products are prohibitively expensive. Smaller companies are designing more affordable tools with core functionality for browsing, cubing, etc., to give users a much better sense of the aggregate data and the trends.

d. Easier system management tools: Eventually, as ordinary people become the major creators of data content and also its consumers, they will need to be able to deal with multiple systems and platforms without relying on system administration staff. Intelligent tool support is essential in this area.

e. Data quality metrics and tools: This is a big void in the database area compared with other types of products. Very little attention has been given to the measurement and systematic improvement of data quality. We hope that industry and organizations will take this up seriously in the years ahead. Otherwise, no fancy technological solution can really rescue an organization from accumulated bad data.

Copyright 2001: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by The Business Publications Division of the Indian Express Group of Newspapers. Site managed by BPD