The DIMSUM Project: Overview


Several ongoing trends in database system development have combined to increase the need for flexible query processing strategies in distributed database systems. In particular, the merging of relational and object-oriented database systems and the growing need to access and correlate heterogeneous data across widely distributed information sources both raise significant technical problems for query processing that the current generation of distributed database systems has not addressed.

The DIMSUM (Distributed Information Management Systems at the University of Maryland) project is developing a flexible distributed query processing architecture that can adapt to the requirements of a wide range of applications and system properties (both static and dynamic). This work entails revisiting many fundamental design decisions made in the current generation of relational and object-oriented database systems. The resulting architecture will be able to choose adaptively between query-oriented and navigation-oriented approaches, and to make decisions about data placement and replication based on both compile-time and runtime considerations.

The work in this project has been influenced by three main trends:

Object-Relational Merger - It has become apparent that systems combining the best aspects of the relational and object-oriented approaches to data management are likely to gain acceptance across a large range of applications. Such hybrid systems will need to efficiently support both coarse-grained query-based and fine-grained navigation-oriented access to data. These two types of access impose different constraints on a distributed database system.
Heterogeneous Data Sources - The need to access data across multiple types of systems has arisen from the increased connectivity of systems and the increased complexity of the data types that database applications must handle. Heterogeneous systems pose many challenges for distributed query processing because the services offered by the information sources vary.
Wide-area Query Processing - Improved connectivity and the increasing importance of the Internet have resulted in an explosion in the number and distribution of data sources over which queries must be executed. One salient aspect of wide-area information systems is the large variance in the response time of accessing data from individual information sources, and in the availability of those sources. Unlike the variance due to heterogeneity, which arises among different sources, the variance observed in a wide-area information system occurs with respect to individual information sources. That is, the performance characteristics of a given source can change dynamically, and in unpredictable ways, because the observed performance depends on many components of the wide-area system.

The areas we are addressing in this research include:

Analysis of the performance tradeoffs between data-shipping and query-shipping execution strategies, and the development of hybrid approaches that can outperform both of the pure strategies.
Investigation of the impact of support for query-based access on the DBMS architecture, including caching and memory management decisions, distributed object assembly (i.e., materialization), and techniques for index usage and maintenance.
Detailed analysis of the tradeoffs between logical (i.e., query-based) and physical (i.e., page-based) approaches to client caching, replication, and client-server interaction. The choice between these two approaches has a profound impact on the design of many fundamental system components.
Development of query optimization strategies and adaptive mechanisms to process queries in the presence of cached and/or replicated data, and to make data caching and replication decisions. An important part of this work is the development of heuristics that can implement data placement through adaptive caching in response to observed query access patterns.
Development of query execution strategies that can dynamically adjust to changes in the performance characteristics of the information sources in a wide-area information system.
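
To make the first of these areas concrete, the following is a minimal sketch of how a hybrid strategy might place each operator at the client or the server. It is not the project's cost model: the names (Operator, data_shipping_cost, hybrid_plan, and so on) are hypothetical, operators are assumed to be independent of one another, and the only cost counted is bytes sent over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """One query operator, treated as independent of the others (an assumption)."""
    name: str
    input_bytes: int     # size of the data the operator reads
    output_bytes: int    # size of the result the operator produces
    cached_percent: int  # share of the input already cached at the client (0-100)


def data_shipping_cost(op: Operator) -> int:
    """Bytes sent if the operator runs at the client: the uncached input is fetched."""
    return op.input_bytes * (100 - op.cached_percent) // 100


def query_shipping_cost(op: Operator) -> int:
    """Bytes sent if the operator runs at the server: only the result is returned."""
    return op.output_bytes


def pure_plan(ops, site):
    """Place every operator at the same site ('client' or 'server')."""
    return {op.name: site for op in ops}


def hybrid_plan(ops):
    """Place each operator wherever this toy model predicts fewer bytes sent."""
    return {
        op.name: "client" if data_shipping_cost(op) < query_shipping_cost(op) else "server"
        for op in ops
    }


def plan_cost(ops, plan):
    """Total bytes sent for a plan mapping operator names to sites."""
    return sum(
        data_shipping_cost(op) if plan[op.name] == "client" else query_shipping_cost(op)
        for op in ops
    )


if __name__ == "__main__":
    # Hypothetical workload: one selective operator, one operator on mostly cached data.
    ops = [
        Operator("selective_filter", 1_000_000, 10_000, 0),
        Operator("scan_cached_data", 500_000, 400_000, 90),
    ]
    for label, plan in [
        ("data shipping", pure_plan(ops, "client")),
        ("query shipping", pure_plan(ops, "server")),
        ("hybrid", hybrid_plan(ops)),
    ]:
        print(label, plan_cost(ops, plan))
```

In this made-up workload the hybrid plan sends 60,000 bytes, against 1,050,000 for pure data shipping and 410,000 for pure query shipping, because it runs the selective operator at the server and the operator over cached data at the client. A realistic model would also need to account for CPU and disk load, dependencies between operators, and changing network conditions.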

We are pursuing aspects of this work in collaboration with researchers at the IBM Almaden Research Center, AT&T Bell Laboratories, and INRIA.

This project is supported in part by the National Science Foundation, by a gift from Bellcore, and by an equipment grant from IBM. Members of the project are also supported by Humboldt-Stiftung, INRIA, and Fulbright fellowships.

Last updated Feb. 29, 1996.


franklin@cs.umd.edu
