DIMSUM
Project Overview
Several ongoing trends in database system development have combined to
increase the need for flexible query processing strategies in distributed
database systems. In particular, the merging of relational and
object-oriented database systems and the growing need to access and
correlate heterogeneous data across widely distributed information sources
both raise significant query processing problems that the current
generation of distributed database systems has not addressed.
The DIMSUM (Distributed Information Management Systems at the University of
Maryland) project is developing a flexible distributed query processing
architecture that can adapt to the requirements of a wide
range of applications and system properties (both static and dynamic).
This work entails revisiting many fundamental design decisions made in the
current generation of relational and object-oriented database systems. The
resulting architecture will be able to choose adaptively between
query-oriented and navigation-oriented approaches, and to make decisions
about data placement and replication based on both compile-time and runtime
considerations.
The work in this project has been influenced by three main trends:
- Object Relational Merger - It has become apparent that systems
combining the best aspects of the relational and object-oriented approaches
to data management are likely to gain acceptance across a large range of
applications. Such hybrid systems will need to efficiently support both
coarse-grained query-based and fine-grained navigation-oriented access to
data. These two types of access impose different constraints on a distributed
database system.
- Heterogeneous Data Sources - The need to access data across
multiple types of systems has arisen due to the increased connectivity of
systems and the increased complexity of the data types that database
applications must deal with. Heterogeneous systems pose many challenges to
distributed query processing because the services offered by the
information sources vary widely.
- Wide-area Query Processing - Improved connectivity and the
increasing importance of the Internet have led to an explosion in the
number and distribution of data sources over which queries must be executed.
One salient aspect of wide-area information systems is the large variance in
the response time of accessing data from individual information sources, and
in the availability of those sources. Unlike the variance due to
heterogeneity, which arises across different sources, the variance observed
in a wide-area information system occurs within an individual information
source over time. That is, the performance characteristics of a given
source can change dynamically and unpredictably, because the observed
performance depends on many components of the wide-area system.
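To make the Object Relational Merger point concrete, here is a hypothetical toy simulation of the different constraints that fine-grained navigation and coarse-grained queries place on a client-server system. The names `Obj`, `Server`, `fetch`, `query`, and `navigate` are illustrative assumptions, not part of DIMSUM; the sketch counts only client-server round trips and ignores message size and server processing cost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Obj:
    oid: int
    value: int
    next_oid: Optional[int]  # reference to another object (a navigation link)


class Server:
    """Hypothetical server that counts client-server round trips."""

    def __init__(self, objects: Dict[int, Obj]) -> None:
        self.objects = objects
        self.round_trips = 0

    def fetch(self, oid: int) -> Obj:
        # Fine-grained, navigation-oriented access: one request per object.
        self.round_trips += 1
        return self.objects[oid]

    def query(self, pred: Callable[[Obj], bool]) -> List[Obj]:
        # Coarse-grained, query-based access: one request returns a whole set.
        self.round_trips += 1
        return [o for o in self.objects.values() if pred(o)]


def navigate(server: Server, start: int) -> List[Obj]:
    """Follow next_oid links from `start`, fetching each object individually."""
    result: List[Obj] = []
    oid: Optional[int] = start
    while oid is not None:
        obj = server.fetch(oid)
        result.append(obj)
        oid = obj.next_oid
    return result


if __name__ == "__main__":
    # A chain of 100 linked objects: 0 -> 1 -> ... -> 99.
    objs = {i: Obj(i, i * 10, i + 1 if i < 99 else None) for i in range(100)}
    srv = Server(objs)
    navigate(srv, 0)
    print("navigation round trips:", srv.round_trips)
    srv.round_trips = 0
    srv.query(lambda o: o.value >= 500)
    print("query round trips:", srv.round_trips)
```

Traversing the chain costs one round trip per object, while the set-oriented request costs one, so navigation is highly sensitive to network latency and benefits most from client caching, whereas query-based access favors shipping work to where the data lives.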
The areas we are addressing in this research include:
- Analysis of the performance tradeoffs between data-shipping and
query-shipping execution strategies, and the development of hybrid
approaches that can outperform both pure strategies.
- Investigation of the impact of support for query-based access
on the DBMS architecture, including caching and memory management decisions,
distributed object assembly (i.e., materialization), and techniques for index
usage and maintenance.
- Detailed analysis of the tradeoffs between logical (i.e., query-based)
and physical (i.e., page-based) approaches to client caching, replication,
and client-server interaction. The choice between these two approaches
has a profound impact on the design of many fundamental system components.
- Development of query optimization
strategies and adaptive mechanisms to
process queries in the presence of cached and/or replicated data, and to
make data caching and replication decisions. An important part of this work
is the development of heuristics that can implement data placement through
adaptive caching in response to observed query access patterns.
- Development of query execution strategies that can dynamically
adjust to changes in the performance characteristics of the information
sources in a wide-area information system.
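The first research area, data shipping versus query shipping versus hybrid shipping, can be illustrated with a deliberately simple operator-placement model. This is a toy sketch under stated assumptions, not the project's cost model: relation R lives at server A, relation S at server B, the client may hold a cached copy of R, the query is a join of R with a selection on S, and cost is measured only as units of data shipped. The relation sizes are hypothetical. Pure data shipping runs every operator at the client, pure query shipping runs operators only at servers, and the hybrid chooses a site for each operator.

```python
from itertools import product
from typing import Dict, Tuple

# Toy placement model (hypothetical): R lives at server "A", S at server "B",
# and the client may hold a cached copy of R.
# Query: join(R, select(S)), with the result delivered to the client.
# Cost = units of data shipped over the network; CPU and latency are ignored.

SELECT_SITES = ("client", "B")       # where select(S) may run
JOIN_SITES = ("client", "A", "B")    # where the join may run


def plan_cost(sel_site: str, join_site: str,
              sizes: Dict[str, int], r_cached: bool) -> int:
    cost = 0
    # Ship S from its home server B to the selection site, if different.
    cost += 0 if sel_site == "B" else sizes["S"]
    # Ship the selection output to the join site, if different.
    cost += 0 if sel_site == join_site else sizes["sel_S"]
    # R is available at A, and at the client only if it is cached there.
    r_local = join_site == "A" or (join_site == "client" and r_cached)
    cost += 0 if r_local else sizes["R"]
    # Deliver the join result to the client, if it was computed elsewhere.
    cost += 0 if join_site == "client" else sizes["join"]
    return cost


def data_shipping(sizes: Dict[str, int], r_cached: bool) -> int:
    # All operators run at the client; base data is shipped to it.
    return plan_cost("client", "client", sizes, r_cached)


def query_shipping(sizes: Dict[str, int]) -> int:
    # Operators run only at servers, so the client cache is never used.
    return min(plan_cost("B", j, sizes, r_cached=False) for j in ("A", "B"))


def hybrid(sizes: Dict[str, int], r_cached: bool) -> Tuple[int, Tuple[str, str]]:
    # Choose a site per operator; returns (cost, (select_site, join_site)).
    return min((plan_cost(s, j, sizes, r_cached), (s, j))
               for s, j in product(SELECT_SITES, JOIN_SITES))


if __name__ == "__main__":
    sizes = {"R": 100, "S": 1000, "sel_S": 50, "join": 80}  # hypothetical sizes
    print("data shipping :", data_shipping(sizes, r_cached=True))
    print("query shipping:", query_shipping(sizes))
    print("hybrid        :", hybrid(sizes, r_cached=True))
```

With these made-up sizes, data shipping pays for moving all of S to the client (1000 units), query shipping cannot exploit the cached copy of R (130 units), and the hybrid filters S at server B and then joins it with the cached R at the client (50 units). Because both pure plans are in the hybrid's search space, in this toy model the hybrid can never do worse than either pure strategy.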
We are pursuing aspects of this work in collaboration with researchers at
the IBM Almaden Research Center, AT&T Bell Laboratories, and INRIA.
This project is supported in part by the National Science Foundation, by
a gift from Bellcore, and by an
equipment grant from IBM. Members of the project are also supported by
Humboldt-Stiftung, INRIA, and Fulbright fellowships.