MOCHA Architecture

The main purpose of MOCHA is to integrate collections of data sources distributed over wide-area computer networks such as the Internet. MOCHA is a database middleware system that provides client applications with a uniform view of, and access mechanism to, the data collections available in each source. This is realized by imposing a global schema on top of the local schemas used by each individual data source.
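As a rough, hypothetical client-side sketch of this uniform view, the fragment below queries a single global relation and leaves it to the middleware to map the request onto the local schemas of the underlying sources. The JDBC-style connection URL and the relation name are illustrative assumptions, not MOCHA's actual client API.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class GlobalSchemaClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical middleware connection URL standing in for the MOCHA server.
            try (Connection conn =
                     DriverManager.getConnection("jdbc:middleware://qpc.example.edu/earthsci");
                 Statement stmt = conn.createStatement();
                 // "Rasters" is one global relation, regardless of how each
                 // source actually stores and names its local tables.
                 ResultSet rs = stmt.executeQuery("SELECT week, location FROM Rasters")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("week") + " " + rs.getString("location"));
                }
            }
        }
    }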

MOCHA is built around the notion that the middleware for a large-scale distributed environment should be self-extensible. A self-extensible middleware system is one in which new application-specific data types and query operators (i.e., user-defined methods) needed for query processing are automatically deployed to remote sites by the middleware system itself. In MOCHA, this is realized by shipping Java classes containing the new capabilities to the remote sites, where they can be used to manipulate the data of interest. All of these Java classes are stored in one or more code repositories, from which MOCHA later retrieves and deploys them on a "need-to-do" basis.
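As a rough illustration of the kind of capability that can be shipped, the sketch below packages a user-defined data type and an operator over it as ordinary Java classes. The names used here (Rectangle, AreaOperator, evaluate) are illustrative assumptions, not MOCHA's actual interfaces.

    // Hypothetical sketch: a user-defined type and operator packaged as Java classes
    // that a middleware system could ship to a remote site.
    import java.io.Serializable;

    /** A user-defined type representing a 2-D rectangle stored in a data source. */
    public class Rectangle implements Serializable {
        public final double x1, y1, x2, y2;

        public Rectangle(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    /** A user-defined operator (method) over the new type. */
    class AreaOperator {
        /** Computes the area of a rectangle; an example of a generalized projection. */
        public double evaluate(Rectangle r) {
            return Math.abs(r.x2 - r.x1) * Math.abs(r.y2 - r.y1);
        }
    }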

A major goal behind this idea of automatic code deployment is to fill the need for application-specific processing components at remote sites that do not provide them. These components are migrated on demand by MOCHA from site to site and become available for immediate use. This approach sharply contrasts with existing middleware solutions, such as application servers and mediator systems, in which administrators must manually install all the required functionality throughout the entire system before it can be used in the queries posed by users.
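The sketch below shows the general mechanism behind this kind of on-demand deployment, assuming the code repository is reachable by URL: a site loads an operator class from the repository at query time using Java's standard URLClassLoader, so no manual installation step is needed. The repository URL and class name are placeholders, and this is only one way such deployment could be wired, not necessarily MOCHA's.

    import java.net.URL;
    import java.net.URLClassLoader;

    public class CodeDeployer {
        /** Fetches an operator class from a code repository and instantiates it. */
        public static Object loadOperator(String repositoryUrl, String className)
                throws Exception {
            URL[] codeBase = { new URL(repositoryUrl) };
            URLClassLoader loader = new URLClassLoader(codeBase);
            Class<?> operatorClass = loader.loadClass(className);
            // The freshly deployed operator is available for immediate use,
            // without any administrator intervention at this site.
            return operatorClass.getDeclaredConstructor().newInstance();
        }
    }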

MOCHA capitalizes on its ability to automatically deploy code in order to provide an efficient query processing service. By shipping code for query operators, such as generalized projections or predicates, MOCHA can generate efficient plans that place the execution of powerful data-reducing operators ("filters") at the data sources. Examples of such operators are aggregates, predicates and data mining operators, which return a much smaller abstraction of the original data. On the other hand, data-inflating operators, which produce results larger than their arguments, are evaluated near the client. Since in many cases the code being shipped is much smaller than the data sets it operates on, automatic code deployment facilitates query optimization based on reducing data movement. Notice that since network bandwidth is typically the major performance bottleneck in distributed processing, our approach can reduce query execution time by minimizing the overall time spent on data transfer operations. Again, this is very different from existing middleware solutions, which perform expensive data movement operations because either all data processing occurs at the integration server, or a data source evaluates only those operators that exist a priori in its environment.
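For concreteness, the following sketch shows the shape of a data-reducing aggregate that could be pushed to a data source: it folds an arbitrary number of tuples into a single summary value, so only that value, rather than the full data set, crosses the network. The class and method names are illustrative assumptions rather than MOCHA's actual operator interface.

    /** Hypothetical data-reducing ("filter") aggregate evaluated at the data source. */
    public class AverageAggregate {
        private double sum = 0.0;
        private long count = 0;

        /** Called once per tuple at the data source. */
        public void accumulate(double value) {
            sum += value;
            count++;
        }

        /** Returns the single summary value shipped back to the integration server. */
        public double result() {
            return count == 0 ? 0.0 : sum / count;
        }
    }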

 
