MOCHA
MOCHA (Middleware Based On a Code SHipping
Architecture) is a novel database middleware system designed to
interconnect hundreds of data sources distributed over a wide area
network. MOCHA is built around the notion that the middleware for a
large-scale distributed environment should be
self-extensible. A self-extensible middleware system is one in
which new application-specific functionality needed for query
processing is deployed to remote sites automatically by
the middleware system itself. In MOCHA, this is realized by shipping
Java classes containing new capabilities to the remote sites, where they
can be used to manipulate the data of interest. All the Java classes
are stored in one or more code repositories, from which MOCHA
later retrieves and deploys them on a "need-to-do" basis. A major
goal behind this idea of automatic code deployment is to fill the
need for application-specific processing components at remote sites
that do not provide them. These components are migrated on demand by
MOCHA from site to site and become available for immediate use. This
approach sharply contrasts with existing middleware solutions, in
which administrators need to manually install all the required
functionality throughout the entire system.
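
The on-demand deployment described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not MOCHA's actual implementation: the names `CodeRepository`, `RemoteSite`, `fetch`, and `evaluate` are hypothetical, and operators are modeled as Java lambdas held in a map rather than as class files shipped over a network.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CodeShippingSketch {

    /** Hypothetical code repository: maps an operator name to its implementation. */
    static class CodeRepository {
        private final Map<String, Function<double[], Double>> operators = new HashMap<>();

        void register(String name, Function<double[], Double> op) {
            operators.put(name, op);
        }

        Function<double[], Double> fetch(String name) {
            Function<double[], Double> op = operators.get(name);
            if (op == null) {
                throw new IllegalArgumentException("Unknown operator: " + name);
            }
            return op;
        }
    }

    /** Hypothetical remote site that starts with no application-specific operators. */
    static class RemoteSite {
        private final Map<String, Function<double[], Double>> installed = new HashMap<>();
        private int deployments = 0;

        /** Evaluates an operator locally, deploying it from the repository only if it is missing. */
        double evaluate(String name, double[] data, CodeRepository repo) {
            Function<double[], Double> op = installed.get(name);
            if (op == null) {
                op = repo.fetch(name);      // stands in for shipping the code to this site
                installed.put(name, op);    // the operator is now available for immediate use
                deployments++;
            }
            return op.apply(data);
        }

        int deployments() {
            return deployments;
        }
    }

    public static void main(String[] args) {
        CodeRepository repo = new CodeRepository();
        // An aggregate operator: reduces an array to a single value.
        repo.register("sum", values -> {
            double total = 0.0;
            for (double v : values) {
                total += v;
            }
            return total;
        });

        RemoteSite site = new RemoteSite();
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(site.evaluate("sum", data, repo));
        System.out.println(site.evaluate("sum", data, repo));
        System.out.println("deployments: " + site.deployments());
    }
}
```

In this sketch the first call to `evaluate` triggers a deployment and the second reuses the installed operator, so no administrator has to install anything at the site beforehand.
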

MOCHA capitalizes on its ability to automatically deploy code in order
to provide an efficient query processing service. By shipping code
for query operators, MOCHA can generate efficient plans that place the
execution of powerful data-reducing operators ("filters") on
the data sources. Examples of such operators are aggregates,
predicates and data mining operators, which return a much smaller
abstraction of the original data. On the other hand,
data-inflating operators, which produce results larger than their
arguments, are evaluated near the client. Since in many cases the
code being shipped is much smaller than the data sets, automatic
code deployment facilitates query optimization based on data movement
reduction. Notice that since network bandwidth typically is the major
performance bottleneck in distributed processing, our approach can
reduce query execution time by minimizing the overall time spent on
data transfer operations. Again, this is very different from the
existing middleware solutions, which perform expensive data movement
operations since either all data processing occurs at the integration
server, or a data source evaluates only those operators that exist
a priori in its environment.
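
The placement idea in this paragraph can be made concrete with a simplified cost comparison. The following is a sketch under assumptions of our own, not MOCHA's actual optimizer: the class `PlacementSketch`, the method `place`, and the byte counts are hypothetical, and the only cost considered is the number of bytes that cross the network.

```java
public class PlacementSketch {

    enum Site { DATA_SOURCE, CLIENT_SIDE }

    /**
     * Chooses where to run an operator by comparing bytes moved over the network.
     * Assumed model: running at the data source costs the shipped code plus the
     * operator's results; running near the client costs the operator's raw arguments.
     */
    static Site place(long inputBytes, long outputBytes, long codeBytes) {
        long bytesIfAtSource = codeBytes + outputBytes;
        long bytesIfAtClient = inputBytes;
        return bytesIfAtSource < bytesIfAtClient ? Site.DATA_SOURCE : Site.CLIENT_SIDE;
    }

    public static void main(String[] args) {
        // Data-reducing aggregate: 10 MB of arguments, 8-byte result, 20 KB of code.
        System.out.println(place(10_000_000L, 8L, 20_000L));
        // Data-inflating operator: 1 MB of arguments, 4 MB of results, 20 KB of code.
        System.out.println(place(1_000_000L, 4_000_000L, 20_000L));
    }
}
```

Under this model the aggregate is placed at the data source, because shipping a small class and a tiny result is far cheaper than shipping the raw data, while the data-inflating operator is evaluated near the client.
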
For more information on MOCHA, go to www.cs.umd.edu/projects/mocha