The HERMES mediator language can extract information from different domains -- data sources and reasoning systems. An underlying assumption is that each domain internally provides a set of operations through which its functionality is accessed. Hence, a domain D may be viewed as an abstraction of databases and software packages, and is made up of three components: a set of values, V; a set F of functions on V; and a set of relations on the data objects in V. The elements of V may be thought of as the data objects that are manipulated by the software in question. For example, in a numerical computation software package, V contains the real numbers. In general, values in V may be typed, so that V is composed of a collection of different universes of values. The functions in F take objects in V as input and return objects from their range as output. These functions may be thought of as the pre-defined functions existing in the package/domain D that we are seeking to integrate into a mediated system. An example is the function that performs numerical integration in the aforementioned software package. Lastly, the relations are functions over V whose output value is either true or false.
Given a domain D, a domain call is a syntactic expression of the form
domainname:<domainfunction>(<arg1,...,argn>)
where <domainfunction> is the name of a function in F, and <arg1,...,argn> are the arguments that it takes. The informal reading of the call is: in the domain domainname, execute the function domainfunction defined therein on the arguments <arg1,...,argn>.
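The informal reading of a domain call can be illustrated with a minimal Python sketch. This is not HERMES code: the `Domain` class, the `domain_call` helper, and the `NUMERIC` domain with its `add` function are hypothetical names introduced only to show a named domain exposing a set F of functions that are looked up and executed on the supplied arguments.

```python
from typing import Any, Callable, Dict


class Domain:
    """Hypothetical stand-in for a domain: a name plus its set F of functions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.functions: Dict[str, Callable[..., Any]] = {}

    def register(self, function_name: str, function: Callable[..., Any]) -> None:
        """Expose one pre-defined function of the underlying package."""
        self.functions[function_name] = function


def domain_call(domains: Dict[str, Domain], domain_name: str,
                function_name: str, *args: Any) -> Any:
    """In the domain domain_name, execute function_name on the given arguments."""
    domain = domains[domain_name]
    return domain.functions[function_name](*args)


# A toy numerical domain whose values are real numbers (assumed for illustration).
numeric = Domain("NUMERIC")
numeric.register("add", lambda x, y: x + y)
domains = {numeric.name: numeric}

result = domain_call(domains, "NUMERIC", "add", 2.0, 3.5)
```

The mediator only needs the domain name, the function name, and the arguments; how the function is implemented stays inside the domain.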
Before considering the finer details of the mediation language, we first present several examples of domains to ground our discussion in more concrete terms.
Consider the PARADOX database management system. The values in this
domain consist of the collection of all tables, as well as the
individual values that may be stored in the tables. We denote this
domain by PARADOX. The functions F over PARADOX
include the usual database operations project, select
and join.
Each of these operations takes tables as input and produces
new tables as output. Other operations, for example aggregates,
typically accept tables and attribute names as input, and produce real
values as their output. Below are some specific examples.
The expression
PARADOX:project('parts',"partid")
invokes the execution of the project operation, projecting out the
partid field of the relation called parts. Expressing
selects is somewhat more complex. To each boolean condition C, we
have a corresponding select-C: the second argument names an
attribute, and C is imposed between the value of that attribute and
the value given as the third argument. Thus, to select all tuples in
the parts relation in PARADOX with a cost of over 50, the appropriate
domain call may be expressed:
PARADOX:select>('parts',"cost",50).
Operations may be composed. Therefore, getting a list of
partids of parts that cost over 50 may be expressed through
the domain call
PARADOX:project(select>('parts',"cost",50),"partid").
Joins may be similarly expressed. For instance, the statement
PARADOX:join('parts1','parts2','partname','partnom')
joins the relations parts1 and parts2 on the fields
partname and partnom, respectively.
Finally, an example domain call to an aggregate may be
PARADOX:sum(project(select=('parts',"color","green"),"cost"))
which finds the total cost of all parts that are green.
This syntax has been used uniformly in HERMES to access any relational
database management system, including DBASE, INGRES, and
PARADOX.
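To make the composition of relational domain calls concrete, here is a small Python sketch over in-memory tables. It is an illustration only and makes several assumptions: tables are lists of dictionaries, the family of select-C functions is modelled by passing a comparison operator, `total` plays the role of the sum aggregate, and the sample `parts` data is invented. Unlike a true relational projection, `project` does not remove duplicate rows, and `join` lets the right-hand row overwrite same-named fields.

```python
import operator
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def project(table: Table, field: str) -> Table:
    """Keep only one field of every row (duplicates are not removed)."""
    return [{field: row[field]} for row in table]


def select(condition: Callable[[Any, Any], bool], table: Table,
           field: str, value: Any) -> Table:
    """Keep the rows whose field satisfies `condition` against `value`."""
    return [row for row in table if condition(row[field], value)]


def join(left: Table, right: Table, left_field: str, right_field: str) -> Table:
    """Pair up rows whose join fields are equal."""
    return [{**l, **r} for l in left for r in right
            if l[left_field] == r[right_field]]


def total(table: Table) -> float:
    """Sum the single column of a one-column table."""
    return sum(next(iter(row.values())) for row in table)


# Invented sample data standing in for the 'parts' relation.
parts = [
    {"partid": 1, "cost": 30, "color": "green"},
    {"partid": 2, "cost": 75, "color": "red"},
    {"partid": 3, "cost": 60, "color": "green"},
]

# Counterpart of project(select>('parts',"cost",50),"partid")
expensive_ids = project(select(operator.gt, parts, "cost", 50), "partid")

# Counterpart of sum(project(select=('parts',"color","green"),"cost"))
green_cost = total(project(select(operator.eq, parts, "color", "green"), "cost"))
```

Because every operation takes a table and returns a table, the calls nest in the same way as the domain calls in the text.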
In a spatial database, the set of values V consists of the
collection of all coordinates and the values that may be contained in
an ordinary, say relational, database. The usual set of operations
includes RANGE, which can be used to find all points within a
specified distance of a given location, and horizontal and vertical
slice queries, which can be used to find points for which one of the
coordinates is within some specified distance of a given location. Some
example domain calls are as follows.
is a range query on a spatial database called 'map' where
given is the given pair of (x,y) coordinates.
is a vertical slice query on 'map'. If
given=(x,y) then
this call returns the set of all points (x',y') represented in the
'map' database such that |x-x'| <= distance.
Note that the above syntax completely abstracts away the details of
internal representation in the databases. Though this is a useful and
important software engineering technique, we will see later that
the activities of domain integration cannot be done in complete
independence from these internal details. However, in many cases, the
tools that are provided in the domain integration toolkit may make
these tasks less painstaking (cf.
Section 2.4.1).
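The range and vertical slice queries described in this subsection can be sketched in Python as follows. This is a minimal illustration, not the HERMES spatial interface: the function names, the representation of the database as a list of (x, y) pairs, and the use of Euclidean distance for the range query are all assumptions. Only the vertical slice condition |x-x'| <= distance is taken from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def range_query(points: List[Point], x: float, y: float,
                distance: float) -> List[Point]:
    """Return all points within `distance` of (x, y), using Euclidean distance."""
    return [(px, py) for (px, py) in points
            if math.hypot(px - x, py - y) <= distance]


def vertical_slice(points: List[Point], x: float,
                   distance: float) -> List[Point]:
    """Return all points (x', y') with |x - x'| <= distance; y is unconstrained."""
    return [(px, py) for (px, py) in points if abs(px - x) <= distance]


# Invented sample contents for a spatial database such as 'map'.
map_points = [(0, 0), (3, 4), (10, 0), (2, 50)]

near_origin = range_query(map_points, 0, 0, 5)
slice_near_origin = vertical_slice(map_points, 0, 3)
```

A real spatial database would answer these queries through an index rather than a linear scan, which is exactly the kind of internal detail the domain call syntax hides.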
We use the domains discussed above to elaborate on the mediator
language. The language is rule-based, with Prolog-like syntax.
Access to the various domains integrated in HERMES is achieved through
a small but fairly general set of special predicates. These
predicates take as input various domain calls, whose syntax was
introduced in the previous subsections.
In the Paradox example, the query
in(P,PARADOX:project('parts',color)) & =(P.color,"green")
succeeds if the parts database contains at least one object
whose color is green.
For instance, the atom
is({"green"},PARADOX:project('parts',color))
succeeds just in case the parts database contains only green
objects.
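The behaviour of the two examples above can be mimicked with a short Python sketch. The semantics used here are inferred from those examples alone and are an assumption: `in` is treated as membership in the result of a domain call, `=` as equality, and `is` as equality between a given set and the set of returned values. The function names and the sample data are hypothetical.

```python
from typing import Any, Iterable, Set


def in_(value: Any, result: Iterable[Any]) -> bool:
    """in(X, call): X is one of the values returned by the domain call."""
    return value in list(result)


def is_(expected: Set[Any], result: Iterable[Any]) -> bool:
    """is(S, call): the set of values returned by the call is exactly S."""
    return set(expected) == set(result)


# Invented result of projecting the color field of 'parts'.
part_colors = ["green", "red", "green"]

# Counterpart of in(P, ...) & =(P.color, "green"): some returned value is green.
has_green = any(color == "green" for color in part_colors)

# Counterpart of is({"green"}, ...): every returned value is green.
only_green = is_({"green"}, part_colors)
```

With the sample data, the first query succeeds and the second fails, because a red part is present.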
The formal syntax of a mediator can now be explained.
We refer to each relation constructed using one of =, in, or
is as a domain call atom. An annotation is a pair
[M,T] where M is an expression representing a value between
0 and 1 inclusive, and T is an expression representing a set
of non-negative real numbers. Given an atom A, the expression
A:[M,T] is called an annotated atom, where A is the
atomic part. A mediatory clause (or mediatory rule)
is a statement of the form
A0:[M0,T0] <- A1 & ... & An
where each Ai, for i = 1,...,n,
is either an annotated atom or a domain call
atom. The first annotated atom A0:[M0,T0] is called the
head of the rule, while the conjunction to the right of the symbol
<- is called the body.
For those readers familiar with annotated logics, the annotations
extend ordinary Prolog clauses with the capability to reason about
uncertainty and time. Informally, to assert an annotated atom
A:[M,T] is to say that the relation A is true
with certainty at least M at all time points in T. Thus, the
annotated atom at("john","office"):[0.9,[0800,1700]] says that
between the time points 0800 and 1700 hours, there is a certainty of at
least 90% that John is in his office.
The reading of the mediatory clause
above is then: Suppose for each annotated atom Ai:[Mi,Ti] in
the body, Ai is true with certainty at least Mi at all time
points in Ti. Suppose in addition that each domain call atom
in the body holds. Then conclude that A0 is true with certainty at
least M0 at all time points in T0.
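The informal reading of a mediatory clause can be sketched in Python. This is a simplified illustration under stated assumptions, not the formal semantics: each time set T is restricted to a single closed interval, atoms are plain strings, domain call atoms are assumed to be already evaluated to booleans, and the names `holds`, `fire`, and the `reachable` atom are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]          # closed interval of time points
Annotation = Tuple[float, Interval]     # [M, T] with T simplified to an interval
Facts = Dict[str, List[Annotation]]


def covers(outer: Interval, inner: Interval) -> bool:
    """True if every time point of `inner` lies in `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def holds(facts: Facts, atom: str, certainty: float, interval: Interval) -> bool:
    """A:[M,T] holds if A is known with certainty >= M at all points in T."""
    return any(m >= certainty and covers(t, interval)
               for (m, t) in facts.get(atom, []))


def fire(facts: Facts, head: str, head_annotation: Annotation,
         body: List[Tuple[str, Annotation]], domain_atoms: List[bool]) -> bool:
    """If every body atom holds, record the head with its annotation."""
    annotated_ok = all(holds(facts, atom, m, t) for (atom, (m, t)) in body)
    if annotated_ok and all(domain_atoms):
        facts.setdefault(head, []).append(head_annotation)
        return True
    return False


facts: Facts = {'at("john","office")': [(0.9, (800, 1700))]}

fired = fire(
    facts,
    head='reachable("john")',
    head_annotation=(0.8, (900, 1200)),
    body=[('at("john","office")', (0.8, (900, 1200)))],
    domain_atoms=[True],
)
```

Here the body atom is satisfied because the stored fact has certainty 0.9 over a wider interval, so the head is concluded with its own annotation.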
The formal semantics of these annotations has been studied in [9,15].
We remark that the language for expressing uncertainty in annotations
contains as constants the real numbers between 0 and 1. The
language also contains variables and functions interpreted over the
interval [0,1]. Likewise, the second component of each annotation
is a term in a first order language for expressing temporal
information. The constants in this language are sets of non-negative
real numbers. The intersection of the temporal
language, the uncertainty language, and the logical language need not
be empty. This way, a variable belonging to all three languages
may be used to integrate temporal information with uncertainty
information, logical information, and information gleaned from
different databases.
Example 1.
Suppose we are given data stored in a Paradox database called
DB1, a DBASE database called DB2 and a
spatial database called
DB3. Suppose DB1 is a relation containing fields
"qty" and
"name", DB2 is a relation containing fields
"name" and "location",
and DB3 is a spatial data structure whose nodes have a field
"location", in which two subfields, "x" and "y",
are specified. Using
the following rule, the query query1(Supplier,Part,Quantity,Factory)
retrieves any Supplier that lies within 50 units of
distance from a given Factory and
has enough of the
component Part to satisfy Factory's request for a given
Quantity of the component.
According to the rule, once the Part and Quantity
desired by
a particular Factory are specified, any supplier is either
satisfactory or unsatisfactory. However, in some cases, it may be
desirable to evaluate the "goodness" of the match between a supplier and
the needs of the Factory, based on the distance of the supplier from
the Factory and the extra stock that the supplier has available.
The following modified rule accomplishes this through the annotation
variables.
Example 2.
Let query2 be defined similarly to
query1, augmented with a goodness of fit between a supplier and the
factory it supplies, obtained by evaluating the quantity of overstock the
supplier has and the distance of the supplier from the factory. The
evaluation is performed by the annotation function EVAL.
Here, Dist and Over are both annotation variables, as well as
variables in the logical language. The complex annotation term
EVAL(Dist,Over) appearing in the annotation of the head of the
clause is assumed to return a value between 0 and 1.
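The text does not define EVAL beyond requiring a result between 0 and 1, so the following Python sketch is purely hypothetical. It assumes one plausible scoring scheme: closeness decreases linearly up to a maximum distance (50 is borrowed from the range used in Example 1), overstock is capped at an invented limit of 100 units, and the two scores are averaged with equal weight.

```python
def eval_fit(dist: float, over: float,
             max_dist: float = 50.0, over_cap: float = 100.0) -> float:
    """Hypothetical goodness-of-fit score in [0, 1] for a supplier.

    dist -- distance between supplier and factory
    over -- overstock, i.e. supplier quantity minus requested quantity
    """
    # Closeness is 1 at distance 0 and falls linearly to 0 at max_dist.
    closeness = 1.0 - min(max(dist, 0.0), max_dist) / max_dist
    # Surplus is 0 with no overstock and saturates at 1 once over_cap is reached.
    surplus = min(max(over, 0.0), over_cap) / over_cap
    # Equal weighting of the two criteria (an arbitrary choice for illustration).
    return 0.5 * closeness + 0.5 * surplus


best_case = eval_fit(0, 100)
worst_case = eval_fit(50, 0)
middle_case = eval_fit(25, 50)
```

Any function with range [0,1] would serve equally well as an annotation function; the weights and caps would be chosen by the mediator author.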
We present a few more examples of domains that are currently
integrated in HERMES.
Text database systems can be used to index large amounts of text data.
A text database may be regarded as a domain whose set of values
consists of characters and words.
A simple example of a rule involving a text database is as follows.
This rule defines a predicate that accesses a text database called
textdb through the headline function, which has been
implemented in textdb. It takes two arguments: 'usatoday.idx'
is the index file used to index a body of text data (in our
implementation, this
indexes a body of data from on-line versions of the USA Today
newspaper), and Word is the
keyword on which to search. A query to this rule through which a user
may find, for example, the name of the spouse of the person whose
taxes are reported in USA Today is:
This query assumes that the information on spouses is kept in a
relational Paradox table called 'spouse'.
Pictorial databases are repositories of images. Suppose a module
existed for querying these databases to determine features present in a
picture. A general architecture for such queries and a formal
theoretical framework for it have been studied by Marcus and
Subrahmanian in [16].
For instance, consider the predicate
p(Person,Rank,Picture) defined below, which succeeds just in case
Picture is a picture of George Bush with the spouse of a person
whose tax dealings have been reported in the newspaper.
This may be expressed as follows:
Note that the mediator author need not be concerned with how the
pre-defined function called feature is implemented within the
pictorial database -- a variety of implementation possibilities exist:
the features may be computed by an image processing program or a face
recognition program, or they may have been created by a human
annotating the pictorial data. The above example has
been implemented in HERMES using the last approach. In addition, we have
incorporated a face recognition system developed at the Vision Lab of the
University of Maryland (we are currently testing how well this
face recognition algorithm works).
2.2.1. The Domain of Relational DBMSs
PARADOX:project('parts',"partid")
PARADOX:select>('parts',"cost",50).
PARADOX:project(select>('parts',"cost",50),"partid").
PARADOX:join('parts1','parts2','partname','partnom')
PARADOX:sum(project(select=('parts',"color","green"),"cost"))
2.2.2. The Domain of Spatial DBMSs
|x-x'|<= distance.
2.2.3. A Return to the Mediator Language
in(P,PARADOX:project('parts',color)) & =(P.color,"green")
is({"green"},PARADOX:project('parts',color))
A0:[M0,T0] <- A1 & ... & An
in(Loc1,DBASE:project(select=('db2',"name",Supplier),"location")) &
in(Loc2,DBASE:project(select=('db2',"name",Factory),"location")) &
in(Loc1,SPATIAL:RANGE('db3',Loc2.x,Loc2.y,50)).
in(Loc1,DBASE:project(select=('db2',"name",Supplier),"location")) &
in(Loc2,DBASE:project(select=('db2',"name",Factory),"location")) &
in(Dist,SPATIAL:DIST('db3',Loc2.x,Loc2.y,Loc1.x,Loc1.y)) &
=(Over,SQ.qty - Quantity).
2.2.4. The Domain of Text Databases
=(P.filename, Article).
in(Spouse,PARADOX:project(select=('spouse',Spouse1,Person),Spouse)).
2.2.5. The Domain of Pictorial Databases
in(OtherPerson,PICTUREDB:feature(File)) &
in(Spouse,PARADOX:project(select=('spouse',Spouse1,OtherPerson),
Spouse))&
news(Spouse,Article):[1,R] &
news("taxes",Article).