This is the FAQ for all things SHOE. If you have a question that is
not covered here or would like to add a question to the list, please
contact Jeff Heflin (heflin@cs.umd.edu).
1.1 What is SHOE?
What SHOE Is...
SHOE is an HTML-based knowledge representation language. SHOE is a superset of HTML which adds the tags necessary to embed arbitrary semantic data into web pages.
SHOE tags are divided into two categories. First, there are tags for constructing ontologies. SHOE ontologies are sets of rules which define what kinds of assertions SHOE documents can make and what these assertions mean. For example, a SHOE ontology might say that a SHOE document can declare that some data entity is a "dog", and that if it does so, this "dog" is permitted to have a "name". Second, there are tags for annotating web documents to subscribe to one or more ontologies, declare data entities, and make assertions about those entities under the rules prescribed by the ontologies. For example, a SHOE document subscribing to the SHOE ontology above might then declare that it's all about a dog named "Fido".
SHOE was designed with the needs of the web in mind. It has limited semantics to make it possible to handle large amounts of data. However, simple database semantics are not enough for web data; SHOE provides true knowledge-base semantics. It has a variety of mechanisms that try to deal with the fact that the data out there is distributed and under no one's total control.
SHOE can be used to embed data from a variety of sources and for a variety of purposes. It is not intended for any one particular function. However, SHOE is primarily meant to make it possible for web robots and intelligent agents to finally make a dent in making all our lives a little easier.
What SHOE Isn't...
SHOE is not just a meta-content language. SHOE provides a relatively rich level of semantics and abilities, which enable web designers to embed in documents not only information about the overall "content" of those documents but arbitrary information of any kind. SHOE also allows agents to make automatic inferences about the data they learn, and provides a hierarchical categorization scheme and a sophisticated ontology mechanism designed specifically for the web's needs. SHOE tags can be used for a wide range of agent-based functions.
SHOE is purposely not a verbose knowledge-representation system. The full semantic expressiveness found in languages such as KIF is inappropriate for SHOE because its computational complexity is too high. SHOE attempts to provide as rich an expressivity as possible while keeping in mind that there's a tremendous amount of data out there.
SHOE does not have any pre-defined ontologies, categories, relationships, or inferences. SHOE is a language in which categories, relationships, attributes, inferences, etc. can be defined by ontologies, but SHOE itself does not define them. This is the job of ontology designers for specific tasks or domains. However, the SHOE project does offer some initial ontologies to start things rolling, though we hope these ontologies will ultimately be superseded by other more widely-accepted ontologies.
1.2 So what's so bad with the current search systems?
Okay. We'll start with the standard example. Suppose that you have available to you the following array of state-of-the-art web navigation systems:
- Word-based (or phrase-based) search engines like Lycos or Excite.
- Painstaking human-made web catalogs like Yahoo.
- Resource-indexing mechanisms like Aliweb.
Now suppose that you are searching the web for the home pages of a Mr. and Mrs. Cook, whom you met at a computer conference last year. You don't remember their first names, but you do recall that both work for an employer associated with the massive ARPA funding initiative 123-4567 (this initiative doesn't really exist, but you get the idea). Now, if you had a database with all of the relevant facts stored in it, and a reasonably decent query language, it'd be pretty easy to construct a query that asks for exactly what you want. Here it is in a pseudo-logic form.
Find web pages for all x, y, and z such that
x is a person, y is a person, z is an organization where:
lastName(x,"Cook") and lastName(y,"Cook") and
employee(z,x) and employee(z,y) and
marriedTo(x,y) and involvedIn(z,"ARPA 123-4567")
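To make the query concrete, here is a minimal Python sketch that evaluates it against a tiny in-memory fact base. Everything in it is hypothetical: the facts, the entity identifiers, the example.org URLs, and the helper names (`holds`, `members`, `find_cook_pages`) are invented for illustration and say nothing about how a real SHOE query engine works.

```python
from itertools import permutations

# Hypothetical facts as (predicate, arg1, arg2) triples. Every identifier
# and URL below is invented for this example.
FACTS = {
    ("isa", "p1", "Person"),
    ("isa", "p2", "Person"),
    ("isa", "p3", "Person"),
    ("isa", "org1", "Organization"),
    ("lastName", "p1", "Cook"),
    ("lastName", "p2", "Cook"),
    ("lastName", "p3", "Cook"),
    ("employee", "org1", "p1"),
    ("employee", "org1", "p2"),
    ("marriedTo", "p1", "p2"),
    ("marriedTo", "p2", "p1"),
    ("involvedIn", "org1", "ARPA 123-4567"),
    ("homePage", "p1", "http://example.org/~acook/"),
    ("homePage", "p2", "http://example.org/~bcook/"),
    ("homePage", "p3", "http://example.org/~ccook/"),
}


def holds(predicate, a, b):
    """True if the ground fact predicate(a, b) is in the fact base."""
    return (predicate, a, b) in FACTS


def members(category):
    """All entities declared to be in `category`, in a stable order."""
    return sorted(a for (p, a, b) in FACTS if p == "isa" and b == category)


def find_cook_pages():
    """Evaluate the conjunctive query from the text and return home pages."""
    pages = set()
    for x, y in permutations(members("Person"), 2):
        for z in members("Organization"):
            if (holds("lastName", x, "Cook") and holds("lastName", y, "Cook")
                    and holds("employee", z, x) and holds("employee", z, y)
                    and holds("marriedTo", x, y)
                    and holds("involvedIn", z, "ARPA 123-4567")):
                pages.update(b for (p, a, b) in FACTS
                             if p == "homePage" and a in (x, y))
    return sorted(pages)
```

With these made-up facts, `find_cook_pages()` returns only the pages of the two married Cooks who work for the organization involved in the initiative. The third Cook is left out, even though a plain word search for "Cook" would match all three pages.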
So you start the web search. Using an existing man-made web catalog (like Yahoo), you can find ARPA's home page but learn that hundreds of subcontractors and research groups are working on initiative 123-4567. Searching existing web indices (AltaVista, for example) for "Cook" yields thousands of pages about cooking (in fact, AltaVista returns over 200,000 responses--try it!). Searching for "ARPA" and "123-4567" provides you with hundreds and hundreds of hits about the popular initiative. Unfortunately, searching for "Cook" and the initiative yields nothing: apparently neither person lists the initiative on his or her web page. Wandering the web on your own is fruitless.
The problem with word indices is that they associate the syntax of a word with its meaning; there's no way in general for a word index to look at the word "Cook" on a web page and realize that it's about Cook County, or about cooking, or about a person named Cook.
The problem with hand-made web catalogs like Yahoo is that the web is growing so fast, and so much information is out there, that the humans at Yahoo can't possibly keep up.
The problem with resource-indexing mechanisms like Aliweb or MCF/HotSauce is that the kind of information they store is not general enough (so far). While it's certainly possible to use the languages developed for these systems to do things similar to what SHOE does, at this point in time these systems cannot describe the information necessary to solve the query above, namely categories, relations, and inferences from ontologies.
Here are some more fun (and impossible) queries:
- Find me an internet provider who services my area and provides the most bandwidth I can use with my modem for the lowest flat fee.
- Find a four-year undergraduate college or university in a state bordering Iowa which offers Japanese classes, a computational biology major, and an ROTC program.
- I'm looking for a friend. Find someone who's a lot like myself, based on what I say about myself on my home page.
1.4 Why not Natural Language Processing (NLP) systems?
There are some very impressive artificial intelligence programs out there, known as natural language processing (NLP) systems, which attempt to read English sentences and figure out what they actually mean. Many people have suggested using these programs for figuring out web pages. In the ideal world, an intelligent agent robot could go out and assist you by reading HTML pages on its own, figuring out the information contained in them, and reporting back to you.
Unfortunately, the ideal world won't be here for some time. Natural language processing still has a very long way to go, with tremendous hurdles to overcome. Additionally, the web was written not only in a human-readable language (usually English) but also in a human-vision-oriented layout (HTML with tables, frames, etc.) and with human-only-readable graphics. State-of-the-art natural language technology might help with the first problem, but nothing can currently handle the other two. Don't expect robots using this technology to be of real help any time soon.
1.8 What are SHOE's semantics in general?
SHOE's semantics are intentionally very simple.
SHOE Ontologies declare:
- Classifications (categories) for data entities. Classifications may inherit from other classifications ("Dogs are Animals").
- Valid relationships between data entities and other data entities or simple data (strings, numbers, dates, booleans). Arguments for relationships are typed, either by the simple data that can fill the argument, or with the classification a data entity must fall under in order to fill an argument ("Dogs can chase cats").
- Inferences in the form of Horn clauses with no negation ("If a person works for an organization, that person automatically works for any organization the organization is a sub-part of").
- Inheritance from other ontologies: ontologies may be derived from or extend zero or more outside ontologies ("The SPCA ontology extends the common Library of Congress ontology").
- Versioning. Ontologies may extend previous ontology versions.
HTML pages with embedded SHOE data may:
- Declare arbitrary data entities. Usually, one of these entities is the web page itself.
- Declare the ontologies which they will use when making declarations about entities ("I'm using the 'Pets' ontology promulgated by the SPCA").
- Categorize entities ("This entity is a dog").
- Declare relationships between entities or between entities and data ("This entity likes to chase that entity", "This entity's name is 'Fido'").
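The category and relationship declarations above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the "Pets" categories, the two relationships, the example.org keys, and the function names `is_a` and `check_assertion` are all hypothetical, and real SHOE ontologies are written with SHOE tags, not Python dictionaries.

```python
# Hypothetical "Pets" ontology: each category maps to its parent categories.
CATEGORIES = {
    "Animal": [],
    "Dog": ["Animal"],
    "Cat": ["Animal"],
}

# Each relationship lists the type required of each argument: either a
# category name, or a Python type standing in for simple data (strings here).
RELATIONS = {
    "chases": ("Dog", "Cat"),
    "name": ("Animal", str),
}

# Hypothetical categorizations made by documents ("This entity is a dog").
ENTITY_CATEGORIES = {
    "http://example.org/fido": ["Dog"],
    "http://example.org/tom": ["Cat"],
}


def is_a(category, ancestor):
    """True if `category` equals `ancestor` or inherits from it."""
    if category == ancestor:
        return True
    return any(is_a(parent, ancestor)
               for parent in CATEGORIES.get(category, []))


def check_assertion(relation, args, entity_categories):
    """Check one relationship assertion against the declared argument types."""
    expected = RELATIONS.get(relation)
    if expected is None or len(expected) != len(args):
        return False
    for arg, wanted in zip(args, expected):
        if isinstance(wanted, type):
            # Simple data: the argument must be a value of that type.
            if not isinstance(arg, wanted):
                return False
        else:
            # Entity: one of its categories must fall under the wanted one.
            declared = entity_categories.get(arg, [])
            if not any(is_a(c, wanted) for c in declared):
                return False
    return True
```

Under these assumptions, "Fido chases Tom" and "Fido's name is 'Fido'" pass the check, because a Dog is an Animal by inheritance. "Tom chases Fido" fails, because the first argument of the hypothetical `chases` relationship must be a Dog.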
SHOE allows n-ary relations, Horn clause inference, simple inheritance in the form of classification, multi-valued relations, and a conjunctive knowledge base. It does not currently allow negation, disjunction, or arbitrary functions and predicates.
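As a rough picture of what negation-free Horn clause inference over a conjunctive knowledge base involves, here is a naive forward-chaining sketch in Python. It encodes the "works for" example from the list of ontology features. The tuple encoding of facts, the "?" convention for variables, and all names and facts are assumptions made for this example, not part of SHOE.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (a convention
    chosen for this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, fact, binding):
    """Match one rule atom against one ground fact, extending `binding`.
    Returns the extended binding, or None if they do not match."""
    if len(pattern) != len(fact):
        return None
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def match_all(body, facts, binding):
    """Yield every binding that satisfies all atoms in `body`."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_all(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            for binding in match_all(body, known, {}):
                derived.add(tuple(binding.get(t, t) for t in head))
        if derived <= known:
            return known
        known |= derived


# "If a person works for an organization, that person automatically works
# for any organization the organization is a sub-part of."
RULES = [(
    [("worksFor", "?p", "?o"), ("subOrganizationOf", "?o", "?parent")],
    ("worksFor", "?p", "?parent"),
)]

# Hypothetical facts.
FACTS = {
    ("worksFor", "alice", "researchGroup"),
    ("subOrganizationOf", "researchGroup", "department"),
    ("subOrganizationOf", "department", "university"),
}
```

Calling `forward_chain(FACTS, RULES)` adds the facts that alice works for the department and for the university. Because rules have no negation or disjunction, the set of known facts only ever grows, and with a finite set of facts and constants the loop is guaranteed to stop.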
SHOE attempts to make it difficult for entities to pretend to be other entities by providing an easily verifiable key scheme based on URLs.
Agents using SHOE should assume that declarations made by entities are claims of those entities, not simple facts. If ten people are claiming to be Marilyn Monroe's lost daughter, a SHOE agent shouldn't be storing the "fact" that Marilyn has ten children.
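One way an agent might keep claims separate from facts is to store every assertion together with the entity that made it. The Python sketch below is a hypothetical illustration; the `ClaimStore` class, its methods, the `childOf` relationship, and the example.org URLs are invented here and are not part of SHOE.

```python
from collections import defaultdict


class ClaimStore:
    """Stores each assertion together with the entities that claim it."""

    def __init__(self):
        # assertion (a tuple) -> set of claimant keys
        self._claimants = defaultdict(set)

    def add_claim(self, claimant, assertion):
        """Record that `claimant` asserts `assertion`."""
        self._claimants[assertion].add(claimant)

    def claimants(self, assertion):
        """Everyone who has made this exact assertion."""
        return sorted(self._claimants.get(assertion, set()))

    def believed(self, trusted):
        """Assertions made by at least one claimant the agent trusts."""
        trusted = set(trusted)
        return sorted(a for a, who in self._claimants.items() if who & trusted)


# Ten hypothetical pages each claim to be Marilyn Monroe's child.
store = ClaimStore()
for i in range(10):
    page = f"http://example.org/claimant{i}"
    store.add_claim(page, ("childOf", page, "Marilyn Monroe"))
```

Here the store holds ten claims, but `store.believed([])` is empty: an agent that trusts none of the claimants concludes nothing about Marilyn's children. Which sources to trust is a policy decision left to the agent.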
1.10 Is SHOE XML-compliant?
The short answer is yes. The XML version of the SHOE DTD can be found at
http://www.cs.umd.edu/projects/plus/SHOE/shoe_xml.dtd.
However, in order to keep XML simple, practices that were common in
many existing SGML applications (including HTML) are not allowed in XML.
To include XML-compliant SHOE in a web page, one must first make sure
that the HTML is XML compliant. See the W3C's recommendation
XHTML 1.0: The Extensible
HyperText Markup Language for guidelines on this process.
To include SHOE in XHTML markup, simply follow the guidelines set forth in
the W3C's Namespaces
in XML Recommendation. Essentially, this means inserting the following tag
at the desired location in the document:
<shoe xmlns="http://www.cs.umd.edu/projects/plus/SHOE/" version="1.0">
and include the desired ONTOLOGY or INSTANCE tags between this tag and
a </shoe> tag.
Due to the differences between SGML and XML, SHOE XML must obey
more restrictions than the original SHOE syntax.
First, all empty elements,
i.e., elements which have no content and no end tag, must end with a '/>'
instead of a '>'. Specifically, this applies to the USE-ONTOLOGY,
DEF-CATEGORY, DEF-ARG, DEF-RENAME, DEF-CONSTANT, DEF-TYPE, CATEGORY,
and ARG elements. Second, no attribute minimization is allowed.
Therefore, when specifying VAR or CONST within the categories or
arguments in an inference rule, the attribute name USAGE must be explicitly
provided, e.g. USAGE="VAR" instead of VAR. Third,
since XML is case-sensitive, all element and attribute names must be
in lower case. Finally, all attribute values must always be quoted,
including those which are numeric as well as the "FROM" and "TO" keywords.
Future
versions of SHOE are likely to recommend that all SHOE documents follow
these rules.
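The restrictions above can be checked mechanically. The Python sketch below uses the standard library's xml.etree.ElementTree parser: unclosed empty elements, minimized attributes, and unquoted values all make a fragment ill-formed XML, so the parser rejects them, and the lower-case rule is checked separately. The function name `shoe_xml_problems` and the attribute names in the sample fragment (key, name) are assumptions for illustration; consult the SHOE DTD for the real attribute lists. The sketch does not validate against the DTD.

```python
import xml.etree.ElementTree as ET


def shoe_xml_problems(fragment):
    """Return a list of problems found in a SHOE XML fragment."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        # Unquoted values, minimized attributes, and empty elements that
        # do not end in '/>' all show up here as well-formedness errors.
        return [f"not well-formed: {err}"]
    problems = []
    for element in root.iter():
        tag = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" part
        if tag != tag.lower():
            problems.append(f"element name not lower case: {tag}")
        for attribute in element.attrib:
            name = attribute.rsplit("}", 1)[-1]
            if name != name.lower():
                problems.append(f"attribute name not lower case: {name}")
    return problems


# A fragment that follows the rules (attribute names are assumed).
GOOD = (
    '<shoe xmlns="http://www.cs.umd.edu/projects/plus/SHOE/" version="1.0">'
    '<instance key="http://example.org/fido">'
    '<category name="pets.Dog"/>'
    '</instance>'
    '</shoe>'
)

# SGML-style fragment: upper-case names and an empty element closed with '>'.
BAD_SGML = '<INSTANCE KEY="http://example.org/fido"><CATEGORY NAME="pets.Dog"></INSTANCE>'
```

`shoe_xml_problems(GOOD)` returns an empty list, while `shoe_xml_problems(BAD_SGML)` reports that the fragment is not well-formed, because the CATEGORY element is never closed.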
SHOE can also be used independently of XHTML. A non-embedded SHOE XML
document must begin with the following lines:
<?xml version="1.0"?>
<!DOCTYPE shoe SYSTEM
"http://www.cs.umd.edu/projects/plus/SHOE/shoe_xml.dtd">
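As a small sketch of the non-embedded form, the following Python wraps some SHOE content in the required header lines and confirms that the result parses as XML. The helper name `standalone_shoe_document`, the version attribute on the root shoe element (borrowed from the embedded form shown earlier), and the instance element's key attribute are assumptions. The standard library parser does not fetch or validate against the DTD.

```python
import xml.etree.ElementTree as ET

# The header lines a non-embedded SHOE XML document must begin with.
SHOE_XML_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE shoe SYSTEM\n'
    '  "http://www.cs.umd.edu/projects/plus/SHOE/shoe_xml.dtd">\n'
)


def standalone_shoe_document(body):
    """Wrap SHOE XML content in the header and a root shoe element.
    The version attribute on the root is an assumption."""
    return f'{SHOE_XML_HEADER}<shoe version="1.0">\n{body}\n</shoe>\n'


document = standalone_shoe_document('<instance key="http://example.org/fido"/>')
root = ET.fromstring(document)  # well-formedness check only, no DTD validation
```

After parsing, `root.tag` is "shoe" and the root has one child, the hypothetical instance element.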