Note: This is not the latest SHOE specification.
SHOE 0.95
Proposed Specification
Sean Luke
March 29, 1996
Latest version of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec.html
Version 0.90 of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec0.9.html
Table Of Contents
- Introduction: What this specification is all about.
- Terms: Explanatory ideas behind the specification.
- Declaring Ontologies: How to create an ontology for later use.
- Marking Up HTML Documents Using Ontologies: How to use created ontologies in existing HTML documents.
1 Introduction
This specification describes an extension to HTML which provides a way to semantically describe important information about HTML or other World-Wide-Web documents. Currently, there is no effective way to do this, making it difficult for user-agents or search index robots to understand exactly what a document is, why it's there, and its relationships with other documents. In particular, this specification describes:
- A hierarchical classification mechanism for HTML documents and optionally non-HTML documents or subsections of HTML documents.
- A mechanism for specifying relationships between classified elements and other classified elements or specific kinds of data (numbers, dates, etc.)
- A simple way to define ontologies containing rules about the above.
The intent of this specification is to make it possible for user-agents, robots, etc., to gather truly meaningful information about web pages and documents, enabling significantly better search mechanisms and knowledge-gathering.
The general way one goes about this is as follows:
- First, define an ontology describing valid classifications of web objects, and valid relationships between web objects and other web objects or data. This ontology may borrow from other ontologies.
- Annotate HTML pages to describe themselves, other pages, or subsections of themselves, as having attributes as described in one or more ontologies.
We're playing a bit fast-and-loose with the term "ontology" here. In this specification, "ontology" simply means an ISA hierarchy of classes/categories, plus a set of atomic relations between these categories. Categories inherit relations defined for parent categories. This specification does not as yet define any other forms of relationships (transitive closures, inverses, negations, etc.).
User agents following this specification should be aware that assertions made by HTML pages are not facts, but claims. I.e., if element x claims that element y is related with relation r to element z, then the user-agent should not be entering r(y,z) into its database (i.e., "Now I know that y is related to z with the relationship r!!"). Instead, it should be entering something along the lines of r(x,y,z) into its database (i.e., "x is claiming that y is related to z with relationship r."). This is an important distinction: it's perfectly fine for HTML pages out there to be making completely false claims; one shouldn't simply accept them as truth. For similar reasons, HTML pages can only make assertions, not retractions.
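The following minimal Python sketch shows what this difference means for a user-agent's database. `ClaimStore` and its methods are hypothetical names and are not part of this specification. The store keeps every assertion together with the instance that made it, as r(x,y,z). It deliberately offers no way to retract a claim.

```python
from collections import defaultdict


class ClaimStore:
    """Hypothetical store that records who claimed what, never bare facts."""

    def __init__(self):
        # (relation, domain, range) -> set of keys of the instances claiming it
        self._claims = defaultdict(set)

    def assert_claim(self, claimant, relation, domain, range_):
        # Enter r(x, y, z): "x claims that y is related to z by r".
        self._claims[(relation, domain, range_)].add(claimant)

    def claimants(self, relation, domain, range_):
        # Everyone who has claimed r(y, z); an empty set means nobody has.
        return frozenset(self._claims.get((relation, domain, range_), ()))

    # Deliberately no retract(): pages can only make assertions.


store = ClaimStore()
store.assert_claim("http://a.example/", "advisor", "http://b.example/", "http://c.example/")
print(store.claimants("advisor", "http://b.example/", "http://c.example/"))
```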
2 Terms
Terms not described here may be found in the HTML specification.
- Category
- An element under which HTML page instances or subinstances can be classified. Category names are element names, and may be prefixed. Categories may have parent categories (with ISA links). Categories define inheritance: if an instance is classified under a category, it is eligible to be in the domain or range of relations defined for that category or any of its parent (or ancestor) categories.
- Data
- Data which can be placed in the domain or range of a relationship but is not an instance. Data must be of the following types:
- Strings (STRING type keyword)
- HTML String Literals, as defined in the HTML 2.0 specification.
- Numbers (NUMBER type keyword)
- Numerical constants of the form 0|([+|-|]['.'digit*|0'.'digit*|non-zero-digit digit*['.'digit*|]] [([e|E][+|-]non-zero-digit digit*)]) I.e., numbers like "2", "2.0", "-1.42350123e+4", etc.
- Dates (DATE type keyword)
- Date/Timestamps following RFC 1123, as shown in section 3.3.1 of the HTTP/1.0 specification.
- Booleans (TRUTH type keyword)
- HTML String Literals of the form "YES" or "NO", case-sensitive.
- Categories (CATEGORY type keyword)
- Category names.
- Relationships (RELATION type keyword)
- Relation names.
- Element
- A category or relationship name, or one of the following reserved keywords (all caps): STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. Element names are case-sensitive, and may contain only letters, digits, or hyphens.
- Instance
- An element which may be classified under zero or more categories, and included in the domain or range of relationships (along with other forms of data). Some instances ("Page Instances") are associated with World-Wide-Web documents, and more typically, HTML documents. Other instances ("subinstances") are associated with subsections of HTML page instance documents. Instances form the most common data entities in databases built up from this specification.
- Key
- A string which uniquely identifies an HTML Page Instance or a subinstance. Determining a key (and hence the instance in question) is sometimes difficult. Documents which do not or cannot follow this spec do not declare keys of their own, so the key of such a document is defined to be an absolute URL of the document. Unfortunately, a document may have several absolute URLs, and hence several "unique" keys. This cannot be helped.
HTML Documents which do follow this spec do in fact have unique keys, provided in the documents themselves. These keys are given either in the Page Instance declaration or in individual subinstance declarations. It is up to you to decide on the keys for your documents. For a page instance, the recommended practice is to use a single absolute URL of the document as its key. For example, the document located at http://www.cs.umd.edu/ may have a unique key of "http://www.cs.umd.edu/", or perhaps of "http://mimsy.cs.umd.edu/", or perhaps "http://www.cs.umd.edu:80". Choose a key based on a URL accessible to as wide an Internet community as possible.
Each subinstance must have a key unique to itself, different from the keys of all other subinstances and documents on the World Wide Web, including that of its parent page instance (its parent document). To form a subinstance key, you may append a pound-suffix such as "#key1" or "#subinstance-1" to the unique key of the page instance that contains the subinstance. It's fine (and considered good style) if this key corresponds to an actual URL fragment within the document.
- Ontology
- As defined in this specification, a description of valid classifications for HTML page instances and subinstances, and a description of valid relationships between such instances and other instances, or with specific types of data (strings, numbers, dates, boolean values, categories, or relationships).
- Prefix
- A small string attached at the beginning of an element, separated with a period ("."). Prefixes may also be attached to already-prefixed elements, forming a prefix chain. A prefix indicates the ontology from which the element (or prefixed element) following it is defined.
- Relation (Relationship)
- An element which defines a relationship between two other elements. Relation names are element names, and may be prefixed. Relations may be between two instances, or between an instance and data, or (improperly) between two data elements. In this specification, all relations are binary relations. Relations have a domain (the element the relation is "from") and a range (the element the relation is "to").
- Rule
- A formal rule in an ontology defining valid classifications (categories) or valid relationships that can be asserted.
- Unique Name
- A string which uniquely defines an ontology. Unique Names are different from Keys in that they do not uniquely define instances but rather the ontologies which the instances may use. Further, several different versions of an ontology may have the same unique name so long as they have different Version numbers.
- Version (Version Number)
- A string which describes the version of an ontology. Versions are case-sensitive, and may contain only letters, digits, or hyphens.
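A robot might validate data values along the lines of the minimal Python sketch below. It rests on several assumptions:

- The regular expression is one reading of the NUMBER grammar under Data, and it treats the exponent sign as optional.
- The DATE check accepts only the RFC 1123 form ending in a literal "GMT".
- The "letters" allowed in element names are taken to be ASCII.
- All function names are hypothetical.

```python
import re
from datetime import datetime

# One reading of the NUMBER grammar: a bare 0, or an optional sign followed
# by '.'digits, 0'.'digits, or a non-zero integer part with an optional
# fraction, then an optional exponent whose digits start with a non-zero digit.
NUMBER_RE = re.compile(
    r"""
    (?: 0
      | [+-]?
        (?: \.\d* | 0\.\d* | [1-9]\d* (?: \.\d* )? )
        (?: [eE] [+-]? [1-9]\d* )?
    )
    """,
    re.VERBOSE,
)

# Element names: letters, digits, or hyphens (ASCII assumed), case-sensitive.
ELEMENT_RE = re.compile(r"[A-Za-z0-9-]+")


def is_number(text):
    return NUMBER_RE.fullmatch(text) is not None


def is_truth(text):
    # Case-sensitive: only "YES" and "NO" are valid.
    return text in ("YES", "NO")


def is_date(text):
    # RFC 1123 form, e.g. "Sun, 06 Nov 1994 08:49:37 GMT". %a and %b assume
    # the default C locale, so day and month names are English.
    try:
        datetime.strptime(text, "%a, %d %b %Y %H:%M:%S GMT")
    except ValueError:
        return False
    return True


def is_element_name(name):
    # A possibly prefixed name such as "cs.junk.person": every dot-separated
    # part must itself be a valid element name.
    return all(ELEMENT_RE.fullmatch(part) for part in name.split("."))
```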
3 Declaring Ontologies
Before an HTML document can be marked up with classifications and relationships, it must have one or more available ontologies from which to draw these classifications and relationships. This chapter describes the HTML tags which allow one to create such an ontology.
This chapter contains:
- Defining An Ontology: Creating and ending an ontology.
- Adding Ontology Declarations: Describing what classifications and relationships are valid in your ontology.
3.1 Defining An Ontology
An HTML document may contain any number of ontology definitions. Each definition provides a unique name and a version number for the ontology. Together, these must differ from those of every other ontology definition, unless that definition describes exactly the same ontology.
This section contains:
- Declaring An Ontology Definition: Creating your ontology.
- Extending An Existing Ontology: How to avoid re-inventing the wheel.
3.1.1 Declaring An Ontology Definition
An ontology definition must have a unique name that differs from the names of all other ontologies; only definitions of the same ontology (including its other versions) may share it. Further, an ontology definition should be accompanied by a version which distinguishes it from previous versions of the definition. If the ontology completely subsumes certain previous versions (it contains all the rules defined in those versions), it may declare itself backward-compatible with those versions. To begin an ontology definition, use:
<ONTOLOGY "ontology-unique-name"
VERSION="Version"
[BACKWARD-COMPATIBLE-WITH="Version List"]>
- "ontology-unique-name" (mandatory)
- The ontology's unique name.
- VERSION (mandatory)
- The ontology's version.
- BACKWARD-COMPATIBLE-WITH
- A whitespace-delimited list of previous versions which this ontology subsumes.
To end an ontology definition, use (after all rules and extensions for the ontology):
</ONTOLOGY>
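This specification does not say how a user-agent should act on BACKWARD-COMPATIBLE-WITH. One plausible use is to decide whether a given ontology definition can serve a reference to a particular name and version. The Python sketch below shows that reading; its class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyDef:
    """Hypothetical in-memory form of an <ONTOLOGY ...> declaration."""
    unique_name: str
    version: str
    backward_compatible_with: frozenset = field(default_factory=frozenset)

    def satisfies(self, unique_name, version):
        # A reference to (name, version) can be served by this definition if
        # the names match and the version is this one or one it subsumes.
        if unique_name != self.unique_name:
            return False
        return version == self.version or version in self.backward_compatible_with


def parse_version_list(text):
    # BACKWARD-COMPATIBLE-WITH is a whitespace-delimited list of versions.
    return frozenset(text.split())
```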
3.1.2 Extending An Existing Ontology
An ontology may be declared to extend one or more existing ontologies. This means that it will use elements in those ontologies in its own rules. To distinguish between those elements and its own elements, an ontology must provide a unique prefix for each ontology it extends, which will be prefixed to elements borrowed from each particular ontology whenever they are referred to. To declare that this ontology is extending another ontology, use:
<ONTOLOGY-EXTENDS "ontology-unique-name"
VERSION="Version"
PREFIX="Prefix"
[URL="URL"]>
- "ontology-unique-name" (mandatory)
- The extended ontology's unique name, as given in the ontology itself (that is, in its own HTML document).
- VERSION (mandatory)
- The extended ontology's version, as given in the ontology itself (that is, in its own HTML document).
- PREFIX (mandatory)
- The prefix you are assigning the extended ontology. All categories and relations from this ontology which are used in this document must be prefixed with this prefix. Within this document, the prefix must differ from all other prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.
- URL
- A URL that points to a document (preferably the official document) which contains the extended ontology.
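An extending ontology may itself extend others, so a prefixed element can carry a chain of prefixes (see Prefix in section 2). The minimal Python sketch below follows such a chain one prefix at a time. The table layout and the ontology identifiers are assumptions made for illustration.

```python
def resolve_prefixed(name, start, prefix_tables):
    """Follow a prefix chain such as "cs.junk.person".

    start         -- identifier of the ontology in which the name appears
    prefix_tables -- hypothetical mapping: ontology id -> {prefix: ontology id},
                     built from each ontology's <ONTOLOGY-EXTENDS ...> tags
    Returns (id of the ontology that defines the element, bare element name).
    """
    *prefixes, element = name.split(".")
    ontology = start
    for prefix in prefixes:
        try:
            ontology = prefix_tables[ontology][prefix]
        except KeyError:
            raise KeyError(f"prefix {prefix!r} is not declared in {ontology!r}") from None
    return ontology, element


tables = {"my-onto": {"cs": "cs-onto"}, "cs-onto": {"junk": "junk-onto"}}
print(resolve_prefixed("cs.junk.person", "my-onto", tables))
```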
3.2 Adding Ontology Declarations
Once an ontology is defined, we must populate it with valid classification and relationship rules.
This section contains:
- Declaring Classification Rules: Defining what instances may be classified as.
- Declaring Relationship Rules: Defining how instances may create relationships with each other and with data.
- Renaming Rules: How to rename classification and relationship rules to get rid of prefixes.
3.2.1 Declaring Classification Rules
Inside an ontology definition, an ontology may declare various new categories which instances can belong to. Categories should be subcategories (ISA) of one or more parent categories. To declare a new category, or to add new parent categories for a category, use:
<ONTDEF CATEGORY="category-name"
[ISA="parent-category-list"]>
- CATEGORY (mandatory)
- The newly declared category or the category which is being given more parent categories.
- ISA
- A whitespace-delimited list of categories to define as parent categories of this category. Short for "is a".
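Later ONTDEF tags can give a category more parents. A reader of an ontology therefore has to merge all ISA declarations for a category and then walk them transitively to find every ancestor. The minimal Python sketch below does this; the function names and category names are hypothetical. It also tolerates accidental ISA cycles.

```python
def add_ontdef(isa, category, parent_list=""):
    # ISA is a whitespace-delimited list; repeated declarations add parents
    # rather than replacing earlier ones.
    parents = isa.setdefault(category, [])
    for parent in parent_list.split():
        if parent not in parents:
            parents.append(parent)


def ancestors(category, isa):
    """All parent and ancestor categories of `category`."""
    seen = set()
    stack = list(isa.get(category, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guards against cycles and shared ancestors
            seen.add(parent)
            stack.extend(isa.get(parent, ()))
    return seen


isa = {}
add_ontdef(isa, "Student", "Person")
add_ontdef(isa, "Student", "Worker")
add_ontdef(isa, "Person", "Organism")
print(sorted(ancestors("Student", isa)))
```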
3.2.2 Declaring Relationship Rules
Inside an ontology definition, an ontology may declare various new valid relationships between category instances or between category instances and data. To declare a relationship, use:
<ONTDEF RELATION="relation-name"
ARGS="element-list">
- RELATION (mandatory)
- The newly declared relationship name.
- ARGS (mandatory)
- The arguments of the relation. This should be a whitespace-delimited list of exactly two elements (this specification currently supports only binary relations). The first element defines the domain of the relationship, and the second element defines the range of the relationship. Elements can be either declared categories, or the following keywords representing various kinds of data elements (all caps):
- STRING
- Strings.
- NUMBER
- Numbers.
- DATE
- Dates and timestamps.
- TRUTH
- Boolean values (truths) of the form "YES" or "NO".
- CATEGORY
- Category names. This establishes a relationship not with category instances but with categories themselves.
- RELATION
- Relationships. This establishes a relationship not with instances but with other relationships.
The last two keywords, CATEGORY and RELATION, are rarely needed and should be used only in special circumstances.
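Combining ARGS with category inheritance (see Category in section 2), a robot could check whether a classified instance may appear in a relation's domain or range. This Python sketch is illustrative only. The helper names and the dictionary-of-parents representation are assumptions.

```python
DATA_KEYWORDS = {"STRING", "NUMBER", "DATE", "TRUTH", "CATEGORY", "RELATION"}


def parse_args(args):
    # ARGS is a whitespace-delimited list of exactly two elements.
    parts = args.split()
    if len(parts) != 2:
        raise ValueError(f"ARGS must name exactly two elements, got {len(parts)}")
    return parts[0], parts[1]  # (domain, range)


def instance_fits(arg, categories, isa):
    """Can an instance classified under `categories` fill this argument slot?

    isa -- hypothetical mapping: category -> list of parent categories
    """
    if arg in DATA_KEYWORDS:
        return False  # a data slot, not an instance slot
    frontier = list(categories)
    seen = set()
    while frontier:
        cat = frontier.pop()
        if cat == arg:
            return True  # the category itself or one of its ancestors
        if cat not in seen:
            seen.add(cat)
            frontier.extend(isa.get(cat, ()))
    return False
```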
3.2.3 Renaming Rules
To reduce the number of prefixes, an ontology is permitted to rename a category or relation reference to another name, so long as this name is not used in any other reference in the ontology. Suppose, for example, that our ontology extends an ontology with the prefix "cs". That ontology in turn extends, with the prefix "junk", an ontology which declares a category "person". Our ontology could then rename the category "cs.junk.person" to simply "person", so long as "person" is not defined elsewhere in the ontology to mean something else.
Ontologies are not permitted to rename (or rename elements to) the following keywords: STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. To rename an element, use:
<ONTDEF RENAME="element-name"
TO="new-element-name">
- RENAME (mandatory)
- The element's old name.
- TO (mandatory)
- The element's new name.
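Renaming is subject to two constraints. The new name must not already be in use, and the reserved keywords can be neither renamed nor used as new names. A small lookup table can enforce both. The class below is a hypothetical Python sketch, not a prescribed data structure.

```python
RESERVED = {"STRING", "NUMBER", "DATE", "TRUTH", "CATEGORY", "RELATION"}


class RenameTable:
    """Hypothetical bookkeeping for <ONTDEF RENAME=... TO=...> tags."""

    def __init__(self, local_names):
        # Names already used by other references in this ontology.
        self._used = set(local_names)
        self._alias = {}

    def rename(self, old, new):
        if old in RESERVED or new in RESERVED:
            raise ValueError("reserved keywords cannot be renamed or used as new names")
        if new in self._used or new in self._alias:
            raise ValueError(f"{new!r} already refers to something else in this ontology")
        self._alias[new] = old

    def expand(self, name):
        # Map a short name back to its full (possibly prefixed) reference.
        return self._alias.get(name, name)
```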
4 Marking Up HTML Documents Using Ontologies
Any HTML document can be marked up, using existing ontologies, to declare itself or subparts of itself as belonging to certain categories, being associated with data, or having explicit relationships with other documents.
This chapter contains:
- Instantiation: Declaring that the document uses the specification.
- Using an Ontology: Declaring that the document uses one or more ontologies while classifying or declaring relationships.
- Classification: Using an ontology and declaring categories for a document.
- Declaring Relationships: Declaring relationships between the document and other data or documents.
4.1 Instantiation
HTML documents which follow this spec must declare themselves "page" instances and provide a unique key for themselves. Additionally, these documents may declare subsections of themselves as "sub-instances", each with a unique key.
This section contains:
- Declaring A Page Instance: Indicating that an HTML document uses this specification.
- Declaring A Subinstance: Declaring that a subsection of the HTML document has unique characteristics of its own.
4.1.1 Declaring a Page Instance
A page instance is an HTML document that is marked up using this specification. Each page instance must have a unique key. By default, every page instance automatically belongs to the category Page. To declare the document to be a page instance, you must add the following text to the HEAD section of the HTML document:
<META HTTP-EQUIV="Instance-Key" CONTENT="Key">
Replace "Key" with the page instance's unique key.
To be conformant with this specification, a document must include this declaration.
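A robot reading a page could locate this declaration with Python's standard `html.parser` module, as in the minimal sketch below. The class and function names are hypothetical. Two behaviours are also assumptions rather than requirements of this specification: the HTTP-EQUIV value is matched case-insensitively, and a page that carries more than one Instance-Key declaration is rejected.

```python
from html.parser import HTMLParser


class InstanceKeyFinder(HTMLParser):
    """Collects the CONTENT of <META HTTP-EQUIV="Instance-Key" ...> tags."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "instance-key":
            self.keys.append(attrs.get("content"))


def page_instance_key(html_text):
    finder = InstanceKeyFinder()
    finder.feed(html_text)
    finder.close()
    # Exactly one non-empty declaration is expected (an assumption).
    if len(finder.keys) != 1 or not finder.keys[0]:
        return None
    return finder.keys[0]
```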
4.1.2 Declaring a Subinstance
A subinstance is a section of an HTML document that has been declared to be an instance of one or more instance categories, apart from those declared for the document (page instance) itself.
By default, all subinstances are of the category PageSubinstance, and automatically have the parentPage relationship linking them to their parent page instance. Subinstances should not be nested--that is, a subinstance should not be declared inside another subinstance. To declare the start of a subinstance, use the INSTANCE tag:
<INSTANCE "Key">
- "Key" (mandatory)
- The unique key for the instance.
To mark the end of the section of the document which this subinstance covers, use:
</INSTANCE>
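Following the key convention described under Key in section 2, a subinstance key can be formed by appending a pound-suffix to the page instance's key. The Python sketch below uses the standard `urllib.parse.urldefrag` function. The helper names, and the choice to drop any existing fragment from the page key, are assumptions.

```python
from urllib.parse import urldefrag


def subinstance_key(page_key, suffix):
    """Build a subinstance key by appending '#suffix' to the page key."""
    if not suffix or "#" in suffix:
        raise ValueError("suffix must be a non-empty string without '#'")
    # Drop any fragment already on the page key so the suffix replaces it.
    base, _fragment = urldefrag(page_key)
    return f"{base}#{suffix}"


def keys_are_unique(page_key, sub_keys):
    # Subinstance keys must differ from each other and from the page key.
    seen = {page_key}
    for key in sub_keys:
        if key in seen:
            return False
        seen.add(key)
    return True
```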
4.2 Using An Ontology
Before you can classify documents or establish relationships between them, you'll need to define exactly which ontologies these classifications and relations come from, and associate with each of these ontologies a prefix that is unique within this document.
This section contains:
- Declaring Ontology Usage: Indicating that you intend to use a specific ontology in your markup.
4.2.1 Declaring Ontology Usage
An HTML document may declare that it is using as many ontologies as it likes, as long as each ontology has a unique prefix in the document. To declare that you will be using a specific ontology in your later classifications or relationships, use:
<USE-ONTOLOGY "ontology-unique-name"
VERSION="Version"
PREFIX="Prefix"
[URL="URL"]>
- "ontology-unique-name" (mandatory)
- The ontology's unique name, as given in the ontology itself (that is, in its own HTML document).
- VERSION (mandatory)
- The ontology's version, as given in the ontology itself (that is, in its own HTML document).
- PREFIX (mandatory)
- The prefix you are assigning the ontology. All categories and relations from this ontology which are used in this document must be prefixed with this prefix. Within this document, the prefix must differ from all other prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.
- URL
- A URL that points to a document (preferably the official document) which contains this ontology.
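Inside a marked-up document, a processor has to map each prefix to exactly one ontology and reject duplicates. The class below is a hypothetical Python sketch. It resolves only the first prefix of a name and leaves any remaining prefix chain to be resolved within the named ontology.

```python
class PrefixRegistry:
    """Hypothetical per-document table built from <USE-ONTOLOGY ...> and
    <ONTOLOGY-EXTENDS ...> tags; every prefix must be unique in the document."""

    def __init__(self):
        self._by_prefix = {}

    def declare(self, prefix, unique_name, version):
        if prefix in self._by_prefix:
            raise ValueError(f"prefix {prefix!r} is already declared in this document")
        self._by_prefix[prefix] = (unique_name, version)

    def lookup(self, prefixed_name):
        # "cs.Person" -> (("cs-dept-onto", "1.0"), "Person"). Names used in
        # markup must be prefixed, so a bare name is an error. An unknown
        # prefix raises KeyError.
        prefix, sep, rest = prefixed_name.partition(".")
        if not sep:
            raise ValueError(f"{prefixed_name!r} carries no prefix")
        return self._by_prefix[prefix], rest
```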
4.3 Classification
All instances may be classified, that is, they may be declared to belong to one or more categories in an ontology.
This section contains:
- Declaring Categories: Classifying an instance with a category.
4.3.1 Declaring Categories
An instance may declare itself or another instance to belong to one or more categories, using the CATEGORY tag:
<CATEGORY "prefixed.category.list"
[FOR="Key"]>
- "prefixed.category.list" (mandatory)
- A whitespace-delimited list of categories, each prefixed by the appropriate ontology from which it was derived. This is the list of categories the instance is declared to belong to.
- FOR
- Contains the key of the instance which is being declared to belong to these categories.
FOR is not mandatory. To determine the key of the instance which will belong to these categories, apply these rules:
- If FOR is not declared, then the key is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance.
- If FOR is declared, then it provides the key.
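The defaulting rule above, like the matching rules for FROM and TO in section 4.4.1, amounts to tracking the open subinstance while scanning the document. A minimal Python sketch follows; its names are hypothetical. Raising an error on nested INSTANCE tags is a stricter choice than the "should not" of section 4.1.2.

```python
class KeyContext:
    """Tracks which instance a tag without FOR (or FROM/TO) refers to.

    Hypothetical helper: call enter()/leave() on <INSTANCE "Key"> and
    </INSTANCE>, and resolve() for each tag's explicit or missing key.
    """

    def __init__(self, page_key):
        self.page_key = page_key
        self.current_sub = None

    def enter(self, key):
        if self.current_sub is not None:
            raise ValueError("subinstances should not be nested")
        self.current_sub = key

    def leave(self):
        self.current_sub = None

    def resolve(self, explicit=None):
        # An explicit key wins; otherwise use the enclosing subinstance,
        # otherwise the page instance.
        if explicit is not None:
            return explicit
        return self.current_sub if self.current_sub is not None else self.page_key
```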
4.4 Declaring Relationships
Instances may declare relationships between elements (an element is an instance or some kind of data, like a string or number). There are three ways this can be done: relationships can be declared explicitly, wrapped around body text, or embedded in links.
This section contains:
- Explicitly Declaring Relationships: The default mechanism for declaring relationships.
- Marking Up Text Relationships: Wrapping relationships around body text.
4.4.1 Explicitly Declaring Relationships
An instance may explicitly declare relationships between two elements:
<RELATION "prefixed.relationship.list"
[FROM="Key"]
[FROM-TYPE="Type"]
[TO="Key"]
[TYPE="Type"]>
- "prefixed.relationship.list" (mandatory)
- A whitespace-delimited list of relationships, each prefixed by the appropriate ontology from which it was derived. This is the list of relationships declared between the FROM and TO elements.
- FROM
- Declares the element in the domain of the relationships (the element the relationships are "from"). This element may be an instance, or it may be a string, a number, a date, a boolean, a category, or a relationship.
- FROM-TYPE
- Provides the type of the FROM element, one of: "STRING", "NUMBER", "TRUTH" (booleans), "DATE", "CATEGORY", "RELATION", or "INSTANCE". The default is assumed to be "INSTANCE".
- TO
- Declares the element in the range of the relationships (the element the relationships are "to"). This element may be an instance, or it may be a string, a number, a date, a boolean, a category, or a relationship.
- TYPE
- Provides the type of the TO element, one of: "STRING", "NUMBER", "TRUTH" (booleans), "DATE", "CATEGORY", "RELATION", or "INSTANCE". The default is assumed to be "INSTANCE".
FROM is not mandatory. To determine the key of the instance in the domain of the relationships (the "FROM" position), apply these rules:
- If FROM is not declared, then the key is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance.
- If FROM is declared, then it provides the key.
TO is not mandatory. To determine the key of the instance in the range of the relationships (the "TO" position), apply these rules (similar to those given for FROM):
- If TO is not declared, then the key is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance.
- If TO is declared, then it provides the key.
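A robot might expand one RELATION tag into one claim per listed relationship. The domain comes from FROM and the range from TO, as defined under Relation in section 2. The Python sketch below is hypothetical: the function name, the (type, value) pair representation, and treating an omitted FROM or TO as an INSTANCE are all assumptions. The claimant argument records which instance made the assertion, in keeping with the claims-not-facts principle of section 1.

```python
VALID_TYPES = {"STRING", "NUMBER", "TRUTH", "DATE", "CATEGORY", "RELATION", "INSTANCE"}


def relation_claims(claimant, relation_list, default_key,
                    from_=None, from_type="INSTANCE", to=None, to_type="INSTANCE"):
    """Expand one <RELATION ...> tag into (claimant, relation, domain, range) claims.

    default_key -- key of the enclosing subinstance, or of the page instance,
                   used when FROM or TO is omitted.
    Values are kept as (type, value) pairs so that, e.g., a STRING "5" and
    a NUMBER "5" stay distinct.
    """
    for t in (from_type, to_type):
        if t not in VALID_TYPES:
            raise ValueError(f"unknown type {t!r}")
    # Omitted FROM/TO fall back to the enclosing subinstance or page instance.
    domain = ("INSTANCE", default_key) if from_ is None else (from_type, from_)
    range_ = ("INSTANCE", default_key) if to is None else (to_type, to)
    # The relationship list is whitespace-delimited; one claim per relation.
    return [(claimant, rel, domain, range_) for rel in relation_list.split()]
```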
4.4.2 Marking Up Text Relationships
It's possible to wrap existing HTML text and declare it to be a relationship. This is done by:
<ATTRIBUTE "prefixed.relationship.list"
[TYPE="Type"]>
Attribute Text
</ATTRIBUTE>
This is functionally the same as declaring
<RELATION "prefixed.relationship.list"
TO="Attribute Text"
[TYPE="Type"]>
TYPE declares the type of the enclosed text. If no TYPE is provided, it is assumed to be STRING. Note this is different from RELATIONs, where the TYPE is assumed by default to be INSTANCE; this is because typical data wrapped in an ATTRIBUTE will commonly be strings (people's names, terms, etc.).
In fact, because instance key names cannot be uniquely wrapped as text, the only valid TYPEs for ATTRIBUTE-wrapped text are "STRING", "NUMBER", "TRUTH", "DATE", "CATEGORY", and "RELATION".
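This equivalence suggests a simple rewriting step, sketched here in Python with hypothetical names. Collapsing runs of whitespace in the wrapped text is an assumption, made because HTML text may span several source lines. The allowed TYPE values are the six listed above.

```python
ATTRIBUTE_TYPES = {"STRING", "NUMBER", "TRUTH", "DATE", "CATEGORY", "RELATION"}


def attribute_to_relation(relation_list, text, type_="STRING"):
    """Rewrite <ATTRIBUTE ...>text</ATTRIBUTE> as equivalent RELATION attributes."""
    if type_ not in ATTRIBUTE_TYPES:
        raise ValueError(f"{type_!r} is not allowed for ATTRIBUTE-wrapped text")
    # FROM is left out, so it defaults to the enclosing subinstance or page.
    # Whitespace collapsing is an assumption of this sketch.
    return {"relations": relation_list, "TO": " ".join(text.split()), "TYPE": type_}
```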