Html Query Syntax
HTML Query is a self-description mechanism that uses [1]JSON to describe the contents of an HTML document. Although HTML Query can be used with a vocabulary such as [2]microformats, it does not require the author to change the HTML of a document in any way. An author can describe what already exists on a page without adding extra attributes or elements to accommodate the intended semantics.
The following is an example of a simple Html Query.
{
  "select": {
    "from": "http://example.com/",
    "prefix": {
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "where": {
      "title": { "label": "dc:title" }
    }
  }
}
When the query is performed on the URL http://example.com/, it results in the following output.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://example.com/">
  <rdf:Description rdf:about="http://example.com/">
    <dc:title>Example Web Page</dc:title>
  </rdf:Description>
</rdf:RDF>
Try a live example, by clicking example.json.
HTML Query uses the following patterns to select keywords and set the output parameters of an HTML document.
"object": {
"selector": {
"property" : "value"
}
}
and...
"object" : {
"property" : "value"
}
Properties of HTML Query
-
select
All Html Queries must begin with select.
Example:
{ "select" : { ... } }
-
from
from is a URL for the document to be queried. The value of from should be an absolute URL.
Example:
"from": "http://example.com/"
The from property may be omitted. If the from property is omitted from a query, the parser sets the value of from to the referring page.
-
prefix
prefix contains a comma-separated list of vocabulary prefixes and URIs to be used in the RDF output of a query and in the query itself.
A prefix is an abbreviation of a URI and is used in place of the full URI. Prefixes form the first part of a URI reference, or [3]QName in RDF terms.
Pattern:
"prefix": { "prefix": "uri", ... }
Example:
"prefix": { "vcard": "http://www.w3.org/2006/vcard/ns#", ... }
A default prefix for the output document may be set using the keyword "value".
Example:
"prefix": { "value": "http://www.w3.org/2006/vcard/ns#", ... }
-
where
where contains a comma-separated list of HTML keyword selectors and their output properties.
where may contain nested where statements. If a where keyword contains a nested where statement, the keyword is treated as a "root" value; otherwise the keyword is a property. A property keyword should not contain further where statements.
Pattern of a root keyword that contains a nested keyword:
"where": { "selector": { "property": "value", "where": { "selector" : { "property": "value" } } } }
Pattern of a keyword that is a property:
"where": { "selector" : { "property": "value" } }
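The from and prefix rules above can be sketched in a few lines of Python. This is only an illustration, not the parser's actual code: the function names resolve_from and expand_label are hypothetical, and using the "value" prefix for labels that carry no prefix is an assumption about how the default prefix is applied.

```python
import json


def resolve_from(query, referrer):
    """Return the URL to query: the "from" value, or the referring page if omitted."""
    return query["select"].get("from", referrer)


def expand_label(label, prefixes):
    """Expand a prefixed label such as "dc:title" into a full URI."""
    prefix, sep, local = label.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Assumption: a label without a known prefix falls back to the
    # default prefix stored under the keyword "value", if one is set.
    default = prefixes.get("value")
    return default + label if default else label


# The simple query from the introduction, with "from" left out.
query = json.loads("""
{
  "select": {
    "prefix": { "dc": "http://purl.org/dc/elements/1.1/" },
    "where": { "title": { "label": "dc:title" } }
  }
}
""")

prefixes = query["select"].get("prefix", {})
print(resolve_from(query, "http://example.com/"))
print(expand_label("dc:title", prefixes))
```

With from omitted, the sketch falls back to the referring page passed in by the caller, and "dc:title" expands to the Dublin Core title URI.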
Keyword Selectors.
HTML Query uses four CSS-like selectors to navigate the keywords of an HTML document. The selectors are defined below.
-
element
The selector is an element name.
Example:
"h1" is equal to <h1></h1>
-
.class
The selector is a class name.
Example:
".example" is equal to class="example"
-
#id
The selector is the id of an element.
Example:
"#example" is equal to id="example"
-
attribute~=name
The selector contains an attribute name.
Example:
"rel~=example" is equal to rel="example"
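As a rough illustration of the four selector forms listed above, the following Python sketch sorts a selector string into its kind. The function name classify_selector and the tuple it returns are hypothetical, chosen only for this example.

```python
def classify_selector(selector):
    """Return (kind, name, value) for one of the four selector forms."""
    if "~=" in selector:
        # attribute~=name, for example "rel~=example"
        attribute, _, value = selector.partition("~=")
        return ("attribute", attribute, value)
    if selector.startswith("."):
        return ("class", selector[1:], None)
    if selector.startswith("#"):
        return ("id", selector[1:], None)
    # Anything else is treated as an element name.
    return ("element", selector, None)


for s in ("h1", ".example", "#example", "rel~=example"):
    print(s, "->", classify_selector(s))
```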
Properties of Select
HTML Query Selectors contain six properties to set both input and output values.
Properties
-
about
A URL for what this "keyword" is about. The "about" property contains a space-separated list of HTML ids, which sets the subject of the keyword in [4]RDF terms.
Pattern:
"about" : { "id": "url", ... }
Ids set by the about property are matched with HTML ids on a page, and the URL value is used in the output. The about pattern allows different ids on a page to share the same URL value, or each id to have its own unique URL value.
Example:
"about": { "fred": "http://example.com/" }
Example HTML:
<div id="fred"> ... </div>
The about property can also accept the boolean value false. Booleans in JSON are written as unquoted literals, not strings. Setting the about property to false prevents a parser from generating an about attribute in the RDF output.
Example:
"about": false
-
label
Labels are used as both unique identifiers in a query and element names in the RDF output. A label is a "predicate" or "property" in [4]RDF terms.
Pattern:
"label": "property"
Example:
"label": "foaf:name"
With the pre-defined type uri, a label may contain a space-separated list of labels.
Example:
"label": "foaf:primaryTopic foaf:maker"
-
type
The datatype of a keyword, or the datatype of the object in [4]RDF terms. As well as standard datatypes such as [5]XML Schema datatypes, HTML Query supports five pre-defined types. If type is omitted from a keyword, the parser defaults to "text".
Pattern:
"type": "value"
Example:
"type": "http://www.w3.org/2001/XMLSchema#dateTime"
Pre-Defined Types.
-
text
The content is just text, or a plain literal in RDF. Text is extracted in the following order: @datetime, @content, @title. If none of these HTML attributes is present, the value is the node value.
Example Output:
<label>Text</label>
-
uri
A URI, or simply a URL. When the keyword type is set to uri, the parser extracts the value in the following order: @src, then @href.
Example Output:
<label rdf:resource="http://someurl.com/" />
If neither @src nor @href is present, the value is @id converted to an absolute hash URL. This allows the author to link to other keyword items in the RDF output.
Example Output:
<label rdf:resource="http://someurl.com/#id" />
-
uriplain
uriplain behaves the same as uri, except that the value is output as a plain literal (text).
Example Output:
<label>http://someurl.com/</label>
-
xmlliteral
An XMLLiteral string. An xmlliteral may contain HTML markup or special characters. If the value contains markup, it should be converted to XHTML, and all elements should use the http://www.w3.org/1999/xhtml XML namespace.
Example Output:
<x:label rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"> <p xmlns="http://www.w3.org/1999/xhtml">Some text.</p> </x:label>
-
cdata
A character data section. A cdata section may contain HTML markup or special characters.
Example Output:
<label><![CDATA[<p> Some text.</p>]]></label>
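The extraction order described for the text and uri types can be sketched as follows. This is a minimal Python illustration under stated assumptions: an element is represented as a plain dictionary of its attributes, the function names are hypothetical, relative @src or @href values are not resolved, and returning None when a uri keyword has no @src, @href or @id is a choice made for the example only.

```python
def extract_text(attrs, node_value):
    """Type "text": try @datetime, @content, @title, then the node value."""
    for name in ("datetime", "content", "title"):
        if name in attrs:
            return attrs[name]
    return node_value


def extract_uri(attrs, base_url):
    """Type "uri": try @src, then @href, then @id as a hash URL on base_url."""
    for name in ("src", "href"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        return base_url + "#" + attrs["id"]
    # Assumption: nothing to extract, so no value is produced.
    return None


print(extract_text({"title": "A title"}, "Node text"))
print(extract_uri({"id": "id"}, "http://someurl.com/"))
```

A uriplain keyword would use the same lookup as extract_uri and differ only in how the value is written to the output.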
-
content
Content contains a space-separated list of HTML ids and sets the default content of a keyword. Ids set by the content property are matched with HTML ids on a page. The value of content is used in the RDF output.
Pattern:
"content" : { "id": "value" }
It is also possible to set a default content.
Pattern:
"content": "value"
-
multiple
By default the parser ignores multiple values of the same selector; only the first match of a selector is parsed.
Multiple takes the boolean value true, which causes the parser to extract multiple instances of the same selector with different values.
Pattern:
"multiple": true
Example:
"rel~=friend": { "multiple": true, ... }
-
rev
Rev is a reverse property name. Rev can be used with any root property (a selector that contains other keywords) to create an extra chain before a keyword.
Example:
"where": { ".vcard": { "rev": "knows", "label": "Person", "where": { ..... } } }
The above example would result in the following RDF.
<knows> <Person rdf:about="..."> ... </Person> </knows>
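The effect of the multiple property described above can be pictured with a short Python sketch. The function name select_values is hypothetical, the matches are assumed to arrive in document order, and the sketch does not remove duplicate values.

```python
def select_values(matches, keyword):
    """Return the matches to parse for one selector.

    matches: values found for the selector, in document order.
    keyword: the selector's property object from the query.
    """
    if keyword.get("multiple") is True:
        return list(matches)
    # Default behaviour: only the first match is parsed.
    return list(matches[:1])


friends = ["http://example.com/alice", "http://example.com/bob"]
print(select_values(friends, {"multiple": True}))
print(select_values(friends, {}))
```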
Root Selectors
Root Selectors are determined by whether or not a keyword contains further nested where statements. If a keyword contains nested where statements, the selector is said to be a "root selector"; if not, the selector is said to be a property.
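The distinction between root selectors and properties can be sketched by walking a where object recursively. The function name classify_keywords is hypothetical, and the selectors and labels in the sample query are illustrative only. Treating an empty where object as "no nested statements" is an assumption of this sketch.

```python
def classify_keywords(where, depth=0):
    """Yield (depth, selector, kind) for every keyword in a where object."""
    for selector, keyword in where.items():
        nested = keyword.get("where")
        # A keyword with nested where statements is a root selector.
        kind = "root" if nested else "property"
        yield (depth, selector, kind)
        if nested:
            yield from classify_keywords(nested, depth + 1)


# Illustrative where object; the selectors and labels are made up.
where = {
    ".vcard": {
        "label": "vcard:VCard",
        "where": {
            ".fn": {"label": "vcard:fn"},
            ".adr": {
                "label": "vcard:adr",
                "where": {".locality": {"label": "vcard:locality"}},
            },
        },
    }
}

for depth, selector, kind in classify_keywords(where):
    print("  " * depth + selector, kind)
```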
Root Selectors can also have a type that resolves to an rdf:parseType. Valid types are:
- collection => rdf:parseType="Collection"
- resource => rdf:parseType="Resource"
- literal => rdf:parseType="Literal"
Example:
"type": "resource"
The author can also use type as an rdf:type selector, which adds an rdf:type element to the root selector.
Example:
"type": "http://www.w3.org/2000/10/swap/pim/contact#ContactLocation"
In the absence of a root selector type and a reverse property name, all properties contained in a root selector are wrapped in a blank rdf:Description element.
Example:
<vcard:adr>
<rdf:Description>
<vcard:locality>Albuquerque</vcard:locality>
</rdf:Description>
</vcard:adr>
Setting RDF about
In the absence of the about property, HTML Query sets the RDF about attribute by selecting HTML values from the selected element or attribute in the following order: @href, @src and @id. If the value is @id, then an absolute hash URI is compiled from the from URL and the value of @id. If a parser fails to select @href, @src or @id, the RDF about attribute is set to the value of from.
Blank Nodes may be generated by setting the about property to false.
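The fallback order for the RDF about attribute can be sketched in Python as below. This is an illustration rather than the parser's implementation: the function name resolve_about is hypothetical, an element is represented by a dictionary of its attributes, None stands for a blank node, and falling through to the default order when an about mapping does not list the element's id is an assumption.

```python
def resolve_about(keyword, attrs, from_url):
    """Return the rdf:about value for a keyword, or None for a blank node."""
    about = keyword.get("about")
    if about is False:
        return None  # "about": false produces a blank node
    if isinstance(about, dict):
        element_id = attrs.get("id")
        if element_id in about:
            return about[element_id]
    # No usable about property: try @href, then @src, then @id.
    for name in ("href", "src"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        return from_url + "#" + attrs["id"]
    # Nothing selected: use the value of from.
    return from_url


print(resolve_about({"about": {"fred": "http://example.com/"}}, {"id": "fred"}, "http://example.org/page"))
print(resolve_about({}, {"id": "me"}, "http://example.com/"))
```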
Linking to a JSON Dataset
An HTML Query for a page can be linked to using the HTML rel value "dataset". The [6]HTML link relation "dataset" is a short URI reference to http://weborganics.co.uk/ns/dataset (this page). By using rel="dataset", you state that the URL referenced in the href attribute of a link is a dataset for the referring page. The link to a dataset should also contain a type specifier of "application/json".
Example:
<link rel="dataset" href="http://example.com/my-dataset.json" type="application/json">
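A consumer could discover such links with Python's standard html.parser module. The sketch below is only one possible approach: the class name DatasetLinkFinder and the function find_datasets are hypothetical, and requiring the type attribute to equal "application/json" is a stricter reading than the text above, which says only that the type should be present.

```python
from html.parser import HTMLParser


class DatasetLinkFinder(HTMLParser):
    """Collect href values of <link rel="dataset" type="application/json">."""

    def __init__(self):
        super().__init__()
        self.datasets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "dataset" in rel_values and link_type == "application/json" and href:
            self.datasets.append(href)


def find_datasets(html):
    """Return the dataset URLs linked from an HTML document."""
    finder = DatasetLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.datasets


page = (
    '<html><head>'
    '<link rel="dataset" href="http://example.com/my-dataset.json" type="application/json">'
    '<link rel="stylesheet" href="style.css">'
    '</head><body></body></html>'
)
print(find_datasets(page))
```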
Dataset Parsing
This page supports dataset parsing, available at http://weborganics.co.uk/dataset/?url=(+your url). The dataset parser can transform your dataset if you link to it in the head of your HTML document.
Example:
http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/article.html
You can also parse just a dataset, in which case the from property must be set. This is intended for testing your datasets.
Example:
http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/dataset-article.json
Bookmarklet
There is also a bookmarklet that you can drag to your favourites toolbar.
Bookmarklet: DatasetParse
Examples
The following examples were created during the development of the HTML Query syntax. Please click one of the following links to view the examples.
Click the link at the bottom of each page that says "Get RDF dataset" to test.
- HTML Article, view json
- HTML hCard as FOAF, view json
- HTML hAtom, view json
- HTML hCalendar, view json
- HTML hProduct, view json
- HTML hReview, view json
- HTML Organization, view json
- HTML hCard, view json
- HTML hAudio, view json
The following are "real world" examples performed on live webpages.
References
- [1] JSON specification. http://www.json.org/
- [2] Microformats. http://microformats.org/
- [3] QName. http://en.wikipedia.org/wiki/QName
- [4] RDF Concepts and Abstract Syntax. http://www.w3.org/TR/rdf-concepts/
- [5] XML Schema Part 2: Datatypes Second Edition. http://www.w3.org/TR/xmlschema-2/
- [6] Hypertext Links in HTML. http://www.w3.org/TR/WD-htmllink-970328#link