HTML Query Syntax
HTML Query is a self-description mechanism that uses [1]JSON to describe the contents of an HTML document. Although HTML Query can be used with a vocabulary such as [2]microformats, it does not require the author to change the HTML of a document in any way. An author can use plain old semantic HTML without adding any extra attributes or elements to accommodate the intended semantics.
The following is an example of a simple HTML Query.
{
  "prefix": {
    "dc": "http://purl.org/dc/elements/1.1/"
  },
  "select": {
    "title": {
      "as": "dc:title"
    }
  },
  "from": "http://example.com/"
}
When the query is performed on the URL http://example.com/, it would result in the following output.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://example.com/">
  <rdf:Description rdf:about="http://example.com/">
    <dc:title>Example Web Page</dc:title>
  </rdf:Description>
</rdf:RDF>
Try the above example by clicking example.json.
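As a rough illustration of how the query above maps onto the RDF output, the following Python sketch serialises already-extracted values for a flat (non-nested) query. The function name build_rdf and the values argument are assumptions made for this example and are not part of HTML Query; the HTML extraction step is not shown, and the xml:base attribute is omitted for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_rdf(query, values):
    """Serialise extracted values as RDF/XML for a flat query.

    `values` (hypothetical) maps each selector in query["select"] to the
    text that was already extracted from the page.
    """
    prefixes = query.get("prefix", {})
    ET.register_namespace("rdf", RDF_NS)
    for prefix, uri in prefixes.items():
        ET.register_namespace(prefix, uri)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": query["from"]},
    )
    for selector, rule in query["select"].items():
        # "dc:title" -> namespace URI for "dc" plus the local name "title".
        prefix, local = rule["as"].split(":", 1)
        prop = ET.SubElement(description, f"{{{prefixes[prefix]}}}{local}")
        prop.text = values[selector]
    return ET.tostring(root, encoding="unicode")


query = {
    "prefix": {"dc": "http://purl.org/dc/elements/1.1/"},
    "select": {"title": {"as": "dc:title"}},
    "from": "http://example.com/",
}
xml_text = build_rdf(query, {"title": "Example Web Page"})
print(xml_text)
```

The sketch only handles one level of select; nested select properties and the other select properties described later are left out.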
HTML Query Examples
The following examples were created during the development of the HTML Query syntax. Please click one of the following links to view the examples.
Click the link at the bottom of each page that says "Get RDF" to test.
Properties of HTML Query
-
select
The select property is used to select all values in an HTML Query.
A select property may contain nested select properties to imply the nesting of RDF and HTML structures.
Example:
{ "select" : { ... } }
-
from
from is a URL for the document to be queried.
The value of from should be an absolute URL.
Example:
"from": "http://example.com/"
The from property may be omitted. If the from property is omitted from a query, the parser sets the value of from to the referring page.
-
prefix
prefix contains a comma-separated list of vocabulary prefixes and URIs to be used in the RDF output of a query and in the query itself.
A prefix is an abbreviation of a URI.
Prefixes are used in place of full URIs.
A prefix forms the first part of a URI reference or [3]QName in RDF terms.
Pattern:
"prefix": { "prefix": "uri", ... }
Example:
"prefix": { "vcard": "http://www.w3.org/2006/vcard/ns#", ... }
A default prefix for the output document may be set using the keyword "value".
Example:
"prefix": { "value": "http://www.w3.org/2006/vcard/ns#", ... }
Keyword Selectors.
HTML Query uses four CSS-like selectors to navigate the keywords of an HTML document. The selectors are defined below.
-
element
The selector is an element name.
Example:
"h1" is equal to <h1></h1>
-
.class
The selector is a class name.
Example:
".example" is equal to class="example"
-
#id
The selector is the id of an element.
Example:
"#example" is equal to id="example"
-
attribute~=name
The selector contains an attribute name.
Example:
"rel~=example" is equal to rel="example"
Properties of Select
HTML Query Selectors contain six properties to set both input and output values.
Properties
-
about
A URL for what this "keyword" is about.
The "about" property contains a space-separated list of HTML ids, which sets the subject of the keyword in [4]RDF terms.
Pattern:
"about" : { "id": "url", ... }
Ids set by the about property are matched with HTML ids on a page, and the URL value is used in the output. The about pattern allows different ids on a page to share the same URL value, or each id can have its own unique URL value.
Example:
"about": { "fred": "http://example.com/" }
Example HTML:
<div id="fred"> ... </div>
The about property can also accept the boolean value false. Booleans in JSON are written as unquoted literals (true or false), not as strings. Setting the about property to false prevents a parser from generating an about attribute in the RDF output.
Example:
"about": false
-
as
as is used both as a unique identifier in a query and as an element name in the RDF output.
An "as" is a "predicate" or "property" in [4]RDF terms.
Pattern:
"as": "property"
Example:
"as": "foaf:name"
With the pre-defined type uri, as may contain a space-separated list of labels.
Example:
"as": "foaf:primaryTopic foaf:maker"
-
type
The datatype of a keyword or the datatype of the object in [4]RDF terms.
In addition to standard datatypes such as the [5]XML Schema datatypes, HTML Query supports five pre-defined types.
If type is omitted from a keyword, the parser defaults to "text".
Pattern:
"type": "value"
Example:
"type": "http://www.w3.org/2001/XMLSchema#dateTime"
Pre-Defined Types.
-
text
The content is just text or a plain literal in RDF.
Text is extracted in the following order: @datetime, @content, @title. If none of these HTML attributes is present, the value is the node value.
Example Output:
<label>Text</label>
-
uri
A URI, or simply a URL. When the keyword type is set to uri, the parser extracts the value in the following order: @src, then @href.
Example Output:
<label rdf:resource="http://someurl.com/" />
If neither @src nor @href is present, the value is @id converted to an absolute hash URL. This allows the author to link to other keyword items in the RDF output.
Example Output:
<label rdf:resource="http://someurl.com/#id" />
-
uriplain
The behaviour of uriplain is the same as uri, except that a uriplain is output as a plain literal (text).
Example Output:
<label>http://someurl.com/</label>
-
xmlliteral
An XMLLiteral string.
An xmlliteral may contain HTML markup or special characters. If the value does contain markup, the value should be converted to XHTML, and all elements should use the http://www.w3.org/1999/xhtml XML namespace.
Example Output:
<x:label rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"> <p xmlns="http://www.w3.org/1999/xhtml">Some text.</p> </x:label>
-
cdata
A character data section.
A cdata section may contain HTML markup or special characters.
Example Output:
<label><![CDATA[<p> Some text.</p>]]></label>
-
content
Content contains a space-separated list of HTML ids and sets the default content of a keyword. Ids set by the content property are matched with HTML ids on a page. The value of content is used in the RDF output.
Pattern:
"content" : { "id": "value" }
It is also possible to set a default content.
Pattern:
"content": "value"
-
multiple
By default the parser ignores multiple values of the same selector; only the first match of a selector is parsed.
Setting multiple to the boolean value true causes the parser to extract multiple instances of the same selector with different values.
Pattern:
"multiple": true
Example:
"rel~=friend": { "multiple": true, ... }
-
rev
Rev is a reverse property name.
Rev can be used with any root property (a selector that contains other keywords) to create an extra chain before a keyword.
Example:
"select": { ".vcard": { "rev": "knows", "label": "Person", "select": { ..... } } }
The above example would result in the following RDF.
<knows> <Person rdf:about="..."> ... </Person> </knows>
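The extraction order described for the text and uri pre-defined types can be sketched as follows. This is only an illustration: the function names, the use of a plain dict for an element's attributes, and the choice to return None when no URI can be found are all assumptions, not part of HTML Query.

```python
from urllib.parse import urljoin


def extract_text(attrs, node_value):
    """text: try @datetime, @content, @title, then fall back to the node value."""
    for name in ("datetime", "content", "title"):
        if name in attrs:
            return attrs[name]
    return node_value


def extract_uri(attrs, from_url):
    """uri: try @src, then @href, then @id turned into an absolute hash URL."""
    for name in ("src", "href"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        return urljoin(from_url, "#" + attrs["id"])
    return None  # Assumption: nothing usable was found on the element.


print(extract_text({"title": "A title", "datetime": "2009-01-01"}, "node text"))
print(extract_uri({"id": "id"}, "http://someurl.com/"))
```

A uriplain value could reuse extract_uri and simply be written out as a plain literal instead of an rdf:resource attribute.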
Root Selectors
Root selectors are determined by whether or not a keyword contains further nested select statements. If a keyword contains nested select statements, the selector is said to be a "root selector"; if not, the selector is said to be a property.
Root selectors can also have a type, which resolves to an rdf:parseType. Valid types are:
- collection => rdf:parseType="Collection"
- resource => rdf:parseType="Resource"
- literal => rdf:parseType="Literal"
Example:
"type": "resource"
The author can also use type as an rdf:type selector; this adds an rdf:type element to the root selector.
Example:
"type": "http://www.w3.org/2000/10/swap/pim/contact#ContactLocation"
In the absence of both a root selector type and a reverse property name, all properties contained in a root selector are wrapped in a blank rdf:Description element.
Example:
<vcard:adr>
  <rdf:Description>
    <vcard:locality>Albuquerque</vcard:locality>
  </rdf:Description>
</vcard:adr>
Setting RDF about
In the absence of the about property, HTML Query sets the RDF about attribute by selecting HTML values from the selected element or attribute in the following order: @href, @src and @id. If the value is @id, then an absolute hash URI is compiled from the from URL and the value of @id. If a parser fails to select @href, @src or @id, then the RDF about attribute is set to the value of from.
Blank Nodes may be generated by setting the about property to false.
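The rules for choosing the RDF about value can be summarised in a short sketch. The function name resolve_about, the dict of element attributes, and the use of None to stand for a blank node are assumptions made for this example.

```python
def resolve_about(attrs, from_url, about=None):
    """Pick the rdf:about value for a selected element.

    Returns None when a blank node should be generated.
    """
    if about is False:
        # "about": false asks for a blank node.
        return None
    if isinstance(about, dict) and attrs.get("id") in about:
        # "about": { "id": "url" } maps an HTML id to an explicit URL.
        return about[attrs["id"]]
    for name in ("href", "src"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        # Absolute hash URI built from the from URL and the id.
        return from_url.split("#", 1)[0] + "#" + attrs["id"]
    return from_url


print(resolve_about({"id": "fred"}, "http://example.com/"))
print(resolve_about({"id": "fred"}, "http://example.com/page", {"fred": "http://example.com/"}))
```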
Linking to a JSON Dataset
An HTML Query for a page can be linked to using the HTML rel value "transformation". By using rel transformation you are saying that the URL referenced in the href attribute of a link is a transformation for the referring page. The link to a transformation should also contain a type specifier of "application/json".
Example:
<link rel="transformation" href="http://example.com/my-dataset.json" type="application/json">
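A consumer could discover such links with Python's standard html.parser module, as in the sketch below. The class name TransformationFinder is hypothetical, and this is not the code used by the dataset parser described on this page.

```python
from html.parser import HTMLParser


class TransformationFinder(HTMLParser):
    """Collect href values of <link rel="transformation" type="application/json">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rels = (attr.get("rel") or "").lower().split()
        if (
            "transformation" in rels
            and attr.get("type") == "application/json"
            and attr.get("href")
        ):
            self.hrefs.append(attr["href"])


finder = TransformationFinder()
finder.feed(
    '<head><link rel="transformation" '
    'href="http://example.com/my-dataset.json" type="application/json"></head>'
)
print(finder.hrefs)
```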
Dataset Parsing
This page supports dataset parsing, available at http://weborganics.co.uk/dataset/?url=(+your url). The dataset parser supports transforming your dataset by linking to it in the head of your HTML document.
Example:
http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/article.html
You can also parse just a dataset, in which case the from property must be set. This is intended for testing your datasets.
Example:
http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/dataset-article.json
Bookmarklet
There is also a bookmarklet that you can drag to your favourites toolbar.
Bookmarklet: DatasetParse
Download
You can also download the complete source code including all the examples available on this page, and the Dataset Transformr PHP Class from http://weborganics.co.uk/dataset/Dataset_Transformr.rar.
References
- [1] JSON specification. http://www.json.org/
- [2] Microformats. http://microformats.org/
- [3] QName. http://en.wikipedia.org/wiki/QName
- [4] RDF Concepts and Abstract Syntax. http://www.w3.org/TR/rdf-concepts/
- [5] XML Schema Part 2: Datatypes Second Edition. http://www.w3.org/TR/xmlschema-2/
- [6] Hypertext Links in HTML. http://www.w3.org/TR/WD-htmllink-970328#link