Html Query
Html Query is a self-description mechanism that uses [1]JSON to allow an author to fully describe the content of elements or attributes in an HTML document. Although an Html Query can be used with a constrained vocabulary, it does not require the author to change the HTML of a document by adding explicit attributes or elements, as may be necessary with [8]RDFa, [9]Microdata and [2]Microformats.
This is an example of a simple Html Query.
{
"query": {
"output": "ntriples",
"base": "http://example.com/",
"vocab": {
"dc": "http://purl.org/dc/elements/1.1/"
},
"item": {
"title": { "label": "dc:title" }
}
}
}
When the query is performed on the URL http://example.com/, it results in the following N-Triples output.
<http://example.com/> <http://purl.org/dc/elements/1.1/title> "Example Web Page" .
Try a live example by clicking example.json.
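The step from the query to the N-Triples line can be sketched in Python. This is only an illustration, not the parser's actual code: the helper names expand_label and ntriple are hypothetical, and the title text is hard-coded here where the real parser would extract it from the page.

```python
def expand_label(label, vocab):
    """Expand a prefixed label such as "dc:title" into a full URI using vocab."""
    prefix, sep, local = label.partition(":")
    if sep and prefix in vocab:
        return vocab[prefix] + local
    return label  # no known prefix: leave the label unchanged


def ntriple(subject, predicate, literal):
    """Format one N-Triples statement whose object is a plain literal."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


query = {
    "query": {
        "output": "ntriples",
        "base": "http://example.com/",
        "vocab": {"dc": "http://purl.org/dc/elements/1.1/"},
        "item": {"title": {"label": "dc:title"}},
    }
}

q = query["query"]
predicate = expand_label(q["item"]["title"]["label"], q["vocab"])
# Assumption: "Example Web Page" stands in for the text of the <title> element.
line = ntriple(q["base"], predicate, "Example Web Page")
print(line)
```

The printed line has the same shape as the N-Triples output shown above: the base is the subject, the expanded label is the predicate, and the selected text is the object.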
Html Query uses the following patterns to select Items and set the output parameters of an HTML document.
"object": {
"selector": {
"property" : "value"
}
}
and...
"object" : {
"property" : "value"
}
Properties of HTML Query
-
query
All Html Queries must begin with query.
Example:
{ "query" : { ... } }
-
output
The output format of a query. The output property may be omitted from a query. If the output property is absent then the parser defaults to RDF. The output choices include "ntriples" and "rdfjson".
Example:
"output": "rdfjson"
-
base
The base URL of a document. The value of base must be an absolute URL.
Example:
"base": "http://example.com/"
The base property may be omitted from a query. If base is absent, it is set to the referring page if there is one, otherwise to null.
-
vocab
Vocab contains a comma-separated list of vocabulary prefixes and URIs to be used in the RDF output of a query and in the query itself.
A prefix is an abbreviation of a URI and is used in place of the full URI. A prefix forms the first part of a URI reference, or [3]QName in RDF terms.
Pattern:
"vocab": { "prefix": "uri", ... }
Example:
"vocab": { "vcard": "http://www.w3.org/2006/vcard/ns#", ... }
A default vocabulary for the output document may be set using the keyword property "value".
Example:
"vocab": { "value": "http://www.w3.org/2006/vcard/ns#", ... }
-
item
Item contains a list of comma separated selectors and their output properties. An Item may also contain other nested Items. If an Item contains nested items then the Item is treated as a "root" item, else the Item is a property. A property Item must not contain further Items.
Pattern of a root item that contains a nested item:
"item": { "selector": { "property": "value", "item": { "selector" : { "property": "value" } } } }
Pattern of an Item that is a property:
"item": { "selector" : { "property": "value" } }
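The defaults described for the top-level properties can be summarised in a short Python sketch. The function name normalise_query, the shape of the returned dictionary and the "rdf" token used for the default output are assumptions made for illustration; the text above only states that the parser defaults to RDF.

```python
from urllib.parse import urlparse


def normalise_query(doc, referrer=None):
    """Apply the documented defaults to a decoded Html Query (a dict)."""
    if "query" not in doc:
        raise ValueError('an Html Query must begin with "query"')
    q = doc["query"]

    # base falls back to the referring page, else None (JSON null).
    base = q.get("base", referrer)
    if base is not None:
        parts = urlparse(base)
        if not (parts.scheme and parts.netloc):
            raise ValueError("base must be an absolute URL")

    # The keyword property "value" inside vocab names the default vocabulary.
    vocab = dict(q.get("vocab", {}))
    default_vocab = vocab.pop("value", None)

    return {
        "output": q.get("output", "rdf"),  # assumed token for the RDF default
        "base": base,
        "vocab": vocab,
        "default_vocab": default_vocab,
        "item": q.get("item", {}),
    }
```

For instance, a query that omits both output and base and is reached from http://example.com/ would come back with output "rdf" and base "http://example.com/".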
HTML Query Selectors.
Html Query uses four CSS-like selectors to navigate the items of an HTML document. The selectors are defined below.
-
element
The selector is an element name.
Example:
"h1" => <h1></h1>
-
.class
The selector is a class name.
Example:
".example" => class="example"
-
#id
The selector is the id of an element.
Example:
"#example" => id="example"
-
attribute~=name
The selector is an attribute name, followed by ~= and a value that the attribute contains.
Example:
"rel~=example" => rel="example"
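A minimal Python sketch of how the four selector forms could be told apart and matched against an element. The function names are hypothetical, and treating class and ~= values as whitespace-separated words is an assumption borrowed from CSS rather than something stated above.

```python
def parse_selector(selector):
    """Classify a selector string as class, id, attribute or element."""
    if selector.startswith("."):
        return ("class", selector[1:])
    if selector.startswith("#"):
        return ("id", selector[1:])
    if "~=" in selector:
        attribute, _, value = selector.partition("~=")
        return ("attribute", attribute, value)
    return ("element", selector)


def matches(selector, tag, attrs):
    """Return True if an element (tag name plus attribute dict) matches."""
    kind, *rest = parse_selector(selector)
    if kind == "element":
        return tag == rest[0]
    if kind == "class":
        return rest[0] in attrs.get("class", "").split()
    if kind == "id":
        return attrs.get("id") == rest[0]
    attribute, value = rest
    # Assumption: like CSS, ~= matches one word of a space-separated value.
    return value in attrs.get(attribute, "").split()
```

With these definitions, "rel~=example" matches an element whose rel attribute is "example", and ".example" matches an element whose class attribute is "big example".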
Properties of Item
HTML Query Items have six properties and one keyword property, "value", which set both input and output values.
Properties
-
about
A URL for what this "item" is about. The "about" property sets the subject of the item in [4]RDF terms.
Pattern:
"about" : { "id": "url", ... }
Ids set by the about property are matched with HTML ids on a page, and the URL value is used in the output. The about pattern allows different ids on a page to share the same URL value, or each id to have its own unique URL value.
Example:
"about": { "fred": "http://example.com/" }
Example HTML:
<div id="fred"> ... </div>
The about property can also accept the boolean value false. Booleans in JSON are written as unquoted literals, not strings. Setting the about property to false stops the parser from generating an about attribute in the RDF output.
Example:
"about": false
-
label
Labels are used as both unique identifiers in a query and element names in the RDF output. A label is a "predicate" or "property" in [4]RDF terms.
Pattern:
"label": "property"
Example:
"label": "foaf:name"
-
type
The [5]datatype of an Item, that is, the datatype of the object in [4]RDF terms. HTML Query supports five basic types. If type is omitted from an item the parser defaults to "plain".
Pattern:
"type": "value"
Values.
-
plain
The content is just text, or a plain literal in RDF. Text is extracted in the following order: @title, @datetime, @content. If none of these HTML attributes is present, the value is the node value.
Example Output:
<label>Text</label>
-
resource
An RDF resource, or simply a URL. When setting the item type to resource, the parser extracts the value in the following order: @src, then @href.
Example Output:
<label rdf:resource="http://someurl.com/" />
If neither @src nor @href is present, the value is @id converted to an absolute hash URL (the base URL followed by # and the id). This allows the author to link to other items in the RDF output.
Example Output:
<label rdf:resource="http://someurl.com/#id" />
-
resourceplain
The behaviour of resourceplain is the same as resource, except that the value is output as a plain literal (text).
Example Output:
<label>http://someurl.com/</label>
-
literal
A literal string. A literal may contain HTML markup or special characters. If the value does contain markup, it is converted to XHTML and all elements are placed in the http://www.w3.org/1999/xhtml XML namespace.
Example Output:
<x:label rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"> <p xmlns="http://www.w3.org/1999/xhtml">Some text.</p> </x:label>
-
cdata
A character data section. A cdata section may contain HTML markup or special characters.
Example Output:
<label><![CDATA[<p> Some text.</p>]]></label>
Html Query also supports a limited set of [5]XML Schema datatypes (XSD): string, int, integer, decimal, duration, datetime, date and time.
Example Output:
<label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Some text.</label>
-
-
content
Content sets the default content of an item. Ids set by the content property are matched with HTML ids on a page. The value of content is used in the RDF output.
Pattern:
"content" : { "id": "value" }
It is also possible to set a default content by using the "value" keyword property instead of an HTML id.
Pattern:
"content" : { "value": "value" }
-
multiple
By default the parser ignores multiple values of the same selector; only the first match of a selector is parsed.
Setting multiple to the boolean value true causes the parser to extract multiple instances of the same selector with different values.
Pattern:
"multiple": true
Example:
"rel~=friend": { "multiple": true, ... }
-
rev
Rev is a reverse property name. Rev can be used with any root item (an Item that contains other Items) to create an extra chain before an item.
Example:
"item": { ".vcard": { "rev": "knows", "label": "Person", "item": { ..... } } }
The above example would result in the following RDF.
<knows> <Person rdf:about="..."> ... </Person> </knows>
-
value
The value keyword property sets a default value. It can be used as a descendant of vocab, where it sets the default vocabulary, and of content, where it sets the default content.
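The extraction orders given for the plain and resource types, and the first-match behaviour controlled by multiple, can be sketched in Python as follows. An element is modelled here as a dictionary of attributes plus its text, and all function names are hypothetical; whether the real parser resolves relative @src or @href values is not stated above, so they are returned unchanged.

```python
from urllib.parse import urljoin


def plain_value(attrs, text):
    """Plain literal: @title, then @datetime, then @content, else node text."""
    for name in ("title", "datetime", "content"):
        if name in attrs:
            return attrs[name]
    return text


def resource_value(attrs, base):
    """Resource: @src, then @href, else @id turned into an absolute hash URL."""
    for name in ("src", "href"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        return urljoin(base, "#" + attrs["id"])
    return None  # assumption: nothing usable on the element


def apply_multiple(found, multiple=False):
    """Keep only the first match unless "multiple" is true."""
    return list(found) if multiple else list(found[:1])
```

A resourceplain item would use resource_value in the same way and simply emit the result as text instead of an rdf:resource attribute.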
Root Items
Root items are determined by whether or not an Item contains further nested Items. An Item that contains nested Items is a "root item"; an Item that does not is a property.
Root items can also have types which resolve to an rdf:parseType. Valid types are:
- collection => rdf:parseType="Collection"
- resource => rdf:parseType="Resource"
- literal => rdf:parseType="Literal"
Example:
"type": "resource"
In the absence of both a root item type and a reverse property name, all properties contained in a root item are wrapped in a blank rdf:Description element.
Example:
<vcard:adr>
<rdf:Description>
<vcard:locality>Albuquerque</vcard:locality>
</rdf:Description>
</vcard:adr>
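The root item rules can be written down as a small Python sketch. The function names and the ".locality" selector in the usage lines are hypothetical, and how the parser combines a type with a rev is not specified above, so the sketch only answers the three questions the text does cover.

```python
PARSE_TYPES = {
    "collection": "Collection",
    "resource": "Resource",
    "literal": "Literal",
}


def is_root_item(item):
    """An Item is a root item when it contains nested Items."""
    return isinstance(item.get("item"), dict)


def parse_type(item):
    """Return the rdf:parseType value for a typed root item, else None."""
    if not is_root_item(item):
        return None
    return PARSE_TYPES.get(item.get("type"))


def needs_blank_description(item):
    """True when a root item has neither a valid type nor a rev."""
    return (
        is_root_item(item)
        and parse_type(item) is None
        and "rev" not in item
    )


# Usage with a hypothetical address item like the vcard:adr example above.
adr = {"label": "vcard:adr", "item": {".locality": {"label": "vcard:locality"}}}
print(is_root_item(adr), needs_blank_description(adr))
```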
Setting RDF about
In the absence of the about property, HTML Query sets the RDF about attribute by selecting HTML values from the selected element or attribute in the following order: @href, @src and @id. If the value is @id, then an absolute hash URI is compiled from the base URL and the value of @id.
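Putting the about rules together, one possible reading in Python is shown below. The function name resolve_about is hypothetical, and falling back to the element's own attributes when an about map has no entry for the element's id is an assumption, since only the case where about is absent is described.

```python
from urllib.parse import urljoin


def resolve_about(about, attrs, base):
    """Pick the rdf:about value for an element, or None to omit it.

    about is the item's "about" property: a dict of id -> URL, False, or None
    when the property is absent.
    """
    if about is False:
        return None  # "about": false suppresses the attribute
    element_id = attrs.get("id")
    if isinstance(about, dict) and element_id in about:
        return about[element_id]
    # Fallback order when about does not decide: @href, @src, then @id.
    for name in ("href", "src"):
        if name in attrs:
            return attrs[name]
    if element_id is not None:
        return urljoin(base, "#" + element_id)
    return None
```

For the earlier example, an about map of {"fred": "http://example.com/"} applied to an element with id "fred" yields http://example.com/, while the same element with no about property would yield the base URL followed by #fred.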
Linking to a Dataset transformation
An HTML Query for a page can be linked to in the head of your document using the HTML rel value "dataset". The [6]HTML Link relation "dataset" is a short URI reference to http://weborganics.co.uk/ns/dataset (this page). By using rel="dataset" you are saying that the URL referenced in the href attribute of the link is a dataset transformation for the referring page.
Example:
<link rel="dataset" href="http://example.com/my-dataset.json" type="application/json">
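A consumer could discover such a link with Python's standard html.parser module. This is a sketch of the discovery step only, not the DatasetParser implementation; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DatasetLinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel contains "dataset"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rels = (attributes.get("rel") or "").lower().split()
        if "dataset" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_dataset_links(html):
    """Return the dataset URLs linked from an HTML document string."""
    finder = DatasetLinkFinder()
    finder.feed(html)
    return finder.hrefs
```

Feeding it a page whose head contains the link element above returns a list holding http://example.com/my-dataset.json.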
Dataset Parsing
[7]Transformr supports dataset parsing, available at http://transformr.co.uk/dataset/(+your url). The dataset parser supports transforming a page whose dataset is linked to in the head of the HTML document.
Example:
http://transformr.co.uk/dataset/http://weborganics.co.uk/dataset/article.html
You can also parse just a dataset, in which case the base property must be set. This is intended for testing your datasets.
Example:
http://transformr.co.uk/dataset/http://weborganics.co.uk/dataset/dataset-article.json
Download DatasetParser
You can download the DatasetParser and HTML Query class from http://github.com/WebOrganics/DatasetParser
Bookmarklet
There is also a bookmarklet that you can drag to your favourites toolbar.
Bookmarklet: DatasetParse
Examples
Please click one of the following links to view the examples.
Click the link at the bottom of each page that says "Get RDF dataset" to test.
References
- [1] JSON specification. http://www.json.org/
- [2] Microformats. http://microformats.org/
- [3] QName. http://en.wikipedia.org/wiki/QName
- [4] RDF Concepts and Abstract Syntax. http://www.w3.org/TR/rdf-concepts/
- [5] XML Schema Part 2: Datatypes Second Edition. http://www.w3.org/TR/xmlschema-2/
- [6] Hypertext Links in HTML. http://www.w3.org/TR/WD-htmllink-970328#link
- [7] Transformr. http://transformr.co.uk/
- [8] RDFa Syntax. http://www.w3.org/TR/rdfa-syntax
- [9] Microdata. http://dev.w3.org/html5/md/