Table of Contents

  1. HTML Query Syntax
  2. HTML Query Examples
  3. Properties of HTML Query
  4. Keyword Selectors
  5. Properties of Select
  6. Root Selectors
  7. Setting RDF about
  8. Linking to a JSON Dataset
  9. Dataset Parsing
  10. References
  11. Similar Work
  12. Thanks To

HTML Query Syntax

HTML Query is a self-description mechanism that uses [1]JSON to describe the contents of an HTML document. Although HTML Query can be used with a vocabulary such as [2]microformats, it does not require the author to change the HTML of a document in any way. An author can use plain old semantic HTML without adding extra attributes or elements to carry the intended semantics.

The following is an example of a simple HTML Query.

{
"prefix": {
"dc": "http://purl.org/dc/elements/1.1/"
},
"select": {
"title": {
"as": "dc:title"
}
},
"from": "http://example.com/"
}

When the query is performed on the URL http://example.com/, it results in the following output.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	 xmlns:dc="http://purl.org/dc/elements/1.1/" 
	 xml:base="http://example.com/">
  <rdf:Description rdf:about="http://example.com/">
    <dc:title>Example Web Page</dc:title>
  </rdf:Description>
</rdf:RDF>

Try the above example by clicking example.json.
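
As a rough illustration of how the query above maps to its output, the following Python sketch evaluates element selectors only and works on an HTML string instead of fetching the from URL. The names FirstElementText and run_query and the sample HTML are assumptions made for this sketch; they are not part of HTML Query or of the Dataset Transformr parser.

```python
import json
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class FirstElementText(HTMLParser):
    """Collects the text of the first element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the first match
        self.done = False   # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def run_query(query, html):
    """Hypothetical helper: element selectors only, returns RDF/XML text."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>',
             '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"']
    for prefix, uri in query.get("prefix", {}).items():
        lines.append(f'\t xmlns:{prefix}={quoteattr(uri)}')
    lines.append(f'\t xml:base={quoteattr(query["from"])}>')
    lines.append(f'  <rdf:Description rdf:about={quoteattr(query["from"])}>')
    for selector, props in query["select"].items():
        finder = FirstElementText(selector)
        finder.feed(html)
        text = "".join(finder.parts).strip()
        lines.append(f'    <{props["as"]}>{escape(text)}</{props["as"]}>')
    lines.append('  </rdf:Description>')
    lines.append('</rdf:RDF>')
    return "\n".join(lines)


query = json.loads("""
{
  "prefix": {"dc": "http://purl.org/dc/elements/1.1/"},
  "select": {"title": {"as": "dc:title"}},
  "from": "http://example.com/"
}
""")
# Sample page content, assumed for illustration.
html = "<html><head><title>Example Web Page</title></head><body></body></html>"
rdf = run_query(query, html)
print(rdf)
```

A real parser would also need to handle class, id and attribute selectors, nested select properties, and the type, about, content, multiple and rev properties described in the sections that follow.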

HTML Query Examples

The following examples were created during the development of the HTML Query syntax. Please click one of the following links to view the examples.

Click the link at the bottom of each page that says "Get RDF" to test.

  1. HTML Article, view json
  2. HTML hCard as FOAF, view json
  3. HTML hAtom, view json
  4. HTML hCalendar, view json
  5. HTML hProduct, view json
  6. HTML hReview, view json
  7. HTML Organization, view json
  8. HTML hCard, view json
  9. HTML hAudio, view json

Properties of HTML Query

  1. select

    The select property contains all the selectors, and so all the values, of an HTML Query.

    A select property may contain nested select properties to imply the nesting of RDF and HTML structures.

    Example:

    {
    	"select" : {
    		...
    	}
    }
  2. from

    from is the URL of the document to be queried.

    The value of from should be an absolute URL.

    Example:

    "from": "http://example.com/"

    The from property may be omitted. If it is omitted, the parser sets the value of from to the referring page.

  3. prefix

    prefix contains a comma-separated list of vocabulary prefixes and their URIs, to be used both in the query itself and in the RDF output of the query.

    A prefix is an abbreviation of a URI.

    Prefixes are used in place of full URIs.

    A prefix forms the first part of a URI reference, or [3]QName in RDF terms.

    Pattern:

    "prefix": {
    	"prefix": "uri",
    	...
    }

    Example:

    "prefix": {
    	"vcard": "http://www.w3.org/2006/vcard/ns#",
    	...
    }

    A default prefix for the output document may be set using the keyword "value".

    Example:

    "prefix": {
    	"value": "http://www.w3.org/2006/vcard/ns#",
    	...
    }
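
The from and prefix rules above can be sketched in Python as follows. The function names resolve_from and expand_qname are hypothetical. Two behaviours are assumptions of this sketch and are not stated in the text: an unprefixed name is expanded against the default prefix stored under "value", and a name whose prefix is unknown is returned unchanged on the guess that it is already a full URI.

```python
def resolve_from(query, referring_url):
    """Return the query's "from" URL, or the referring page if it is omitted."""
    return query.get("from", referring_url)


def expand_qname(qname, prefixes):
    """Expand "prefix:local" to a full URI using the query's prefix map."""
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + local
        # Assumption: an unknown prefix means the name is already a full URI.
        return qname
    # Assumption: unprefixed names use the default prefix stored under "value".
    return prefixes["value"] + qname


prefixes = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "value": "http://www.w3.org/2006/vcard/ns#",
}
title_uri = expand_qname("dc:title", prefixes)
default_uri = expand_qname("locality", prefixes)
source = resolve_from({"select": {}}, "http://example.com/page.html")
```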

Keyword Selectors

HTML Query uses four CSS-like selectors to navigate the keywords of an HTML document. The selectors are defined below.

  1. element

    The selector is an element name.

    Example:

    "h1" is equal to <h1></h1>
  2. .class

    The selector is a class name.

    Example:

    ".example" is equal to class="example"
  3. #id

    The selector is the id of an element.

    Example:

    "#example" is equal to id="example"
  4. attribute~=name

    The selector is an attribute name followed by a value of that attribute.

    Example:

    "rel~=example" is equal to rel="example"
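
The four selector forms can be told apart from their spelling alone. The Python sketch below shows one way to do that; parse_selector and matches are hypothetical names. Matching class and attribute values as whitespace-separated tokens follows how ~= works in CSS and is an assumption here, since the text only says the selectors are CSS-like.

```python
def parse_selector(selector):
    """Split a selector into (kind, attribute-or-element name, value)."""
    if "~=" in selector:
        name, value = selector.split("~=", 1)
        return ("attribute", name, value)
    if selector.startswith("."):
        return ("class", "class", selector[1:])
    if selector.startswith("#"):
        return ("id", "id", selector[1:])
    return ("element", selector, None)


def matches(selector, tag, attrs):
    """Check one element, given as a tag name and a dict of attributes."""
    kind, name, value = parse_selector(selector)
    if kind == "element":
        return tag == name
    if kind == "id":
        return attrs.get("id") == value
    # Assumption: class and attribute values are compared token by token.
    return value in (attrs.get(name) or "").split()


is_heading = matches("h1", "h1", {})
is_friend = matches("rel~=friend", "a", {"rel": "friend met"})
```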

Properties of Select

HTML Query Selectors contain six properties to set both input and output values.

Properties

  1. about

    A URL for what this "keyword" is about.

    The "about" property contains a space-separated list of HTML ids, which sets the subject of the keyword in [4]RDF terms.

    Pattern:

    "about" :  {
    	"id": "url",
    	...
    }

    Ids set by the about property are matched with HTML ids on the page, and the URL value is used in the output. The about pattern allows different ids on a page to share the same URL value, or each id to have its own unique URL value.

    Example:

    "about": {
    	"fred": "http://example.com/"
    }

    Example HTML:

    <div id="fred">
    	...
    </div>

    The about property can also accept the boolean value false. Booleans in JSON are written without quotes. Setting the about property to false prevents a parser from generating an about attribute in the RDF output.

    Example:

    "about": false
  2. as

    "as" is used both as a unique identifier in a query and as an element name in the RDF output.

    An "as" is a "predicate" or "property" in [4]RDF terms.

    Pattern:

    "as": "property"

    Example:

    "as": "foaf:name"

    For keywords of the pre-defined type uri, "as" may contain a space-separated list of labels.

    Example:

    "as": "foaf:primaryTopic foaf:maker"
  3. type

    The datatype of a keyword or the datatype of the object in [4]RDF terms.

    As well as any standard datatypes such as [5]XML Schema datatypes, HTML Query also supports five pre-defined types.

    If type is omitted from a keyword, the parser defaults to "text".

    Pattern:

    "type": "value"

    Example:

    "type": "http://www.w3.org/2001/XMLSchema#dateTime"

    Pre-Defined Types.

    1. text

      The content is just text or a plain literal in RDF.

      Text is extracted in the following order: @datetime, @content, @title. If none of these HTML attributes are present, the value is the node value.

      Example Output:

      <label>Text</label>
    2. uri

      A URI, or simply a URL. When setting the keyword type to uri, the parser extracts the value in the following order: @src, then @href.

      Example Output:

      <label rdf:resource="http://someurl.com/" />

      If neither @src nor @href is present, the value is @id converted to an absolute hash URL. This allows the author to link to other keyword items in the RDF output.

      Example Output:

      <label rdf:resource="http://someurl.com/#id" />
    3. uriplain

      uriplain behaves the same as uri, except that the value is output as a plain literal (text).

      Example Output:

      <label>http://someurl.com/</label>
    4. xmlliteral

      An XMLLiteral string. An xmlliteral may contain HTML markup or special characters. If the value does contain markup, it should be converted to XHTML, and all elements should use the http://www.w3.org/1999/xhtml XML namespace.

      Example Output:

      <x:label rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
      	<p xmlns="http://www.w3.org/1999/xhtml">Some text.</p>
      </x:label>
    5. cdata

      A character data section.

      A cdata section may contain HTML markup or special characters.

      Example Output:

      <label><![CDATA[<p> Some text.</p>]]></label>
  4. content

    content contains a space-separated list of HTML ids and sets the default content of a keyword. Ids set by the content property are matched with HTML ids on the page. The value of content is used in the RDF output.

    Pattern:

    "content" : {
    	"id": "value"
    }

    It is also possible to set a default content.

    Pattern:

    "content": "value"
    
  5. multiple

    By default the parser ignores multiple values of the same selector; only the first match of a selector is parsed.

    Setting multiple to the boolean value true causes the parser to extract multiple instances of the same selector with different values.

    Pattern:

    "multiple": true

    Example:

    "rel~=friend": {
    	"multiple": true,
    	...
    }
  6. rev

    Rev is a reverse property name.

    Rev can be used with any root property (a selector that contains other keywords) to create an extra chain before a keyword.

    Example:

    "select": {
    	".vcard": {
    		"rev": "knows",
    		"label": "Person",
    		"select": {
    			.....
    		}
    	}
    }

    The above example would result in the following RDF.

    <knows>
    	<Person rdf:about="...">
    		...
    	</Person>
    </knows>
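
Returning to the pre-defined types text, uri and uriplain described under type, the extraction order can be sketched in Python as below. extract_value is a hypothetical name, and the element is represented as a dict of attributes plus its node text. Resolving @src and @href against the from URL with urljoin is an assumption of this sketch; the text only specifies that @id is turned into an absolute URL.

```python
from urllib.parse import urljoin


def extract_value(attrs, node_text, type_="text", from_url=""):
    """Pick the value of a keyword according to its pre-defined type."""
    if type_ == "text":
        # Order given in the text: @datetime, @content, @title, node value.
        for name in ("datetime", "content", "title"):
            if name in attrs:
                return attrs[name]
        return node_text
    if type_ in ("uri", "uriplain"):
        # Order given in the text: @src, then @href, then @id as a hash URL.
        for name in ("src", "href"):
            if name in attrs:
                return urljoin(from_url, attrs[name])  # assumption: resolved
        if "id" in attrs:
            return urljoin(from_url, "#" + attrs["id"])
        return None
    raise ValueError("type not covered by this sketch: " + type_)


label = extract_value({"title": "Fred Bloggs"}, "Fred")
link = extract_value({"id": "id"}, "", "uri", "http://someurl.com/")
```

The types uri and uriplain share one branch because they differ only in how the value is written out (rdf:resource attribute versus plain literal), not in how it is selected.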

Root Selectors

Root Selectors are determined by whether or not a keyword contains further nested select statements. If a keyword does contain nested select statements, the selector is said to be a "root selector"; if not, the selector is said to be a property.

Root Selectors can also have a type, which resolves to an rdf:parseType. Valid types are:

  • collection => rdf:parseType="Collection"
  • resource => rdf:parseType="Resource"
  • literal => rdf:parseType="Literal"

Example:

"type": "resource"

The author can also use type as an rdf:type selector; this adds an rdf:type element to the root selector.

Example:

"type": "http://www.w3.org/2000/10/swap/pim/contact#ContactLocation"

In the absence of both a root selector type and a reverse property name, all properties contained in a root selector are wrapped in a blank rdf:Description element.

Example:

<vcard:adr>
  <rdf:Description>
    <vcard:locality>Albuquerque</vcard:locality>
  </rdf:Description>
</vcard:adr>
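
The handling of type on a root selector can be summarised in a small Python sketch. interpret_root_type is a hypothetical name, and the assumption that the three keywords must be lowercase comes only from how they are written in the list above.

```python
# Mapping taken from the list above; lowercase keys are an assumption.
PARSE_TYPES = {
    "collection": "Collection",
    "resource": "Resource",
    "literal": "Literal",
}


def interpret_root_type(type_value):
    """Describe what a root selector's "type" contributes to the output."""
    if type_value is None:
        # No type: with no rev either, a blank rdf:Description is used.
        return ("rdf:Description", None)
    if type_value in PARSE_TYPES:
        return ("rdf:parseType", PARSE_TYPES[type_value])
    # Anything else is treated as the URI of an RDF type.
    return ("rdf:type", type_value)


wrapper = interpret_root_type("resource")
```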

Setting RDF about

In the absence of the about property, HTML Query sets the RDF about attribute by selecting HTML values from the selected element or attribute in the following order: @href, @src and @id. If the value is @id, an absolute hash URI is compiled from the from URL and the value of @id. If a parser fails to select @href, @src or @id, the RDF about attribute is set to the value of from.

Blank Nodes may be generated by setting the about property to false.
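
The fallback order for the RDF about attribute can be sketched in Python as follows. resolve_about is a hypothetical name, and the element is represented as a dict of attributes. Two points are assumptions of this sketch: an about map that has no entry for the element's id falls through to the default order, and @href and @src values are resolved against the from URL.

```python
from urllib.parse import urljoin


def resolve_about(about, attrs, from_url):
    """Return the rdf:about URL for an element, or None for a blank node."""
    if about is False:
        return None  # "about": false asks for a blank node
    if isinstance(about, dict) and attrs.get("id") in about:
        return about[attrs["id"]]
    # Default order given in the text: @href, @src, then @id.
    for name in ("href", "src"):
        if name in attrs:
            return urljoin(from_url, attrs[name])  # assumption: resolved
    if "id" in attrs:
        return urljoin(from_url, "#" + attrs["id"])
    return from_url


subject = resolve_about(None, {"id": "fred"}, "http://example.com/")
blank = resolve_about(False, {"id": "fred"}, "http://example.com/")
```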

Linking to a JSON Dataset

An HTML Query for a page can be linked to using the HTML rel value "transformation". By using rel="transformation" you are saying that the URL referenced in the href attribute of the link is a transformation for the referring page. The link to a transformation should also contain a type specifier of "application/json".

Example:

<link rel="transformation" href="http://example.com/my-dataset.json" type="application/json">
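
A consumer could discover such links with a few lines of Python, sketched here using only the standard library. TransformationLinkFinder and find_transformations are hypothetical names; requiring the type to be exactly "application/json" is this sketch's reading of the "should" above.

```python
from html.parser import HTMLParser


class TransformationLinkFinder(HTMLParser):
    """Collects href values of <link rel="transformation"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if ("transformation" in rels
                and attrs.get("type") == "application/json"
                and attrs.get("href")):
            self.hrefs.append(attrs["href"])


def find_transformations(html):
    finder = TransformationLinkFinder()
    finder.feed(html)
    return finder.hrefs


page = ('<head><link rel="stylesheet" href="style.css" type="text/css">'
        '<link rel="transformation" href="http://example.com/my-dataset.json"'
        ' type="application/json"></head>')
datasets = find_transformations(page)
```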

Dataset Parsing

This page supports dataset parsing, available at https://weborganics.co.uk/dataset/?url=(+your url). The dataset parser can transform a page using the dataset linked to in the head of your HTML document.

Example:

https://weborganics.co.uk/dataset/?url=https://weborganics.co.uk/dataset/article.html

You can also parse just a dataset, in which case the from property must be set. This is intended for testing your datasets.

Example:

https://weborganics.co.uk/dataset/?url=https://weborganics.co.uk/dataset/dataset-article.json
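
Both example URLs follow the same pattern, which the Python sketch below builds programmatically. parser_url is a hypothetical name. The examples pass the target URL unencoded, so the sketch leaves ":" and "/" alone; percent-encoding any other reserved characters is an assumption about what the service accepts.

```python
from urllib.parse import quote

PARSER_ENDPOINT = "https://weborganics.co.uk/dataset/"


def parser_url(target_url):
    """Build a dataset parser URL for an HTML page or a JSON dataset."""
    # Assumption: characters other than ":" and "/" are percent-encoded.
    return PARSER_ENDPOINT + "?url=" + quote(target_url, safe=":/")


article = parser_url("https://weborganics.co.uk/dataset/article.html")
```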

Bookmarklet

There is also a bookmarklet that you can drag to your favourites toolbar.

Bookmarklet: DatasetParse

Download

You can also download the complete source code including all the examples available on this page, and the Dataset Transformr PHP Class from https://weborganics.co.uk/dataset/Dataset_Transformr.rar.

References

  1. JSON specification. http://www.json.org/
  2. Microformats. http://microformats.org/
  3. QName. http://en.wikipedia.org/wiki/QName
  4. RDF Concepts and Abstract Syntax. http://www.w3.org/TR/rdf-concepts/
  5. XML Schema Part 2: Datatypes Second Edition. http://www.w3.org/TR/xmlschema-2/
  6. Hypertext Links in HTML. http://www.w3.org/TR/WD-htmllink-970328#link

Thanks To

  1. Toby Inkster for his valuable feedback and suggestions.

WebOrganics 2010.