Table of Contents

  1. Html Query Syntax
  2. Properties of HTML Query
  3. Keyword Selectors
  4. Properties of keyword Selector
  5. Root Selectors
  6. Setting RDF about
  7. Linking to a JSON Dataset
  8. Dataset Parsing
  9. HTML Query Examples
  10. References
  11. Similar Work

Html Query Syntax

Html Query is a self description mechanism that uses [1]JSON to describe the contents of an HTML document. Although Html Query can be used with a vocabulary such as [2]microformats, it does not require the author to change the HTML of a document in any way. An author can simply describe what already exists on a page, without adding extra attributes or elements to accommodate the intended semantics.

The following is an example of a simple Html Query.

{
	"select":  {
		"from": "http://example.com/",
		"prefix": {
			"dc": "http://purl.org/dc/elements/1.1/"
		},
		"where": {
			"title": {  "label": "dc:title" }
		}
	}
}

When the query is performed on the URL http://example.com/, it would result in the following output.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	 xmlns:dc="http://purl.org/dc/elements/1.1/" 
	 xml:base="http://example.com/">
  <rdf:Description rdf:about="http://example.com/">
    <dc:title>Example Web Page</dc:title>
  </rdf:Description>
</rdf:RDF>

Try a live example, by clicking example.json.

Html Query uses the following patterns to select keywords and set output parameters of an HTML document.

"object":  {
	"selector": {
		"property" : "value"
	}
}

and...

"object" : {
	"property" : "value"
}


Properties of HTML Query

  1. select

    All Html Queries must begin with select.

    Example:

    {
    	"select" : {
    		...
    	}
    }
  2. from

    from is the URL of the document to be queried. The value of from should be an absolute URL.

    Example:

    "from": "http://example.com/"

    The from property may be omitted, in which case the parser sets the value of from to the URL of the referring page.

  3. prefix

    prefix contains a comma separated list of vocabulary prefixes and URIs to be used in the RDF output of a query and in the query itself.

    A prefix is an abbreviation of a URI, used in place of the full URI. A prefix forms the first part of a URI reference, or [3]QName in RDF terms.

    Pattern:

    "prefix": {
    	"prefix": "uri",
    	...
    }

    Example:

    "prefix": {
    	"vcard": "http://www.w3.org/2006/vcard/ns#",
    	...
    }

    A default prefix for the output document may be set using the keyword "value".

    Example:

    "prefix": {
    	"value": "http://www.w3.org/2006/vcard/ns#",
    	...
    }
  4. where

    where contains a comma separated list of HTML keyword selectors and their output properties. where may contain nested where statements. If a keyword contains a nested where statement, the keyword is treated as a "root" value; otherwise the keyword is a property. A property keyword should not contain further where statements.

    Pattern of a root keyword that contains a nested keyword:

    "where": {
    	"selector":  {
    		"property": "value",
    		"where": {
    			"selector" : {
    				"property": "value"
    			}
    		}
    	}
    }

    Pattern of a keyword that is a property:

    "where": {
    	"selector" : {
    		"property": "value"
    	}
    }
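
The select, prefix and where properties described above can be read with any JSON library. The following is a minimal sketch in Python, not the WebOrganics parser: the helper names expand_label and classify are hypothetical, and the sketch assumes a label is a single prefix:name pair.

```python
import json

QUERY = """
{
  "select": {
    "from": "http://example.com/",
    "prefix": {"dc": "http://purl.org/dc/elements/1.1/"},
    "where": {
      "title": {"label": "dc:title"}
    }
  }
}
"""

def expand_label(label, prefixes):
    """Expand a prefixed label such as dc:title to a full URI (hypothetical helper)."""
    prefix, sep, local = label.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return label  # unknown or missing prefix: leave the label unchanged

def classify(where, depth=0):
    """Yield (depth, selector, kind) for each keyword in a where statement."""
    for selector, props in where.items():
        # A keyword holding a nested where statement is a root, otherwise a property.
        kind = "root" if "where" in props else "property"
        yield depth, selector, kind
        if kind == "root":
            yield from classify(props["where"], depth + 1)

select = json.loads(QUERY)["select"]
keywords = list(classify(select["where"]))
title_uri = expand_label(select["where"]["title"]["label"], select["prefix"])
print(keywords)   # [(0, 'title', 'property')]
print(title_uri)  # http://purl.org/dc/elements/1.1/title
```

The query used here is the simple example from the start of this page, so "title" is reported as a property because it contains no nested where statement.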


Keyword Selectors

Html Query uses four CSS-like selectors to navigate the keywords of an HTML document. The selectors are defined below.

  1. element

    The selector is an element name.

    Example:

    "h1" is equal to <h1></h1>
  2. .class

    The selector is a class name.

    Example:

    ".example" is equal to class="example"
  3. #id

    The selector is the id of an element.

    Example:

    "#example" is equal to id="example"
  4. attribute~=name

    The selector is an attribute name and a value of that attribute, joined by ~=.

    Example:

    "rel~=example" is equal to rel="example"

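The four selector forms above can be told apart by their first character or by the presence of ~=. The following is a minimal sketch in Python under that assumption; parse_selector is a hypothetical helper and does not come from the HTML Query parser.

```python
def parse_selector(selector):
    """Split a keyword selector into (kind, value); a hypothetical helper."""
    if selector.startswith("."):
        return ("class", selector[1:])
    if selector.startswith("#"):
        return ("id", selector[1:])
    if "~=" in selector:
        # attribute~=value, for example rel~=example
        attribute, _, value = selector.partition("~=")
        return ("attribute", (attribute, value))
    return ("element", selector)

for s in ("h1", ".example", "#example", "rel~=example"):
    print(s, "->", parse_selector(s))
```

How a real parser then matches the parsed selector against the document is not specified on this page, so the sketch stops at classification.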

Properties of Keyword Selector

HTML Query Selectors contain six properties to set both input and output values.

Properties

  1. about

    A URL for what this "keyword" is about. The "about" property contains a space separated list of HTML ids, which sets the subject of the keyword in [4]RDF terms.

    Pattern:

    "about" :  {
    	"id": "url",
    	...
    }

    Ids set by the about property are matched with HTML ids on a page, and the URL value is used in the output. The about pattern allows different ids on a page to share the same URL value, or each id to have its own unique URL value.

    Example:

    "about": {
    	"fred": "http://example.com/"
    }

    Example HTML:

    <div id="fred">
    	...
    </div>

    The about property can also accept the boolean value false. Booleans in JSON are written without quotes. Setting the about property to false prevents a parser from generating an about attribute in the RDF output.

    Example:

    "about": false
  2. label

    Labels are used as both unique identifiers in a query and element names in the RDF output. A label is a "predicate" or "property" in [4]RDF terms.

    Pattern:

    "label": "property"

    Example:

    "label": "foaf:name"

    A keyword of the pre-defined type uri may contain a space separated list of labels.

    Example:

    "label": "foaf:primaryTopic foaf:maker"
  3. type

    The datatype of a keyword or the datatype of the object in [4]RDF terms. As well as any standard datatypes such as [5]XML Schema datatypes, HTML Query also supports five Pre-Defined types. If type is omitted from a keyword the parser defaults to just "text".

    Pattern:

    "type": "value"

    Example:

    "type": "http://www.w3.org/2001/XMLSchema#dateTime"

    Pre-Defined Types.

    1. text

      The content is just text, or a plain literal in RDF. Text is extracted in the following order: @datetime, @content, @title. If none of these HTML attributes are present, the value is the node value.

      Example Output:

      <label>Text</label>
    2. uri

      A URI, or simply a URL. When setting the keyword type to uri, the parser extracts the value in the following order: @src, then @href.

      Example Output:

      <label rdf:resource="http://someurl.com/" />

      If neither @src nor @href is present, the value is @id converted to an absolute hash URI. This allows the author to link to other keyword items in the RDF output.

      Example Output:

      <label rdf:resource="http://someurl.com/#id" />
    3. uriplain

      The behaviour of uriplain is the same as uri, except that a uriplain is output as a plain literal (text).

      Example Output:

      <label>http://someurl.com/</label>
    4. xmlliteral

      An XMLLiteral string. An xmlliteral may contain HTML markup or special characters. If the value does contain markup, the value should be converted to XHTML, and all elements should use the http://www.w3.org/1999/xhtml XML namespace.

      Example Output:

      <x:label rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
      	<p xmlns="http://www.w3.org/1999/xhtml">Some text.</p>
      </x:label>
    5. cdata

      A character data section. A cdata section may contain HTML markup or special characters.

      Example Output:

      <label><![CDATA[<p> Some text.</p>]]></label>
  4. content

    Content contains a space separated list of HTML ids, and sets the default content of a keyword. Ids set by the content property are matched with HTML ids on a page. The value of content is used in the RDF output.

    Pattern:

    "content" : {
    	"id": "value"
    }

    It is also possible to set a default content.

    Pattern:

    "content": "value"
    
  5. multiple

    By default the parser ignores multiple values of the same selector; only the first match of a selector is parsed.

    Multiple takes the boolean value true, which causes the parser to extract multiple instances of the same selector with different values.

    Pattern:

    "multiple": true

    Example:

    "rel~=friend": {
    	"multiple": true,
    	...
    }
  6. rev

    Rev is a reverse property name. Rev can be used with any root property (a selector that contains other keywords) to create an extra chain before a keyword.

    Example:

    "where": {
    	".vcard": {
    		"rev": "knows",
    		"label": "Person",
    		"where": {
    			.....
    		}
    	}
    }

    The above example would result in the following RDF.

    <knows>
    	<Person rdf:about="...">
    		...
    	</Person>
    </knows>

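The extraction order given under the type property for the pre-defined types text and uri can be sketched as follows. This is an illustration in Python, not the parser's own code: an element is assumed to be represented by a plain dict of its attributes plus its node value, both helper names are hypothetical, and attribute values are returned as found without resolving relative URLs.

```python
def extract_text(attrs, node_value):
    """Pre-defined type text: @datetime, @content, @title, then the node value."""
    for name in ("datetime", "content", "title"):
        if name in attrs:
            return attrs[name]
    return node_value

def extract_uri(attrs, from_url):
    """Pre-defined type uri: @src, then @href, then a hash URI built from @id."""
    for name in ("src", "href"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        # Build an absolute hash URI from the from URL and the value of @id.
        return from_url.split("#")[0] + "#" + attrs["id"]
    return None  # this page does not say what happens when nothing matches

print(extract_text({"title": "A title"}, "node text"))     # A title
print(extract_uri({"id": "id"}, "http://someurl.com/"))    # http://someurl.com/#id
```

Returning None when no attribute matches is a choice made for the sketch only.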

Root Selectors

Root Selectors are determined by whether or not a keyword contains further nested where statements. If a keyword does contain nested where statements, the Selector is said to be a "root selector"; if not, the Selector is said to be a property.

Root Selectors can also have a type which resolves to an rdf:parseType.

Example:

"type": "resource"

The author can also use type as an rdf:type selector. This adds an rdf:type element to the root selector.

Example:

"type": "http://www.w3.org/2000/10/swap/pim/contact#ContactLocation"

In the absence of both a root selector type and a reverse property name, all properties contained in a root selector are wrapped in a blank rdf:Description element.

Example:

<vcard:adr>
  <rdf:Description>
    <vcard:locality>Albuquerque</vcard:locality>
  </rdf:Description>
</vcard:adr>


Setting RDF about

In the absence of the about property, HTML Query sets the RDF about attribute by selecting HTML values from the selected element or attribute in the following order: @href, @src and @id. If the value is @id, then an absolute hash URI is compiled from the from URL and the value of @id. If a parser fails to select @href, @src or @id, then the RDF about attribute is set to the value of from.

Blank Nodes may be generated by setting the about property to false.
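
The fallback rules in this section can be sketched as a single function. This is a minimal illustration in Python under stated assumptions, not the parser's implementation: resolve_about is a hypothetical name, an element is represented by a dict of its attributes, and an about pattern whose ids do not match the element is assumed to fall through to the default order.

```python
def resolve_about(about, attrs, from_url):
    """Return the rdf:about value for an element, or None for a blank node."""
    if about is False:
        return None  # "about": false asks for a blank node
    if isinstance(about, dict):
        # Each key is assumed to hold one id or a space separated list of ids.
        for ids, url in about.items():
            if attrs.get("id") in ids.split():
                return url
    # Default order when no about property applies: @href, @src, then @id.
    for name in ("href", "src"):
        if name in attrs:
            return attrs[name]
    if "id" in attrs:
        return from_url.split("#")[0] + "#" + attrs["id"]
    return from_url  # nothing selected: use the value of from

print(resolve_about(None, {"id": "fred"}, "http://example.com/"))
# http://example.com/#fred
```

Note that the default order here starts with @href, whereas the pre-defined type uri starts with @src.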


Linking to a JSON Dataset

An HTML Query for a page can be linked to using the HTML rel value "dataset". The [6]HTML Link relation "dataset" is a short URI reference to http://weborganics.co.uk/ns/dataset (this page). By using rel dataset you are saying that the URL referenced in the href attribute of a link is a dataset for the referring page. The link to a dataset should also contain a type specifier of "application/json".

Example:

<link rel="dataset" href="http://example.com/my-dataset.json" type="application/json">

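A consumer can discover the dataset for a page by scanning its link elements. The following is a minimal sketch using Python's standard html.parser module; the class name DatasetLinkFinder is hypothetical, and the type attribute is not checked because this page only says it should be present.

```python
from html.parser import HTMLParser

class DatasetLinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel contains "dataset"."""

    def __init__(self):
        super().__init__()
        self.datasets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space separated values.
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "dataset" in rels and href:
            self.datasets.append(href)

PAGE = (
    '<html><head>'
    '<link rel="dataset" href="http://example.com/my-dataset.json" type="application/json">'
    '</head><body></body></html>'
)

finder = DatasetLinkFinder()
finder.feed(PAGE)
print(finder.datasets)  # ['http://example.com/my-dataset.json']
```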

Dataset Parsing

This page supports dataset parsing, available at http://weborganics.co.uk/dataset/?url=(+your url). The dataset parser supports transforming your dataset by linking to it in the head of your HTML document.

Example:

http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/article.html

You can also parse just a dataset, in which case the from property must be set. This is intended to be used for testing your datasets.

Example:

http://weborganics.co.uk/dataset/?url=http://weborganics.co.uk/dataset/dataset-article.json

Bookmarklet

There is also a bookmarklet that you can drag to your favourites toolbar.

Bookmarklet: DatasetParse


Examples

The following examples were created during the development of the HTML Query syntax. Please click one of the following links to view the examples.

Click the link at the bottom of each page that says "Get RDF dataset" to test.

  1. HTML Article, view json
  2. HTML hCard as FOAF, view json
  3. HTML hAtom, view json
  4. HTML hCalendar, view json
  5. HTML hProduct, view json
  6. HTML hReview, view json
  7. HTML Organization, view json
  8. HTML hCard, view json
  9. HTML hAudio, view json

The following are "real world" examples performed on live webpages.

  1. A YouTube search for videos about the semantic web. view json or results
  2. ...


References

  1. JSON specification. http://www.json.org/
  2. Microformats. http://microformats.org/
  3. QName. http://en.wikipedia.org/wiki/QName
  4. RDF Concepts and Abstract Syntax. http://www.w3.org/TR/rdf-concepts/
  5. XML Schema Part 2: Datatypes Second Edition. http://www.w3.org/TR/xmlschema-2/
  6. Hypertext Links in HTML. http://www.w3.org/TR/WD-htmllink-970328#link


Similar Work

  1. JSON Schema


WebOrganics 2010.

Semantic Web