
A lightweight record-retrieval mechanism using RDF.


a lightweight query mechanism

The intention in transmitting queries to remote repositories is to locate metadata records that satisfy some set of criteria. In response, we expect to receive a set of metadata records that satisfy (or nearly satisfy, if the remote repository supports some kind of fuzzy-matching or near-miss mechanism) those criteria.

Although this mechanism can return general subgraphs, the primary motivation for it is a simple scheme for implementing the EASEL project. Consequently, we talk about distinct "metadata records", which is possibly a bit misleading; I'm bearing that in mind.

Query representation

Queries are represented in RDF as they transit between the Search Gateway and the remote repository. Structurally, a query resembles the metadata record(s) that it is trying to locate.

At each node of the tree (or graph) where we expect a result to 'fill in' values from a metadata record, we use an 'anonymous resource' (a node with no permanent URI associated with it) to act as a placeholder or wildcard.

We further decorate nodes in this structure with additional constraint arcs, whose values are RDF structures expressing the constraints we wish to apply in satisfying our query.

The mechanism for this is to define a general 'query:constraint' property; from this we specialise subproperties representing the various types of constraints that we wish to be able to express.
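As a minimal sketch of how a consumer might tell constraint arcs apart from ordinary template arcs, assume triples are plain Python tuples and placeholders are blank-node labels with a `_:` prefix. Only `query:constraint` and `query:rootnode` come from this scheme; the sub-properties `query:valueConstraint`, `query:structuralConstraint`, `query:hasSubstring`, `query:betweenDates`, `query:inCategory` and `query:keywords` are hypothetical names chosen for illustration.

```python
# Sketch: constraint arcs are recognised by being (transitively) declared
# sub-properties of the general 'query:constraint' property.
# Every sub-property name below is hypothetical.
SCHEMA = {
    ("query:valueConstraint", "rdfs:subPropertyOf", "query:constraint"),
    ("query:structuralConstraint", "rdfs:subPropertyOf", "query:constraint"),
    ("query:hasSubstring", "rdfs:subPropertyOf", "query:valueConstraint"),
    ("query:betweenDates", "rdfs:subPropertyOf", "query:valueConstraint"),
    ("query:inCategory", "rdfs:subPropertyOf", "query:valueConstraint"),
    ("query:keywords", "rdfs:subPropertyOf", "query:structuralConstraint"),
}


def super_properties(prop, schema):
    """Return every property that `prop` is transitively a sub-property of."""
    found, frontier = set(), [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == "rdfs:subPropertyOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_constraint(prop, schema=SCHEMA):
    """True if arcs labelled `prop` express a constraint rather than data."""
    return prop == "query:constraint" or "query:constraint" in super_properties(prop, schema)


def split_arcs(query, schema=SCHEMA):
    """Separate a query graph into template arcs and constraint arcs."""
    template = [arc for arc in query if not is_constraint(arc[1], schema)]
    constraints = [arc for arc in query if is_constraint(arc[1], schema)]
    return template, constraints


# A query for records whose subject falls under a (hypothetical) taxonomy term.
# '_:rec' and '_:topic' are anonymous placeholder resources.
QUERY = [
    ("_:rec", "rdf:type", "query:rootnode"),
    ("_:rec", "rdf:type", "ex:Record"),
    ("_:rec", "dc:subject", "_:topic"),
    ("_:topic", "query:inCategory", "ex:SemanticWeb"),
]

template, constraints = split_arcs(QUERY)
print(constraints)  # [('_:topic', 'query:inCategory', 'ex:SemanticWeb')]
```

Declaring constraints through `rdfs:subPropertyOf` means a repository that does not understand a particular specialised constraint can still recognise it as *some* constraint, rather than mistaking it for data to be matched.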

It is convenient to divide the possible constraints into two broad categories: value constraints, and structural constraints.

A value constraint is a typical condition that a scalar value is required to meet: has-substring, between-dates, and so on. To this category we can also add constraints that are almost structural in nature, for instance, that an RDF resource should belong to a particular category in some taxonomy.
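The value constraints named above might be evaluated by simple predicates like the following. This is only a sketch: the function names, the case-insensitive substring match, the inclusive date bounds, and the encoding of a taxonomy as a child-to-parent dictionary are all assumptions rather than part of the scheme.

```python
from datetime import date

# Sketch of value-constraint predicates; all names and conventions are assumptions.


def has_substring(value, needle):
    """Case-insensitive substring test on a literal value."""
    return isinstance(value, str) and needle.lower() in value.lower()


def between_dates(value, start, end):
    """Inclusive date-range test."""
    return isinstance(value, date) and start <= value <= end


def in_category(term, category, broader):
    """True if `term` is `category` or sits beneath it in a taxonomy.

    `broader` maps each term to its parent term (a hypothetical encoding).
    """
    seen = set()
    while term is not None and term not in seen:
        if term == category:
            return True
        seen.add(term)
        term = broader.get(term)
    return False


TAXONOMY = {"ex:RDF": "ex:SemanticWeb", "ex:SemanticWeb": "ex:ComputerScience"}

print(has_substring("Lightweight RDF retrieval", "rdf"))  # True
print(in_category("ex:RDF", "ex:ComputerScience", TAXONOMY))  # True
```

The taxonomy test already has to walk several resources to reach an answer, which is why it sits on the border between value and structural constraints.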

Structural constraints represent more complex requirements that may stretch in scope to cover more than one node in the graph representation of the metadata being searched. One example might be a path-expression in the style of XPath. Perhaps a more useful example might be a constraint which expresses the search for particular keywords. Such a constraint does not necessarily have a simple boolean interpretation; instead, it might permit the remote repository to rank results depending on some measure of 'closeness' between the set of required keywords and those present in a particular metadata element.
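One possible measure of 'closeness' for a keyword constraint is the fraction of the required keywords that appear in an element's text. The sketch below uses that measure purely for illustration and assumes records are Python dictionaries with hypothetical `id` and `title` fields; a repository is free to rank however it likes.

```python
import re

# Sketch of a keyword 'closeness' measure: the fraction of required keywords
# found in an element's text. The measure and record layout are assumptions.


def tokens(text):
    """Lower-cased alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_closeness(required, text):
    """Score in [0, 1]: share of the required keywords present in `text`."""
    wanted = {k.lower() for k in required}
    if not wanted:
        return 0.0
    return len(wanted & tokens(text)) / len(wanted)


def rank_by_keywords(records, required, field):
    """Return (score, record) pairs with a non-zero score, best first."""
    scored = [(keyword_closeness(required, r.get(field, "")), r) for r in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


RECORDS = [
    {"id": "ex:doc1", "title": "Query languages"},
    {"id": "ex:doc2", "title": "Cooking with herbs"},
    {"id": "ex:doc3", "title": "RDF query mechanisms"},
]

for score, record in rank_by_keywords(RECORDS, ["RDF", "query"], "title"):
    print(record["id"], score)
# ex:doc3 1.0
# ex:doc1 0.5
```

A partial match still scores above zero, which is exactly the non-boolean behaviour described above: `ex:doc1` is a near miss rather than a failure.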

The types of nodes in the template tree are specified using 'rdf:type' arcs.

The root element of this (tree-like?) template may be distinguished by an additional type arc with a value of 'query:rootnode'. This simply serves to define the principal type of metadata record we are searching for.

As part of our query schema, we add an additional type, that of 'query:literal', which indicates that a placeholder resource is actually intended to represent a literal value. This permits the expression of further constraints upon the 'literal placeholder' and circumvents the restriction that RDF places on literal values (that they may not directly have properties).
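To illustrate the `query:literal` workaround, the sketch below assumes literals are wrapped in a small `Lit` class (resources being plain strings) and treats `query:hasSubstring` and `query:maxLength` as hypothetical constraint properties. A placeholder typed `query:literal` may bind only to a literal, and any constraints hanging off it are then checked against that literal's value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """An RDF literal; resources are plain strings in this sketch."""
    value: str


# Hypothetical constraint properties that may hang off a literal placeholder.
CHECKS = {
    "query:hasSubstring": lambda value, arg: arg in value,
    "query:maxLength": lambda value, arg: len(value) <= int(arg),
}


def is_literal_placeholder(node, query):
    return (node, "rdf:type", "query:literal") in query


def can_bind(node, value, query):
    """May placeholder `node` stand for `value` in a match?"""
    if is_literal_placeholder(node, query):
        if not isinstance(value, Lit):
            return False  # a literal placeholder never matches a resource
        # Constraints attached to the placeholder are applied to the literal.
        return all(CHECKS[p](value.value, arg.value)
                   for s, p, arg in query if s == node and p in CHECKS)
    return not isinstance(value, Lit)  # ordinary placeholders match resources


QUERY = {
    ("_:rec", "rdf:type", "query:rootnode"),
    ("_:rec", "dc:title", "_:title"),
    ("_:title", "rdf:type", "query:literal"),
    ("_:title", "query:hasSubstring", Lit("RDF")),
    ("_:title", "query:maxLength", Lit("40")),
}

print(can_bind("_:title", Lit("Lightweight RDF retrieval"), QUERY))  # True
```

Because `_:title` is a resource in the query graph, it can legally carry constraint arcs, even though the value it eventually matches is a literal that could not.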

Result representation

The representation of a query result is simply the union of all subgraphs that match the conditions expressed by the query.

The nodes in the result graph that correspond to instantiations of the root node of the query are collected and presented in a sequence that, again, is distinguished by having a type of 'query:result'.
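A deliberately naive matcher shows how the result graph and the distinguished sequence fit together. It assumes the same tuple representation, treats any query node with a `_:` prefix as a placeholder, ignores constraint arcs entirely (their evaluation is omitted for brevity), and uses the standard RDF container vocabulary (`rdf:Seq`, `rdf:_1`, `rdf:_2`, and so on) for the sequence typed `query:result`. The label `_:results` is a hypothetical choice. The backtracking search is exponential in the worst case and is not meant as a repository implementation.

```python
# Naive sketch: find every embedding of the query template in a data graph,
# union the matched subgraphs, and list the root instantiations in an
# rdf:Seq typed query:result. Constraint arcs are ignored here for brevity.

ANNOTATIONS = {"query:rootnode", "query:literal"}


def is_placeholder(node):
    return node.startswith("_:")


def template_arcs(query):
    """Arcs the data must match: drop query-only type annotations and constraints."""
    return [(s, p, o) for s, p, o in query
            if not (p == "rdf:type" and o in ANNOTATIONS)
            and not p.startswith("query:")]


def match(arcs, data, binding):
    """Yield each extension of `binding` under which every arc occurs in `data`."""
    if not arcs:
        yield binding
        return
    (s, p, o), rest = arcs[0], arcs[1:]
    for ds, dp, do in data:
        if dp != p:
            continue
        new = dict(binding)
        ok = True
        for q_node, d_node in ((s, ds), (o, do)):
            if is_placeholder(q_node):
                ok = ok and new.setdefault(q_node, d_node) == d_node
            else:
                ok = ok and q_node == d_node
        if ok:
            yield from match(rest, data, new)


def run_query(query, data):
    # Assumes the root placeholder also has at least one template arc.
    root = next(s for s, p, o in query if p == "rdf:type" and o == "query:rootnode")
    arcs = template_arcs(query)
    result, roots = set(), []
    for binding in match(arcs, data, {}):
        # Instantiate the template under this binding: one matching subgraph.
        for s, p, o in arcs:
            result.add((binding.get(s, s), p, binding.get(o, o)))
        if binding[root] not in roots:
            roots.append(binding[root])
    result.add(("_:results", "rdf:type", "rdf:Seq"))
    result.add(("_:results", "rdf:type", "query:result"))
    for i, node in enumerate(roots, start=1):
        result.add(("_:results", f"rdf:_{i}", node))
    return result, roots


DATA = {
    ("ex:doc1", "rdf:type", "ex:Record"),
    ("ex:doc1", "dc:title", "Lightweight retrieval"),
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "rdf:type", "ex:Record"),
    ("ex:doc2", "dc:title", "RDF queries"),
    ("ex:alice", "rdf:type", "ex:Person"),
}

QUERY = [
    ("_:rec", "rdf:type", "query:rootnode"),
    ("_:rec", "rdf:type", "ex:Record"),
    ("_:rec", "dc:title", "_:title"),
    ("_:title", "rdf:type", "query:literal"),
]

graph, roots = run_query(QUERY, DATA)
print(sorted(roots))  # ['ex:doc1', 'ex:doc2']
```

Note that `ex:doc1`'s `dc:creator` arc does not appear in the result: only the parts of the data that instantiate the template are returned.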

Development and future possibilities

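The scheme as currently outlined has one significant drawback: it cannot represent searches for metadata records that themselves describe queries expressed using this scheme, since any query vocabulary placed in the template would be read as part of the search rather than as data to be matched.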

RDF has a mechanism (reification) for overcoming this. The fix, while trivial, would both complicate the representation of normal queries and make examples much harder to understand. While we note this possibility, therefore, we refrain from specifying that queries should make full use of reification until further experience with this technology has been gained.
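RDF reification replaces each statement with a resource of type `rdf:Statement` carrying `rdf:subject`, `rdf:predicate` and `rdf:object` arcs. The minimal sketch below assumes the informal tuple representation and uses hypothetical `_:sN` labels for the statement resources. It shows why reification helps: afterwards, no `query:` term appears in predicate position, so a stored query's constraint arcs can no longer be mistaken for constraints in the query that searches for it.

```python
# Sketch of RDF reification: each statement becomes a resource of type
# rdf:Statement described by rdf:subject, rdf:predicate and rdf:object arcs.
# The '_:sN' labels for statement resources are hypothetical.


def reify(triple, stmt):
    s, p, o = triple
    return [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]


def reify_graph(triples):
    """Reify every statement in a graph, numbering the statement resources."""
    out = []
    for n, triple in enumerate(triples, start=1):
        out.extend(reify(triple, f"_:s{n}"))
    return out


# A stored query, as it might appear inside a metadata record describing it.
STORED_QUERY = [
    ("_:rec", "rdf:type", "query:rootnode"),
    ("_:title", "query:hasSubstring", "RDF"),
]

described = reify_graph(STORED_QUERY)
# The query vocabulary now appears only as data (objects), never as a predicate.
print(any(p.startswith("query:") for _, p, _ in described))  # False
```

The cost is visible even in this tiny case: two statements become eight, which is the complication to ordinary queries mentioned above.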

The query representation also, on first appearance, lacks any capability for transformation or aggregation of results.

Again, a solution may be possible via the definition of more complex structural constraints. This may serve to extend the useful scope of this lightweight query mechanism beyond the fetching of metadata records.

However, the query mechanism outlined here is not a generalised query language; for instance, it does not support the arbitrary transformation of selected subgraphs.

I'm thinking, by the way, that this style of mechanism (together with the reification hack) will form half of such a generalised language: select what we're after first, then re-express it.

It is envisaged that, should the remote repository support the ranking of metadata records based on apparent relevance, the distinguished sequence of results will appear in decreasing order of relevance.
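Using the standard RDF container membership properties, the ordering convention can be sketched as below. The `scores` mapping is a hypothetical input from whatever relevance measure the repository uses, and breaking ties by node name is an assumption made only to keep the output deterministic.

```python
# Sketch: emit the distinguished result sequence with the most relevant
# record as rdf:_1. `scores` is a hypothetical {root node: relevance} map.


def ranked_result_sequence(scores, seq="_:results"):
    """Build rdf:Seq arcs in decreasing order of relevance (ties by name)."""
    ordered = sorted(scores, key=lambda node: (-scores[node], node))
    arcs = [(seq, "rdf:type", "rdf:Seq"), (seq, "rdf:type", "query:result")]
    arcs += [(seq, f"rdf:_{i}", node) for i, node in enumerate(ordered, start=1)]
    return arcs


for arc in ranked_result_sequence({"ex:doc1": 0.5, "ex:doc2": 1.0, "ex:doc3": 0.5}):
    print(arc)
```

A repository without any notion of relevance can simply assign equal scores, in which case the sequence order carries no meaning.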

This query mechanism is capable of representing queries for schema elements expressed using the Resource Description Framework (RDF) Schema Specification 1.0.

An example semantics

The intended semantics of lightweight record retrieval (LWRR), i.e. conjunction, with disjunction over representational facets of the same resource, are demonstrated by a compiler from LWRR to Prolog.
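The compiler itself is not reproduced here. As a rough, hypothetical sketch of the idea, assume the repository's data is loaded as Prolog facts `triple(Subject, Predicate, Object)` with URIs and literals as atoms. A purely conjunctive query (disjunction is not handled) could then be translated into a single clause; `query:hasSubstring` is an assumed constraint mapped onto the ISO built-in `sub_atom/5`, and `result/1` is a hypothetical head name.

```python
import re

# Hypothetical sketch: compile a purely conjunctive LWRR query into a Prolog
# clause over assumed facts triple(Subject, Predicate, Object). Disjunction
# over representational facets is not handled here.

ANNOTATIONS = {"query:rootnode", "query:literal"}


def prolog_atom(text):
    """Quote a URI or literal as a Prolog atom (ISO quoted-atom escaping)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def prolog_var(node):
    """Map a placeholder such as '_:rec' to the Prolog variable 'V_rec'."""
    return "V_" + re.sub(r"\W", "_", node[2:])


def term(node):
    return prolog_var(node) if node.startswith("_:") else prolog_atom(node)


def compile_query(query):
    root = next(s for s, p, o in query if p == "rdf:type" and o == "query:rootnode")
    patterns, checks = [], []
    for s, p, o in query:
        if p == "rdf:type" and o in ANNOTATIONS:
            continue  # query-only annotations have no counterpart in the data
        if p == "query:hasSubstring":
            checks.append(f"sub_atom({term(s)}, _, _, _, {prolog_atom(o)})")
        else:
            patterns.append(f"triple({term(s)}, {prolog_atom(p)}, {term(o)})")
    # Constraint goals go last so their variables are already bound.
    goals = patterns + checks
    return f"result({prolog_var(root)}) :-\n    " + ",\n    ".join(goals) + "."


QUERY = [
    ("_:rec", "rdf:type", "query:rootnode"),
    ("_:rec", "rdf:type", "ex:Record"),
    ("_:rec", "dc:title", "_:title"),
    ("_:title", "rdf:type", "query:literal"),
    ("_:title", "query:hasSubstring", "RDF"),
]

print(compile_query(QUERY))
```

Placeholders become shared Prolog variables, so the conjunction of `triple/3` goals enforces exactly the graph shape of the template, and each solution for `V_rec` is one root instantiation.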