raven ioctl

A discussion of the design and implementation of an RDF API.


bag, sequence, alternation api

I have misgivings about both the bag and alt constructs that RDF specifies. My objection to alt is that its semantics are very loosely specified: it should either have a precise interpretation (if one is possible that satisfies all domains) or not be there at all.

The bag, however, is a different beast. A bag is like a set that can contain literals and resources. However, unlike a set, a bag can contain multiple instances of (or rather, references to) the same resource and/or literal. Were it a set, the multiple-instance consideration would not arise.

The reason why I consider this to be a bit strange is as follows: firstly, a resource's identity is precisely specified by its URI. (Even 'anonymous' resources, as I understand them, can be seen as having an identity with respect to the model in which they appear; this can be thought of as a private URI which is never exposed to the world.) A literal's identity is precisely specified by the contents of that literal (UTF-8 vs. ASCII, etc. representation issues aside).

Consequently, I find it confusing to say that "resource X occurs in bag B three times". What does it mean for an identical thing to appear three times? Is that even meaningful?

There are obviously situations where we may need to refer to the same resource multiple times (perhaps someone who is both an author and editor of a book appears twice in the capacity of 'contributor'). However, looking more closely at this example, it is evident to me that a set would more than suffice. Why? Well, are we solely interested in whether Joe Bloggs is a 'contributor'? If so, he either is, or he isn't. Set membership works like that. Are we interested, instead, in the number of times Joe Bloggs contributed, or the different roles in which he did so? In that case, surely the collection we're talking about is a set of events or roles, each of which carries the extra information required to distinguish 'Joe Bloggs as author' from 'Joe Bloggs as editor'. That's why I think bag should have been set.
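To make the contrast concrete, here is a minimal sketch of the two views of the 'contributor' example. The URN, the role names, and the helper functions are all made up for illustration; nothing here is part of RDF or of any particular API.

```python
from collections import Counter

JOE = "urn:example:joe-bloggs"  # hypothetical identifier

# Bag view: the same resource is simply counted twice.
# The bag records *that* Joe appears twice, but not *why*.
contributors_bag = Counter([JOE, JOE])

# Set view: each member is a distinct (person, role) pair, so the
# information that distinguishes the two appearances is explicit.
contributions = {
    (JOE, "author"),
    (JOE, "editor"),
}


def is_contributor(person):
    """Plain set-membership question: he either is, or he isn't."""
    return any(p == person for p, _ in contributions)


def roles_of(person):
    """The roles in which a person contributed, sorted for stable output."""
    return sorted(role for p, role in contributions if p == person)
```

With the set of roles, both questions from the paragraph above (is he a contributor, and in what capacities) can be answered without ever needing a repeated member.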

However, misgivings aside, I'm trying to address what we need in an RDF API that handles the current specification, not to fix what I perceive as (minor) problems. Now that's off my chest, I promise not to complain about bags (not sets) again.

cannot specify all semantics, but...

Implementations must make explicit the meaning they assign to each of these constructs (bag, sequence, and alternation).

modelling linked-list data structures in RDF

I used to think this was a good idea; that (for example) a better way to model a sequence would be through the use of 'placeholder' resources of type 'sequence-item' connected into a structure through first, last, and next links in the RDF model.
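As an illustration of the linked-list idea, the sketch below stores a three-item sequence as plain (subject, predicate, object) tuples and walks it through its first and next links. The ex: names, including the ex:item property that points from a placeholder to its value, are assumptions made for this example only.

```python
# Triples are (subject, predicate, object) tuples; every name is hypothetical.
triples = {
    ("ex:seq", "ex:first", "ex:cell1"),
    ("ex:seq", "ex:last", "ex:cell3"),
    ("ex:cell1", "ex:item", "ex:a"),
    ("ex:cell1", "ex:next", "ex:cell2"),
    ("ex:cell2", "ex:item", "ex:b"),
    ("ex:cell2", "ex:next", "ex:cell3"),
    ("ex:cell3", "ex:item", "ex:c"),
}


def value(subject, predicate):
    """Return the object of the first matching triple, or None if there is none."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def walk(seq):
    """Follow first/next links and collect the item held by each placeholder."""
    items = []
    cell = value(seq, "ex:first")
    while cell is not None:
        items.append(value(cell, "ex:item"))
        cell = value(cell, "ex:next")
    return items
```

Note that a three-item sequence already costs seven arcs and three placeholder resources in this modelling.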

I'm not so convinced now that this is appropriate in all cases. Again, I was suffering from the blurring of the distinction between the RDF model, any underlying implementation, and the API I was envisioning to access these sequences. I think the _1, _2 mechanism that RDF specifies suffices as a presentation, provided that a higher-level API than just arc-traversal exists for dealing with sequences.
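A rough sketch of what "higher-level than arc-traversal" might mean: small helpers that build the _n property for a given n, recover n from a property, and return a container's members in order. The RDF namespace URI is the standard one; the function names and the tuple representation of triples are assumptions for illustration.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Matches rdf:_1, rdf:_2, ... (ordinals start at 1, no leading zeros).
_MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def ordinal_property(n):
    """Build the membership property for position n (1-based)."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    return f"{RDF}_{n}"


def ordinal_of(predicate):
    """Return n for an rdf:_n property, or None for any other property."""
    match = _MEMBERSHIP.fullmatch(predicate)
    return int(match.group(1)) if match else None


def members_in_order(triples, container):
    """Collect the container's members, sorted numerically by ordinal."""
    found = []
    for s, p, o in triples:
        n = ordinal_of(p)
        if s == container and n is not None:
            found.append((n, o))
    found.sort(key=lambda pair: pair[0])
    return [o for _, o in found]
```

Sorting numerically matters: a naive string sort of the property names would place _10 before _2.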

This is not to preclude the adoption of any other convenient RDF mechanisms for specifying sequences.

desirable higher-level operations

Now that the preamble is over, we can list the kinds of operations that we expect to be able to perform. The operations given here form an ideal.

Permit: generation of _1, etc. properties automatically. Also, pulling the value of n out of _n.

model.countMembers(bag);
model.appendMember(bag, item);
model.insertMember(bag, item);
model.insertMemberAt(bag, item, position);
NodeIterator model.enumerateMembers(bag);
model.deleteMemberAt(bag, position);
model.deleteMember(bag, item);
model.deleteMemberAll(item).
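The following is a minimal in-memory sketch of most of these operations, with snake_case names mirroring the ones listed. Several behaviours are assumptions, since the list above does not pin them down: positions are 1-based, delete_member removes only the first occurrence, and delete_member_all removes the item from every collection. insertMember is left out because its intended difference from appendMember is not stated. The ordinal_triples helper is an addition for illustration.

```python
class CollectionModel:
    """Hypothetical in-memory stand-in for the collection part of a model."""

    def __init__(self):
        self._members = {}  # container -> ordered list of members

    def count_members(self, container):
        return len(self._members.get(container, []))

    def append_member(self, container, item):
        self._members.setdefault(container, []).append(item)

    def insert_member_at(self, container, item, position):
        members = self._members.setdefault(container, [])
        if not 1 <= position <= len(members) + 1:
            raise IndexError("position out of range")
        members.insert(position - 1, item)

    def enumerate_members(self, container):
        # Iterate over a copy so callers may modify the model while iterating.
        return iter(list(self._members.get(container, [])))

    def delete_member_at(self, container, position):
        members = self._members.get(container, [])
        if not 1 <= position <= len(members):
            raise IndexError("position out of range")
        return members.pop(position - 1)

    def delete_member(self, container, item):
        # Assumption: first occurrence only; raises ValueError if absent.
        self._members.get(container, []).remove(item)

    def delete_member_all(self, item):
        # Assumption: strips the item from every collection in the model.
        for members in self._members.values():
            members[:] = [m for m in members if m != item]

    def ordinal_triples(self, container):
        # Ordinals are derived from list position, so they are always dense.
        members = self._members.get(container, [])
        return [(container, f"_{n}", m) for n, m in enumerate(members, start=1)]
```

Because the _n properties are generated from list positions rather than stored, renumbering after an insertion or deletion comes for free in this sketch.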

Asserting A-[_n]->B should be equivalent to model.insertMemberAt(A, B, n); similarly, retracting it should be equivalent to model.deleteMemberAt(A, n). This can be implemented as an example layer that relies on the explicit collection API.
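A sketch of such a layer is given below. It intercepts assertion and retraction of triples, routes _n properties to the collection operations, and passes everything else through to an ordinary triple set. The class name, the bare "_n" property spelling, and the decision to ignore a retraction that does not match the stored member are all assumptions.

```python
import re

_ORDINAL = re.compile(r"_([1-9][0-9]*)")


class SequenceLayer:
    """Hypothetical layer mapping A-[_n]->B arcs onto collection operations."""

    def __init__(self):
        self.members = {}   # container -> ordered list of members
        self.other = set()  # all triples that are not membership arcs

    def insert_member_at(self, container, item, position):
        members = self.members.setdefault(container, [])
        if not 1 <= position <= len(members) + 1:
            raise IndexError("position out of range")
        members.insert(position - 1, item)

    def delete_member_at(self, container, position):
        members = self.members.get(container, [])
        if not 1 <= position <= len(members):
            raise IndexError("position out of range")
        return members.pop(position - 1)

    def assert_triple(self, s, p, o):
        match = _ORDINAL.fullmatch(p)
        if match:
            # Asserting A-[_n]->B inserts B at position n, shifting later members.
            self.insert_member_at(s, o, int(match.group(1)))
        else:
            self.other.add((s, p, o))

    def retract_triple(self, s, p, o):
        match = _ORDINAL.fullmatch(p)
        if match:
            n = int(match.group(1))
            members = self.members.get(s, [])
            # Assumption: only retract when the stored member really is o.
            if 1 <= n <= len(members) and members[n - 1] == o:
                self.delete_member_at(s, n)
        else:
            self.other.discard((s, p, o))
```

One consequence worth noticing: under this equivalence, asserting A-[_1]->C when A already has a first member does not produce two _1 arcs; it pushes the existing members down by one.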

Renumbering should be automatic (following the model spec).

Note about: more complex semantics implied by model spec than by syntax spec

Note about: realistic implications of desirable properties. E.g., in BDB, we need two list tables, collection->members and member->collections (with ref counts), which allow us to do things like aboutEach processing (although, on re-reading, aboutEach should be a parser function, not a model one).
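The two-table arrangement can be sketched in memory as follows, with Python dictionaries standing in for the BDB tables. The class and method names are hypothetical, and the reference count is taken to mean "number of times this member occurs in that collection", which is an assumption.

```python
from collections import Counter, defaultdict


class MembershipIndex:
    """Hypothetical in-memory stand-in for the two list tables."""

    def __init__(self):
        # Table 1: collection -> members, in order, duplicates allowed.
        self.members_of = defaultdict(list)
        # Table 2: member -> {collection: reference count}.
        self.collections_of = defaultdict(Counter)

    def add(self, collection, member):
        self.members_of[collection].append(member)
        self.collections_of[member][collection] += 1

    def remove(self, collection, member):
        # Raises ValueError if the member is not in the collection.
        self.members_of.get(collection, []).remove(member)
        counts = self.collections_of[member]
        counts[collection] -= 1
        if counts[collection] == 0:
            del counts[collection]
        if not counts:
            del self.collections_of[member]

    def collections_containing(self, member):
        """Reverse lookup: which collections hold this member at least once?"""
        return sorted(self.collections_of.get(member, ()))
```

The reverse table is what makes operations such as deleteMemberAll(item) cheap: the collections to touch can be looked up directly instead of scanning every collection.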