Taking "Literal" literally
A discussion on #rdfig about the options I listed for representing test cases in manifest files got me thinking about what literals are (from the RDF point of view). I have what I hope is a consistent answer to the following questions: what is a Literal? Is a literal a resource? What does http://www.w3.org/ denote? And, most importantly, so what do you suggest we do about it?
I'll give my answer first: a literal is a self-denoting entity. (DanBri points out this might be a confusing phrase, no matter how cute)
Now that we have a model theory for RDF, we have the notion of an interpretation (or denotation) for an RDF graph; in particular, an interpretation function which tells us, for instance, that when we see the resource with the URI label urn:person:jan-grant, we're talking about the person. In other words,
  I(urn:person:jan-grant) = Jan (the person)
What I'm proposing is that a literal is the direct embedding of a value into an RDF graph. It really is (honestly) literally there in the graph. In other words, what makes lit a literal is that
  I(lit) = lit
...for any interpretation, I. (In other words, RDF interpretations ought to preserve the literalness of literals.)
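A rough Python sketch of the "self-denoting" idea may help here. The class names, the `interpret` helper and the two example interpretations are all invented for illustration; they are not part of any RDF specification or library. The only point is that swapping the interpretation changes what a resource denotes but never what a literal denotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A node labelled by a URI; what it denotes depends on the interpretation."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A typed datum embedded directly in the graph (hypothetical model)."""
    datatype: str
    value: object


def interpret(node, denotations):
    """Apply a hypothetical interpretation I, given as a dict from URI to thing."""
    if isinstance(node, Literal):
        # A literal denotes itself under every interpretation.
        return node
    return denotations[node.uri]


jan = Resource("urn:person:jan-grant")
twelve = Literal("number", 12)

# Two different (made-up) interpretations of the same URI label.
interp_a = {"urn:person:jan-grant": "the person Jan"}
interp_b = {"urn:person:jan-grant": "somebody else entirely"}
```

Under both `interp_a` and `interp_b`, `interpret(twelve, ...)` hands back `twelve` itself, while `interpret(jan, ...)` differs.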
The most important thing is that the answer is not, "a literal is a unicode string with a language tag...". A literal can be pretty much any typed datum. Examples of literals are:
Hang on, what's a URI doing in there? Isn't a URI a resource? No, and that's something which we all recognise: a URI is just used to label a resource. We use that URI as a convenient tag when we want to describe its associated resource; but sometimes, we want to talk about the URI itself too.
"Everything is a resource" is something you'll hear quite a bit, particularly on #rdfig. I'm not sure this is right. I'd say, there are two kinds of things in the RDF universe: literals and resources. These are both entities.
Yes, there's an "R" in RDF, not an "E". But we use RDF to describe resources because we usually have more interesting things to say about them. I'm saying that a "fixed" RDFS should put rdfs:Resource and rdfs:Literal as sibling subclasses of rdfs:Entity, not one inside the other.
So how do we decide if something is a literal or a resource? Taste, experience, practice; also, it's pretty hard to wrap up an organisation and shove it into an RDF graph.
Jocelyn Paine articulated the difference between objects and indiscernibles in his paper, "Everything is not an object". This is probably worth a read. Another test one could use is the notion of modality: are all possible instances of the entity I'm considering essentially identical?
[aside] It's worth pointing out that I've been slightly underhand about my definition of what comprises a literal; if my criterion is "you can slap it into a graph" then what is Date doing above? The answer is that you pronounce Date, "Date descriptor". It's a mathematical description of a date, 'tis all.
[two words of explanation] This is only a slightly related issue, but seems to be one of the things that motivate people to claim "a literal is a resource". Secondly, below I use the term "converse". Having asked around, I'm given to understand that turning an arc around gives you the "inverse" property; I'd have thought an inverse of P, say P', was such that (A P' B) held iff (A P B) didn't hold. Anyroadup. The answer to the question really is...
Yes. We can't serialise it, but there's nothing stopping us from doing it.
One argument in favour of this: for every literal-valued property hasX, we can conceive without pain of a converse property, isXof. (eg: Jan has age 21; 21 is the age of Jan.)
An additional argument is laziness: permitting this sort of construct makes definitions of things like rdfs:domain and rdfs:range symmetrical, and means we don't have to keep on special-casing literals when we reason about RDF; we can reason about entities.
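To make the converse-property argument concrete, here is a small Python sketch, assuming a graph is just a set of (subject, property, object) tuples in which literals are allowed at either end. The property names `foo:hasAge` and `foo:isAgeOf` and the tagged-tuple encoding of nodes are made up for the example.

```python
def converse(triples, prop, converse_prop):
    """Turn every (s, prop, o) arc around into (o, converse_prop, s)."""
    return {(o, converse_prop, s) for (s, p, o) in triples if p == prop}


# Nodes are tagged tuples so that resources and literals are both just entities.
jan = ("resource", "urn:person:jan-grant")
age = ("number", 21)

graph = {(jan, "foo:hasAge", age)}

# "Jan has age 21" becomes "21 is the age of Jan": a literal as subject.
flipped = converse(graph, "foo:hasAge", "foo:isAgeOf")
```

Nothing in `converse` needs to ask whether a node is a literal or a resource, which is the laziness argument in miniature.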
Two obvious questions follow immediately:
For the implications of this, you'd have to ask Pat Hayes. I've no idea. I would point out that regularity is only a virtue as long as it makes sense.
These are completely permissible; in most obvious instances, they carry zero information since the relationship between two literals is fixed and (in some sense) self-evident. Example: number(12) is always greater than number(4).
However, it's also perfectly possible to invent properties that are non-obvious and that do carry information, for example number(12) is a number that Jan finds preferable to number(4). This simply expresses my numerological inclinations.
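The distinction between the two kinds of literal-to-literal property can be sketched in Python. This is only an illustration: the property names `foo:greaterThan` and `foo:preferredByJanTo`, and the `holds` helper, are invented here. A self-evident property is computed from the values; a non-obvious one has to be asserted in a graph.

```python
import operator

# Properties whose truth is fixed by the literal values themselves, so
# stating them in a graph adds no information.
SELF_EVIDENT = {"foo:greaterThan": operator.gt}


def holds(subject, prop, obj, asserted):
    """Decide whether (subject prop obj) holds between two number literals."""
    if prop in SELF_EVIDENT:
        # Computed directly from the values; the graph is never consulted.
        return SELF_EVIDENT[prop](subject, obj)
    # Anything else carries information only if somebody has asserted it.
    return (subject, prop, obj) in asserted


# Jan's numerological inclinations, which cannot be computed from 12 and 4.
asserted = {(12, "foo:preferredByJanTo", 4)}
```

Here `holds(12, "foo:greaterThan", 4, set())` is true even against an empty graph, whereas the preference is true only because of the triple in `asserted`.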
It's not worth saying anything more about this matter.
Does it denote an organisation? A home page? A whole web site? (There may be a "standard practice" that I'm not aware of, eg, a convention that says that resources with legal HTTP URLs denote the web pages that they point to; if so, that needs writing down somewhere and pointing to more, because I couldn't find it.)
The answer to this, though, is probably made most clear with a picture.
_:org <rdf:type> <foo:Organisation> .
_:org <foo:hasHomePage> _:hp .
_:hp <rdf:type> <foo:WebPage> .
_:hp <foo:address> uri(http://www.w3.org/) .
There is an optional arc
_:org <foo:hasHomePageAddress> uri(http://www.w3.org/) .
in this picture too. Once the relationships are clear, the URIs that you use to label the nodes _:org and _:hp (if any) can be chosen.
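The same picture can be written down as plain Python data, which makes it easy to see that the URI sits in the graph as a value rather than as a node label. Treating the optional foo:hasHomePageAddress arc as a shortcut for the two-step path is my own reading for the purposes of this sketch, and the helper names are hypothetical.

```python
# The picture as (subject, property, object) tuples. The URI is a typed
# literal value, here encoded as a ("uri", ...) pair.
triples = [
    ("_:org", "rdf:type", "foo:Organisation"),
    ("_:org", "foo:hasHomePage", "_:hp"),
    ("_:hp", "rdf:type", "foo:WebPage"),
    ("_:hp", "foo:address", ("uri", "http://www.w3.org/")),
]


def objects(subject, prop):
    """All objects o such that (subject, prop, o) is in the graph."""
    return [o for (s, p, o) in triples if s == subject and p == prop]


def home_page_address(org):
    """Follow foo:hasHomePage and then foo:address (assumed shortcut reading)."""
    return [
        address
        for page in objects(org, "foo:hasHomePage")
        for address in objects(page, "foo:address")
    ]
```

Neither `_:org` nor `_:hp` needs a URI label of its own for `home_page_address("_:org")` to find the address.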
RDF/XML is a bit broken at the moment (or rather, it cannot express "literals on the blunt end" or reexpress the same anonymous node - but that's a different issue) but there is a simple fix possible. The fix lets us specify typed literals properly and documents that make use of it are backwards-semi-comprehensible by existing parsers.
We can retain backwards compatibility by having literals default to langstring() or unicode() literals when rdf:parseType is missing or has the value "Literal".
In this case, the literals should behave according to Jeremy and Bill's proposal (@@pointer?)
To express literals of other types, we can stick the type in the rdf:parseType attribute, for instance:
<rdf:Description> <foo:value>foo</foo:value> </rdf:Description>
<rdf:Description> <foo:value xml:lang="fr">foo</foo:value> </rdf:Description>
<rdf:Description> <foo:value rdf:parseType="xsd:Date">2001-09-01</foo:value> </rdf:Description>
<rdf:Description> <foo:value rdf:parseType="bar:Number">12</foo:value> </rdf:Description>
<rdf:Description> <foo:value rdf:parseType="xsd:URI">http://www.w3.org/</foo:value> </rdf:Description>
@@stick proper XSD datatype names in here where possible
Here we note that, while some literals are distinct from their unicode representation, we wind up using a unicode representation to serialise them. That's unavoidable if we want to ship them around; I'm just trying to distinguish a literal's value from its representation in a serialised form.
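As a sketch of how a parser might read the proposed syntax, the following Python uses the standard library's xml.etree.ElementTree. The `read_literal` helper, the "langstring" type name used for the default case, and the foo namespace URI are assumptions made for the example; real RDF/XML parsers treat rdf:parseType="Literal" as XML content, which this sketch glosses over by only looking at the element text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_literal(elem):
    """Return (type, serialised form, language) for one property element."""
    parse_type = elem.get(f"{{{RDF_NS}}}parseType")
    if parse_type is None or parse_type == "Literal":
        # Backwards-compatible default: a language-tagged string.
        return ("langstring", elem.text, elem.get(XML_LANG))
    # Otherwise the attribute names the literal's type; the element text is
    # only the unicode representation used to ship the value around.
    return (parse_type, elem.text, None)


doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:foo="http://example.org/foo#">
  <rdf:Description>
    <foo:value xml:lang="fr">foo</foo:value>
  </rdf:Description>
  <rdf:Description>
    <foo:value rdf:parseType="xsd:Date">2001-09-01</foo:value>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
literals = [read_literal(prop) for description in root for prop in description]
```

The returned tuples keep the type and the serialised text apart, so turning "2001-09-01" into an actual date value is a separate step that depends on the named type.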