Saturday, September 5, 2009

RDF Properties on Magic Shelves

Book authors and politicians who go on talk shows, whether it's the Daily Show, Charlie Rose, Fresh Air, Oprah, Letterman, whatever, seem to preface almost every answer with the phrase "That's a really good question, (Jon|Teri|Stephen|Conan)". The guest never says why it's a good question because the real meaning of that phrase is "Thanks for letting me hit one out of the ballpark." Talk shows have so little in common with baseball games or even tennis matches. On the rare occasion when a guest doesn't adhere to form, the video goes viral.

I've been promising to come back to my discussion of Martha Yee's questions on putting bibliographic data on the semantic web. Karen Coyle has managed to discuss all of them at least a little bit, so I'm picking and choosing just the ones that interest me. In this post, I want to talk about Martha's question #11:
Can a property have a property in RDF?
The rest of my post is divided into two parts. First, I will answer the question, then in the second part, I will discuss some of the reasons that it's a really good question.

Yes, a property can have a property in RDF. The W3C Recommendation entitled RDF Semantics states: "RDF does not impose any logical restrictions on the domains and ranges of properties; in particular, a property may be applied to itself." So not only can a property have a property in RDF, it can even use itself as a property!

OK, that's done with. Not only is the answer yes, but it's yes almost to the point of absurdity. Why would you ever want a property to be applied to itself? How can a hasColor property have a hasColor property? If you read and enjoyed Gödel, Escher, Bach, you're probably thinking that the only use for such a construct is to define a self-referential demonstration of Gödel's Incompleteness Theorem. But there actually are uses for properties which can be applied to themselves. For example, if you want to use RDF properties to define a schema, you probably want to have a "documentation" property, and certainly the documentation property should have its own documentation.
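
To make the self-documenting property a bit more concrete, here is a minimal sketch in plain Python. It models triples as simple tuples instead of using a real RDF library, and the `ex:` names and documentation strings are made up for illustration.

```python
# Triples are (subject, predicate, object) tuples; every "ex:" name is hypothetical.
DOC = "ex:documentation"

triples = {
    ("ex:hasColor", DOC, "Relates a thing to its color."),
    # The documentation property documents itself:
    # it is both the subject and the predicate of this triple.
    (DOC, DOC, "Relates a term to a human-readable description of it."),
}

def self_applied(graph):
    """Return the properties that are subject and predicate of the same triple."""
    return {s for (s, p, o) in graph if s == p}

print(self_applied(triples))
```

Nothing breaks when the predicate shows up in the subject position. As far as the model is concerned, a property is just another thing with a URI.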

If you're starting to feel queasy about properties having properties, then you're starting to understand why Yee question 11 is a good one. Just when you think you understand the RDF model as being blobby entities connected by arcs, you find out that the arcs can have arcs. Our next question to consider is whether properties that have properties accomplish what someone with a library metadata background intends them to accomplish, and even if they do so, is it the right way to accomplish it?

In my previous post on the Yee questions, I pointed out that ontology development is a sort of programming. One of the most confusing concepts that beginning programmers have to burn into their brains is the difference between a class and a class instance. In the library world, there are some very similar concepts that have been folded up into a neat hierarchy in the FRBR model. Librarians are familiar with expressions of works that can be instantiated in multiple manifestations, each of which can be instantiated in multiple items. Each layer of this model is an example of the class/instance relationship that is so important for programmers to understand. This sort of thinking needs to be applied to our property-of-a-property question. Are we trying to apply a property to an instance of a property, or do we want to apply properties to property "classes"?

Here we need to start looking at examples, or else we will get hopelessly lost in abstraction-land. Martha's first example is a model where the dateOfPublication is a property of a publishedBy relationship. In this case, what we really want is a property instance from the class of publishedBy properties that we modify with a dateOfPublication property. Remember, there is a URI associated with the property piece of any RDF triple. If we were to simply hang a dateOfPublication on a globally defined publishedBy, we would have made that modification for every item in our database that uses the publishedBy property. That's not what we want. Instead, for each publishedBy relation we want to assert, we need to create a new property, with a new URI, related to publishedBy using the RDF Schema property subPropertyOf.
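
Here is a rough sketch of what that per-assertion approach would look like, again in plain Python with triples as tuples. The helper `assert_published_by`, the URI numbering scheme, and the book, publisher, and date values are all assumptions of mine, not anything from Martha's model.

```python
from itertools import count

SUB_PROPERTY_OF = "rdfs:subPropertyOf"
_ids = count(1)  # source of fresh suffixes for the minted property URIs

def assert_published_by(graph, book, publisher, date):
    """Mint a new sub-property of ex:publishedBy for one assertion and date it."""
    prop = f"ex:publishedBy_{next(_ids)}"  # hypothetical URI scheme
    graph.add((prop, SUB_PROPERTY_OF, "ex:publishedBy"))
    graph.add((prop, "ex:dateOfPublication", date))  # the date hangs on the new property
    graph.add((book, prop, publisher))               # the actual assertion
    return prop

graph = set()
p1 = assert_published_by(graph, "ex:book1", "ex:acmePress", "1999")
p2 = assert_published_by(graph, "ex:book2", "ex:acmePress", "2005")
print(p1, p2, len(graph))
```

Every assertion costs three triples and a brand-new property, and the global publishedBy never gets a date of its own.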

Let's look at Martha's other example. She wants to attach a type to her variantTitle property to denote spine title, key title, etc. In this case, what we want to do is create global properties that retain variantTitleness while making the meaning of the metadata more specific. Ideally, we would create all our variant title properties ahead of time in our schema or ontology. As new cataloguing data entered our knowledgebase, our RDF reasoning machine would use that schema to infer that spineTitle is a variantTitle so that a search on variantTitle would automatically pick up the spineTitles.
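
The inference step can be sketched in a few lines of plain Python. This is a toy stand-in for an RDF reasoner, not a real one: it only applies the sub-property rule, and the titles and `ex:` names are invented for the example.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Schema defined ahead of time: the specific title properties refine variantTitle.
schema = {
    ("ex:spineTitle", SUB_PROPERTY_OF, "ex:variantTitle"),
    ("ex:keyTitle", SUB_PROPERTY_OF, "ex:variantTitle"),
}

# Cataloguing data as it arrives (made-up values).
data = {
    ("ex:book1", "ex:spineTitle", "Moby Dick"),
    ("ex:book1", "ex:keyTitle", "The Whale"),
    ("ex:book1", "ex:publishedBy", "ex:acmePress"),
}

def infer_subproperties(schema, data):
    """Add (s, q, o) whenever (s, p, o) holds and p is a sub-property of q."""
    inferred = set(data)
    changed = True
    while changed:  # repeat until no new triples appear, so chains of sub-properties work
        changed = False
        for (p, _, q) in schema:
            for (s, pred, o) in list(inferred):
                if pred == p and (s, q, o) not in inferred:
                    inferred.add((s, q, o))
                    changed = True
    return inferred

def values(graph, subject, prop):
    """Return all objects of `prop` for `subject`, sorted for stable output."""
    return sorted(o for (s, p, o) in graph if s == subject and p == prop)

closure = infer_subproperties(schema, data)
print(values(closure, "ex:book1", "ex:variantTitle"))
```

A search on variantTitle over the inferred triples picks up both the spine title and the key title, even though nobody ever catalogued a plain variantTitle.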

Is making new properties by adding a property to a subproperty the right way to do things? In the second example, I would say yes. The new properties composed from other properties make the model more powerful, and allow the data expression to be simpler. In the first example, where a new property is composed for every assertion, I would say no. A better approach might be to make the publication event a subject entity with properties including dateOfPublication, publishedBy, publishedWhat, etc. The resulting model is simpler, flatter, and more clearly separates the model from the data.
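
For comparison, here is a sketch of the publication-event alternative in the same tuple style. The `ex:PublicationEvent` class, the event URI, and the values are hypothetical.

```python
# The publication event is an ordinary subject; all "ex:" names are hypothetical.
graph = {
    ("ex:pubEvent1", "rdf:type", "ex:PublicationEvent"),
    ("ex:pubEvent1", "ex:publishedWhat", "ex:book1"),
    ("ex:pubEvent1", "ex:publishedBy", "ex:acmePress"),
    ("ex:pubEvent1", "ex:dateOfPublication", "1999"),
}

def describe(graph, subject):
    """Collect the property/value pairs of one subject (assumes one value per property)."""
    return {p: o for (s, p, o) in graph if s == subject}

event = describe(graph, "ex:pubEvent1")
print(event["ex:publishedBy"], event["ex:dateOfPublication"])
```

The set of properties stays fixed no matter how many publication events get recorded. Only the data grows.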

We can contrast the RDF approach of allowing new properties to be created and modified by other properties with that of MARC. MARC makes you put data in fields and subfields and subfields with modifiers, but the effect is sort of like having lots of dividers on lots of shelves in a bookcase: there's one place for each and every bit of data, unless there's no place. RDF is more like a magic shelf that allows things to be in several places at once and can expand to hold any number of things you want to put there.

"Thanks for having me, Martha, it's been a real pleasure."