Wednesday, April 29, 2009

RDF and Twitter: Compare and Contrast

As I wrote previously, RDF was developed with the idea that it would be the backbone of something called the "semantic web", which was supposed to differ from the world-wide web in that machines would be able to transmit and "understand" information from a global network. In contrast, Twitter was developed with the idea that people would need to document their sad pathetic lives in 140-character chunks. On this date, however, the Twitterverse seems to be an intelligent global network that can transmit and understand almost anything, and the RDF-based semantic web seems to still be convinced of a need for agents to transmit dribbles of sad, pathetic knowledge in an endless stream of subject-object-predicate triples.

It's interesting to compare the core data models for RDF and for Twitter. In RDF, the fundamental particles are, as I've said, subject-object-predicate triples. To recast that last sentence into the RDF model, we would proceed as follows:
 Assertion:
subject: RDF
object: subject-object-predicate triples
predicate: has fundamental particles of type
That's probably too self-referential for most people to wrap their heads around, so instead I'll change the example:
 Assertion:
subject: The United States
object: Barack Obama
predicate: has a president named
I usually have trouble remembering which is the predicate and which is the object. If you think about it, however, you can express the same particle of knowledge in ways that swap the roles of predicate and object, or even subject and predicate. For example:
 Assertion:
subject: Barack Obama
object: President of the United States
predicate: has the office of
In your copious spare time, you can work out the other 4 permutations.
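
For readers who think better in code, here is a minimal Python sketch of the two Obama assertions above. The `Triple` type, its field names, and the `as_sentence` helper are illustrative assumptions, not part of any RDF library, and real RDF would identify things with URIs rather than bare strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion, with fields in the order used in this post (hypothetical type)."""
    subject: str
    obj: str        # "object" in the post; renamed to avoid shadowing the builtin
    predicate: str


# The same particle of knowledge, expressed two ways.
by_country = Triple(
    subject="The United States",
    obj="Barack Obama",
    predicate="has a president named",
)
by_person = Triple(
    subject="Barack Obama",
    obj="President of the United States",
    predicate="has the office of",
)


def as_sentence(t: Triple) -> str:
    """Read a triple back in subject, predicate, object order."""
    return f"{t.subject} {t.predicate} {t.obj}"


print(as_sentence(by_country))
print(as_sentence(by_person))
```

Nothing in the type stops you from recording the same fact under either framing, which is exactly the ambiguity described above.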

Now let's look at Twitter. The particle of information in Twitter, the tweet, seems also to be a triple:
 Tweet:
tweeter: gluejar
message: going to bed now!
time: Wed, 29 Apr 2009 06:58:01 +0000
The tweeter in turn has associated with it sets of followed users and followers as well as profile information. There's a lot to talk about here, and in a previous post I pointed out that Twitter message content is becoming richer and more linguistically complex. But the point I'd like to make for now is that Twitter's point of view is that it doesn't care so much about what the message is saying as who is saying it and when it was said. The more we look at the RDF examples above, the more the subject-object-predicate representation of knowledge seems limiting. An assertion may be true or false depending on when it was made, and assertions stripped of the context of who is making them are for the most part useless, because machines have no way to know whether to trust them.
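
To make the contrast concrete, here is a rough Python sketch of a tweet next to an assertion that carries its own "who" and "when". The `AttributedAssertion` type and the `in_context` check are hypothetical names invented for illustration; neither RDF nor Twitter defines them, and the trust rule is deliberately naive.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Tweet(NamedTuple):
    tweeter: str     # who
    message: str     # what
    time: datetime   # when


class AttributedAssertion(NamedTuple):
    """Hypothetical: a bare triple plus who asserted it and when."""
    who: str
    when: datetime
    subject: str
    obj: str
    predicate: str


tweet = Tweet(
    tweeter="gluejar",
    message="going to bed now!",
    time=datetime(2009, 4, 29, 6, 58, 1, tzinfo=timezone.utc),
)

# The bare assertion from earlier, now with context borrowed from the tweet.
claim = AttributedAssertion(
    who=tweet.tweeter,
    when=tweet.time,
    subject="The United States",
    obj="Barack Obama",
    predicate="has a president named",
)


def in_context(a: AttributedAssertion, trusted: set, as_of: datetime) -> bool:
    """Naive check: accept only if we trust the speaker and it was said by as_of."""
    return a.who in trusted and a.when <= as_of


may_first = datetime(2009, 5, 1, tzinfo=timezone.utc)
print(in_context(claim, {"gluejar"}, may_first))
print(in_context(claim, {"someone_else"}, may_first))
```

A bare triple gives a machine nothing to feed into a check like this; the tweet-shaped record does.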

Friend-of-the-blog Jeff Young asserts that the OpenURL data model can be thought of as answering 6 questions: Who, What, Where, When, Why and How. Whatever success Twitter has achieved can be thought of as an argument that the most important of these are the Who, What and When.

Sanity Alert! The following may be mind-blowing to certain susceptible individuals: the data model that Twitter REALLY uses to propagate tweets is RSS and Atom. These formats are descended from what was originally called "Meta Content Framework", which became "RDF Site Summary" (Yes, the very same RDF!), which became "Really Simple Syndication" or maybe something else, I'm not entirely sure. Here's how Twitter REALLY feeds into the semantic web:
  tweet:
title: gluejar: going to bed now!
description: gluejar: going to bed now!
pubDate: Wed, 29 Apr 2009 06:58:01 +0000
guid: http://twitter.com/gluejar/statuses/1649740567
link: http://twitter.com/gluejar/statuses/1649740567
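
As a sketch of how those five fields sit in an RSS `<item>`, and how the who/what/when triple can be pulled back out of them, here is a short Python example using only the standard library. The field values are the ones listed above; splitting the title on the first ": " to recover the tweeter is my assumption about the format, not something Twitter documents here.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# The tweet's fields exactly as listed above.
fields = {
    "title": "gluejar: going to bed now!",
    "description": "gluejar: going to bed now!",
    "pubDate": "Wed, 29 Apr 2009 06:58:01 +0000",
    "guid": "http://twitter.com/gluejar/statuses/1649740567",
    "link": "http://twitter.com/gluejar/statuses/1649740567",
}

# Build an RSS <item> element with one child per field.
item = ET.Element("item")
for name, value in fields.items():
    ET.SubElement(item, name).text = value

rss_item = ET.tostring(item, encoding="unicode")
print(rss_item)

# Recover the who/what/when triple (assumes the title is "tweeter: message").
who, what = fields["title"].split(": ", 1)
when = parsedate_to_datetime(fields["pubDate"])
print(who, "|", what, "|", when.isoformat())
```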

Exercise for the reader: how does this look in Atom?
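
If you'd rather peek at one possible answer, here is a hedged Python sketch that maps the same fields onto a generic Atom entry. The mapping (guid to `id`, pubDate to `updated`, link to a `link` element with an `href`) is my own reading of the Atom format, and the `atom_entry` helper is hypothetical; this is not a copy of what Twitter's Atom feed actually emits.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom_entry(title: str, guid: str, link: str, pub_date: str) -> ET.Element:
    """Build a minimal Atom <entry> from RSS-style fields (illustrative mapping)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = guid
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Atom timestamps are RFC 3339; RSS pubDate is the email-style date format.
    updated = parsedate_to_datetime(pub_date).isoformat()
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)
    return entry


entry = atom_entry(
    title="gluejar: going to bed now!",
    guid="http://twitter.com/gluejar/statuses/1649740567",
    link="http://twitter.com/gluejar/statuses/1649740567",
    pub_date="Wed, 29 Apr 2009 06:58:01 +0000",
)
print(ET.tostring(entry, encoding="unicode"))
```

Different element names, same who, what and when.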

Does anyone but me think that there's something weird going on here?
