Business, Trading, yourgames: Crossref, OpenURL and more Linked Data Heresy

Monday, July 6, 2009

After CrossRef was started nine years ago, I quipped that it was nothing short of miraculous, since it was the first time in recorded history that so many publishers had gotten together and agreed on something that they would have to pay for. I'm sure that was an exaggeration, but my point was that CrossRef was not really about linking technology; rather, it was about the establishment of a business process around linking technology. The choice of technology itself was to some extent irrelevant.

Last week, in a comment on my post about AdaptiveBlue and OpenURL, Owen Stephens raised some interesting questions surrounding OpenURL, DOI (Digital Object Identifier), and Linked Data. It's useful to think of each of these as a social practice surrounding a linking technology; I'll describe each of them in turn.

DOI is often thought of as synonymous with CrossRef, which is incorrect. DOI is a link indirection technology used by the CrossRef organization. There are some DOIs that are not CrossRef DOIs, but most of the DOIs you are likely to come across will be CrossRef DOIs. CrossRef provides registration, matching and lookup services in addition to the DOI redirection service, and from here on, I'll be talking about CrossRef DOIs only. The core mission of CrossRef is the transformation of journal article citations into clickable URLs. CrossRef has registered about 35 million DOIs, most of them for journal articles. In the registration process, CrossRef collects identifying metadata for the journal articles, which it then uses to power its matching and lookup services. The matching service is currently making about 15 million matches per month.

CrossRef is far from being perfect, but its achievements have been considerable. Most scholarly journal publishers have integrated the CrossRef registration and matching process into their production workflows. The result is that many thousands of electronic journals today are being linked to from many thousands of other electronic journals, databases, search engines, even blogs.

In contrast to CrossRef, which focuses on publishers and publisher workflow integration, OpenURL is a linking technology and practice that has focused on helping libraries manage links to and from the electronic resources available to their patrons. OpenURL is complementary to CrossRef: OpenURL linking agents usually make use of CrossRef services to accomplish their mission of helping users select the appropriate resources for a given link. Libraries frequently need to deal with problems associated with multiple resolution: a given article might be available at ten or even a hundred different URLs, only one of which might work for a given library patron.
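The multiple-resolution problem can be pictured with a small Python sketch. Everything in it is hypothetical: the `Copy` type, the provider names, the example URLs and the idea of modelling a library's holdings as a plain set of provider names are assumptions made for illustration, not how any particular resolver works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One place where the same article can be found (hypothetical model)."""
    provider: str
    url: str


def appropriate_copies(copies, licensed_providers):
    """Keep only the copies hosted by providers this library has licensed."""
    return [c for c in copies if c.provider in licensed_providers]


# Hypothetical data: one article, three hosts, one licensed by the library.
copies = [
    Copy("publisher-site", "https://publisher.example.org/article/123"),
    Copy("aggregator-a", "https://aggregator-a.example.com/doc/123"),
    Copy("aggregator-b", "https://aggregator-b.example.net/item/123"),
]
library_licenses = {"aggregator-a"}

for copy in appropriate_copies(copies, library_licenses):
    print(copy.url)
```

A real link resolver has to take much more into account (coverage dates, proxy access, print holdings), but the core of the job is this kind of filtering on behalf of a particular patron.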

Finally, Linked Data is an emerging practice which enables diverse data sets to be published, consumed and then linked with other data sets and relinked into a global web of connections. It would be interesting to find out how many matches are being made in the Linked Data web to compare with CrossRef, but because the matching is decentralized, it's not really possible to know. While CrossRef and OpenURL focus on connecting citing articles and abstracts with the cited articles, Linked Data attempts to support any type of logical link.

Obviously there is overlap between Linked Data and the more established linking practices. Can (and should) Linked Data applications reuse the CrossRef and/or OpenURL URIs? Let's first consider OpenURL. OpenURL is really a mechanism for packaging metadata for a citation (jargon: ContextObject) into a URI. So the "thing" that an OpenURL URI identifies is the set of services about the citation available from a particular resolver agent. That's not usually the thing that you want to talk about in a Linked Data application.
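To make the "packaging" idea concrete, here is a rough Python sketch that builds an OpenURL in the key/encoded-value style of OpenURL 1.0 (Z39.88-2004). The resolver address `resolver.example.edu` and the helper `build_openurl` are made up for illustration, and the citation carries only a journal title, a year and a DOI, taken from the article used as an example later in this post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver base URL; every library runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(resolver_base, citation):
    """Pack citation metadata into the query string of a resolver URL."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for key, value in citation.items():
        if key == "id":
            # Identifiers for the cited item travel in rft_id.
            params.append(("rft_id", value))
        else:
            # Descriptive metadata uses the "rft." (referent) prefix.
            params.append((f"rft.{key}", value))
    return resolver_base + "?" + urlencode(params)


citation = {
    "jtitle": "Journal of the Geological Society",
    "date": "2007",
    "id": "info:doi/10.1144/0016-76492006-123",
}
openurl = build_openurl(RESOLVER_BASE, citation)
print(openurl)

# The same citation sent to a different resolver yields a different URI,
# which is why the URI names the resolver's services, not the article.
print(parse_qs(urlsplit(openurl).query)["rft.jtitle"])
```

Swap in another library's resolver base and the citation stays the same while the URI changes, which is the point made above about what an OpenURL URI actually identifies.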

What about CrossRef DOIs? There are two different URIs that you can make with a DOI. There's the http URL that gets redirected to full text (you hope) by the DOI gateway: http://dx.doi.org/10.1144/0016-76492006-123 . There's also the "info-uri" form of the DOI, info:doi/10.1144/0016-76492006-123 , which you can't click on. It's clear what the latter URI identifies: it's a 2007 article in the Journal of the Geological Society. Many libraries run resolver agents that can turn that URI into clickable service links. I'm not sure what the former URI identifies. What the URI gets you to is a web page with links to two different instantiations of the article identified by the info-uri. Apparently it doesn't identify the same article in its other instantiations on the internet. So the most correct URI to use, if you want to make Linked Data assertions about the article, is (in my humble but correct opinion) the info-uri.
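The two URI forms are mechanical rewrites of the same DOI string, as this small Python sketch shows. The function names are invented for illustration, and the percent-encoding step is a simplifying assumption for DOIs that contain unusual characters; the example DOI needs no encoding at all.

```python
from urllib.parse import quote, unquote

DOI_GATEWAY = "http://dx.doi.org/"  # the gateway named in this post
INFO_PREFIX = "info:doi/"


def doi_to_http(doi):
    """Clickable form: the DOI gateway redirects this URL."""
    return DOI_GATEWAY + quote(doi, safe="/")


def doi_to_info(doi):
    """Non-clickable info-uri form of the same DOI."""
    return INFO_PREFIX + quote(doi, safe="/")


def uri_to_doi(uri):
    """Recover the bare DOI from either URI form."""
    for prefix in (DOI_GATEWAY, INFO_PREFIX):
        if uri.startswith(prefix):
            return unquote(uri[len(prefix):])
    raise ValueError(f"not a DOI URI: {uri!r}")


doi = "10.1144/0016-76492006-123"
print(doi_to_http(doi))
print(doi_to_info(doi))
```

Converting between the two is trivial; the open question in the paragraph above is what each form identifies, and no string manipulation settles that.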

There's one little problem.

The second of Tim Berners-Lee's "Four Rules" for Linked Data is "Use HTTP URIs so that people can look up those names." But CrossRef, a stable, self-sustaining organization which has made huge strides moving the world of journal publishing to a more open, more usable, more linked environment, provides look-up APIs that return high quality XML metadata so that you can look up the names that it defines. It has a solid record of accomplishing exactly the things that Linked Data is trying to do, albeit with a narrower scope, but undeniably with significant impact. The identifier that CrossRef is using is the DOI, and the URI form of DOI is NOT an HTTP URI.

Maybe Tim BL's second rule is wrong, too!
