Wednesday, July 27, 2011

Liking Library Data

If you had told me ten years ago that teenagers would be spending free time "curating their social graphs", I would have looked at you kinda funny. Of course, ten years ago, they were learning about metadata from Pokemon cards, so maybe I should have seen it coming.

Social networking websites have made us all aware of the value of modeling aspects of our daily lives in graph databases, even if we don't realize that's what we're doing. Since the "semantic web" is predicated on the idea that ALL knowledge can be usefully represented as a giant, global graph, it's perhaps not so surprising that the most familiar and most widely implemented application of semantic web technologies has been Facebook's "Like" button.

When you click a Like button, an arc is added to Facebook's representation of your social graph. The arc links a node that represents you and another node that represents the thing you liked. As you interact with your social graph via Facebook, the added Like arc may introduce new interactions.
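
As a rough illustration of that arc-and-node picture (a toy model, not Facebook's actual data structures; the class and method names are made up for this sketch), a Like can be thought of as one labeled arc between two existing nodes:

```javascript
// A toy social graph: nodes keyed by id, arcs stored as {from, label, to}.
// SocialGraph, addNode, like and likedBy are hypothetical names.
class SocialGraph {
  constructor() {
    this.nodes = new Map();
    this.arcs = [];
  }

  addNode(id, properties) {
    this.nodes.set(id, properties);
  }

  // Clicking "Like" adds one arc from the user node to the thing node.
  like(userId, thingId) {
    if (!this.nodes.has(userId) || !this.nodes.has(thingId)) {
      throw new Error("both nodes must exist before they can be linked");
    }
    this.arcs.push({ from: userId, label: "likes", to: thingId });
  }

  // Follow the "likes" arcs that start at a given user.
  likedBy(userId) {
    return this.arcs
      .filter((arc) => arc.from === userId && arc.label === "likes")
      .map((arc) => arc.to);
  }
}

const graph = new SocialGraph();
graph.addNode("user:patron", { name: "A library patron" });
graph.addNode("book:avatar", { title: "Avatar" });
graph.like("user:patron", "book:avatar");
```

Anything that later walks the graph from the patron's node can now reach the book, which is where the "new interactions" come from.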

Google must think this is really important. They want you to start clicking "+1" buttons, which presumably will help them deliver better search. (You can try following me+, but I'm not sure what I'll do with it.)

The technology that Facebook has favored for building new objects to put in the social graph is derived from RDFa, which adds structured data into ordinary web pages. It's quite similar to "microdata", a competing technology that was recently endorsed by Google, Microsoft, and Yahoo. Facebook's vocabulary for the things it's interested in is called Open Graph Protocol (OGP), which could be considered a competitor for Schema.org.

My previous post described how a library might use microdata to help users of search engines find things in the library. While I think that eventually this will be a necessity for every library offering digital services, there are a bunch of caveats that limit the short-term utility of doing so. Some of these were neatly described in a post by Ed Chamberlain:
  • the library website needs to implement a sitemap that search engines' crawlers can use to find all the items in the library's catalog
  • the library's catalog needs to be efficient enough to not be burdened by the crawlers. Many library catalog systems are disgracefully inefficient.
  • the library's catalog needs to support persistent URLs. (Most systems do this, but it was only ten years ago that I caused Harvard's catalog to crash by trying to get it to persist links. Sorry.)
But the clincher is that web search engines are still suspicious of metadata. Spammers are constantly trying to deceive search engines. So search engines have white-lists, and unless your website is on the white-list, the search engines won't trust your structured metadata. The data might be of great use to a specialized crawler designed to aggregate metadata from libraries, but there's a chicken and egg problem: these crawlers won't be built before libraries start publishing their data.

Facebook's OGP may have more immediate benefits. Libraries are inextricably linked to their communities; what is a community if not a web of relationships? Libraries are uniquely positioned to insert books into real world social networks. A phrase I heard at ALA was "Libraries are about connections, not collections".

Libraries don't need to implement OGP to put a Like button on a web page, but without OGP, Facebook would understand the "Like" to be about the web page rather than about the book or other library item.

To show what OGP might look like on a library catalog page, I'll use the same example I used in my post on "spoonfeeding library data to search engines":
<html> 
<head>
<title>Avatar (Mysteries of Septagram, #2)</title>
</head>
<body>
<h1>Avatar (Mysteries of Septagram, #2)</h1>
<span>Author: Paul Bryers (born 1945)</span>
<span>Science fiction</span>
<img src="http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg">
</body>
</html>

Open Graph Protocol wants the web page to be the digital surrogate for the thing to be inserted into the social graph, and so it wants to see metadata about the thing in the web page's meta tags. Most library catalog systems already put metadata in meta tags, so this part shouldn't be horribly impossible.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:og="http://ogp.me/ns#"
xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<title>Avatar (Mysteries of Septagram, #2)</title>
<meta property="og:title" content="Avatar - Mysteries of Septagram #2"/>
<meta property="og:type" content="book"/>
<meta property="og:isbn" content="9780340930762"/>
<meta property="og:url"
content="http://library.example.edu/isbn/9780340930762"/>
<meta property="og:image"
content="http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg"/>
<meta property="og:site_name" content="Example Library"/>
<meta property="fb:admins" content="USER_ID"/>
</head>
<body>
<h1>Avatar (Mysteries of Septagram, #2)</h1>
<span>Author: Paul Bryers (born 1945)</span>
<span>Science fiction</span>
<img src="http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg">
</body>
</html>

The first thing that OGP does is to call out XML namespaces: one for XHTML, a second for Open Graph Protocol, and a third for some specific-to-Facebook properties. A brief look at OGP reveals that it's even more bare bones than Schema.org; you can't even express the fact that "Paul Bryers" is the author of "Avatar".

This is less of an issue than you might imagine, because OGP uses a syntax that's a subset of RDFa, so you can add namespaces and structured data to your heart's desire, though Facebook will probably ignore it.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:og="http://ogp.me/ns#"
xmlns:fb="http://www.facebook.com/2008/fbml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Avatar (Mysteries of Septagram, #2)</title>
<meta property="og:title"
content="Avatar - Mysteries of Septagram #2"/>
<meta property="og:type"
content="book"/>
<meta property="og:isbn"
content="9780340930762"/>
<meta property="og:url"
content="http://library.example.edu/isbn/9780340930762"/>
<meta property="og:image"
content="http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg"/>
<meta property="og:site_name"
content="Example Library"/>
<meta property="fb:app_id"
content="183518461711560"/>
</head>
<body>
<h1>Avatar (Mysteries of Septagram, #2)</h1>
<span rel="dc:creator">Author:
<span typeof="foaf:Person"
property="foaf:name">Paul Bryers
</span> (born 1945)
</span>
<span rel="dc:subject">Science fiction</span>
<img src="http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg">
</body>
</html>

The next step is to add the actual Like button by embedding JavaScript from Facebook:
<div id="fb-root"></div>
<script src="http://connect.facebook.net/en_US/all.js#appId=183518461711560&amp;xfbml=1"></script>
<fb:like href="http://library.example.edu/isbn/9780340930762/"
send="false" width="450" show_faces="false" font=""></fb:like>

The "og:url" property tells Facebook the "canonical" URL for this page, that is, the URL that Facebook should scrape the metadata from.
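
A catalog system would presumably generate these tags from its records rather than by hand. Here's a minimal sketch of how that might look, assuming a record with `title`, `isbn`, and `coverUrl` fields and the `/isbn/` URL pattern from the example above; the function names and the record shape are my own inventions, not part of OGP or of any catalog product. The point is that `og:url` is computed from the ISBN alone, so every variant of the page's address would advertise the same canonical URL.

```javascript
// Hypothetical helper: escape text for use inside an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build OGP meta tags from a catalog record.
// The record shape ({title, isbn, coverUrl}) and baseUrl are assumptions.
function ogpMetaTags(record, baseUrl, siteName) {
  // The canonical URL depends only on the ISBN, not on how the page was reached.
  const canonicalUrl = baseUrl + "/isbn/" + encodeURIComponent(record.isbn);
  const properties = [
    ["og:title", record.title],
    ["og:type", "book"],
    ["og:isbn", record.isbn],
    ["og:url", canonicalUrl],
    ["og:image", record.coverUrl],
    ["og:site_name", siteName],
  ];
  return properties
    .map(([property, content]) =>
      '<meta property="' + property + '" content="' + escapeAttr(content) + '"/>')
    .join("\n");
}

const ogpTags = ogpMetaTags(
  {
    title: "Avatar - Mysteries of Septagram #2",
    isbn: "9780340930762",
    coverUrl: "http://coverart.oclc.org/ImageWebSvc/oclc/+-+703315758_140.jpg",
  },
  "http://library.example.edu",
  "Example Library"
);
```

Escaping the attribute values matters here because titles in a catalog routinely contain quotation marks and ampersands.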

Now here's a big problem. Once you put the Like button JavaScript on a web page, Facebook can track all the users that visit that page. This goes against the traditional privacy expectations that users have of libraries. In some jurisdictions, it may even be against the law for a public library to allow a third party to track users in this way. I expect it shouldn't be hard to modify the implementation so that the script is executed only if the user clicks the "Like" button, but I've not been able to find a case where anyone has done this.
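
A minimal sketch of that idea, assuming the page shows a plain placeholder element in place of the Like button and only injects Facebook's script when the placeholder is clicked. The function name and the `like-placeholder` id are hypothetical, and I haven't verified that Facebook's script renders the `fb:like` tag correctly when it's loaded this late, so treat this as a starting point rather than a tested recipe.

```javascript
// Hypothetical helper. `doc` is the page's document object, `placeholder` is
// an element standing in for the Like button, and `sdkUrl` is the Facebook
// script URL from the snippet above.
function deferLikeButton(doc, placeholder, sdkUrl) {
  let loaded = false;
  placeholder.addEventListener("click", function () {
    if (loaded) return; // inject the third-party script at most once
    loaded = true;
    const script = doc.createElement("script");
    script.src = sdkUrl;
    doc.body.appendChild(script);
  });
}

// Usage in a browser; "like-placeholder" is an assumed element id.
if (typeof document !== "undefined") {
  const likePlaceholder = document.getElementById("like-placeholder");
  if (likePlaceholder) {
    deferLikeButton(
      document,
      likePlaceholder,
      "http://connect.facebook.net/en_US/all.js#appId=183518461711560&xfbml=1"
    );
  }
}
```

Until the placeholder is clicked, the page makes no request to Facebook at all. The cost is that the user has to click twice: once to load the real button and once to actually Like the item.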

It seems to me that injecting library resources into social networks is important. The libraries and the social networks that figure out how to do that will enrich our communities and the great global graph that is humanity.