Dewey Summaries as Linked Data
For a long time, the Dewey team at OCLC has wanted to do something with Linked Data. That is, apply Linked Data principles to parts of the Dewey Decimal Classification and present the data as a small “terminology service.” The service should respond to regular HTTP requests with either a machine- or a human-readable presentation of Dewey classes. There should be a URI (and, even better, a Web page that delivers a useful description) for every Dewey concept, not just single classes. The data should be presented in a format that is capable of handling rich semantic information and in a way that allows users or user agents to just “follow their nose” to explore the data. For more complex stuff, the service should offer an API-like query access. Finally, the data that is presented should be reusable by anyone for non-commercial purposes.
Along comes dewey.info
Tim Berners-Lee's new-ish Linked Data meme “Raw Data Now!” perhaps makes it look a little too easy to actually publish interoperable data for the semantic Web (of which, in case you were wondering, Linked (Open) Data is a subset, using a cross-section of tools from the infamous semantic Web layer cake). One reason for this could be that there isn't much of our specific type of data around, i.e., large multilingual universal classification systems. With this kind of semantically rich data, there is a constant tension over what you are actually after: the capital “O” world of strict Ontological modeling, or the small “o” world of published data sets that are only slightly semantically enhanced (with a little bit of ontology sprinkled here and there) but applicable across many possible domains.
For now, the latter approach seemed the more effective driving force, and it prompted us to confront several issues that are relevant to either approach. First, we had to come up with a URI pattern for the DDC whose URIs would function as persistent identifiers for DDC concepts in a distributed environment. Second, we wanted to test out the RDF vocabulary SKOS as a representational model for expressing some of the best nuggets of DDC data (language-independent identifiers, multilingual terminology, and semantic relationships). And finally, because Linked Open Data is not really open when you have to ask someone before you can use it, we wanted to test a Creative Commons license for easier reuse of DDC data for non-commercial purposes.
To test whether and how some of these goals could be achieved, we chose the Dewey Summaries as a suitable data set to publish according to Linked Data principles. The latest version of the Summaries, i.e., the top 1110 classes of DDC 22, has been available as a Web document for some time. To broaden the possible applications of what is now essentially just tag soup (in only one language), every class had to be identified with a URI, and the data had to be presented in a reusable way.
So how does it work?
Did you ever run into a Dewey number, say 641, and want to know (or let your users know) what this number stood for? Now you can simply enter the following URL in a regular browser: http://dewey.info/class/641/. This URL is an identifier for class “641” in the DDC, and it automatically redirects a regular Web browser to HTML representations of all available versions of this class in all available languages (http://dewey.info/class/641/about). The “/about” part indicates that this URL stands for a general description of the abstract concept (i.e., Dewey class 641), not the concept itself. The concept itself, as an abstract thing or idea, does not have a representation that can be sent over the Web, so the Web server points the user agent to a place on the Web where a description of that thing can be found.
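The redirect step can be sketched in a few lines of Python. This is not dewey.info's implementation: the path pattern below only covers the plain `/class/641/` form (no dates), and the 303 See Other status is an assumption based on the common Linked Data convention for pointing from a thing to a document about it.

```python
import re

# Hypothetical routing rule for a generic class URI such as /class/641/
# (modeled on the example above; dated forms are not handled here).
CLASS_PATH = re.compile(r"^/class/(?P<notation>[0-9]{1,3})/$")


def redirect_for(path):
    """Return (status, location) for a request to a concept URI, or None.

    The concept itself cannot be sent over the Web, so the server answers
    with a redirect to the document that describes it. The 303 status is
    an assumption (the usual Linked Data choice), not confirmed behavior.
    """
    match = CLASS_PATH.match(path)
    if match is None:
        return None  # not a plain concept URI; handled elsewhere
    return 303, f"/class/{match.group('notation')}/about"


print(redirect_for("/class/641/"))  # (303, '/class/641/about')
```

Requests for `/class/641/about` itself do not match the pattern, so in this sketch they would be left to whatever serves the description documents.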
The specific format of this description is negotiated in the background by the user agent and the server. A regular Web browser like Opera or Firefox is served an HTML version of the page, which is also available directly at http://dewey.info/class/641/about.html. A Linked Data browser like Zitgist is instead presented with an RDF (Resource Description Framework) version of the data, which it uses to construct its own view.
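A rough sketch of the server-side half of this negotiation follows. It assumes the RDF version is served as `application/rdf+xml` and that only HTML and RDF are on offer; the names `SUPPORTED`, `parse_accept`, and `negotiate` are hypothetical. Wildcards such as `*/*` are ignored and simply fall through to the HTML default.

```python
# Media types this sketch can serve, mapped to the suffix that selects them
# directly (about.html, about.rdf). The RDF media type is an assumption.
SUPPORTED = {"text/html": "html", "application/rdf+xml": "rdf"}


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if media_type:
            prefs.append((media_type, q))
    return prefs


def negotiate(header, default="html"):
    """Return the suffix of the best supported representation."""
    best, best_q = default, 0.0
    for media_type, q in parse_accept(header or ""):
        # Strictly greater: q=0 never wins, and ties keep the earlier entry.
        if media_type in SUPPORTED and q > best_q:
            best, best_q = SUPPORTED[media_type], q
    return best


print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8"))  # html
print(negotiate("application/rdf+xml"))                         # rdf
```

The chosen suffix could then be appended to the description URI, e.g. `/class/641/about.rdf`, which keeps the negotiated and the directly addressable forms in step.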
One of the main benefits of having a language-independent representation of a subject, say, a Dewey number, is that it is very easy to switch between languages when displaying the language-dependent parts, such as category descriptions or other associated terminology. By appending a language tag to the URI of the generic resource (ending in “/about”), you can narrow the results down to versions in a specific language: http://dewey.info/class/641/about.fr. (The HTML view of a single class for which other languages are available also shows links to these versions.) You can still bypass content negotiation by specifying the desired format directly: http://dewey.info/class/641/about.fr.rdf.
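Reading the language and format back out of the last path segment is a small parsing job. The sketch below assumes that `html` and `rdf` are the only format suffixes and treats any other single suffix as a language tag; `parse_about` is a hypothetical helper, and `None` means the choice is left to content negotiation.

```python
# Assumption: these are the only format suffixes; anything else is a language tag.
KNOWN_FORMATS = {"html", "rdf"}


def parse_about(segment):
    """Split 'about[.lang][.format]' into (lang, fmt).

    None for either value means the choice is left to content negotiation.
    """
    parts = segment.split(".")
    if parts[0] != "about" or len(parts) > 3 or "" in parts:
        raise ValueError(f"not a description segment: {segment!r}")
    rest = parts[1:]
    fmt = rest.pop() if rest and rest[-1] in KNOWN_FORMATS else None
    lang = rest.pop() if rest else None
    if rest:  # e.g. 'about.html.fr': the format suffix must come last
        raise ValueError(f"unexpected suffix order: {segment!r}")
    return lang, fmt


for seg in ("about", "about.fr", "about.html", "about.fr.rdf"):
    print(seg, parse_about(seg))
```

Putting the format last mirrors the example URIs above, which is why a segment like `about.html.fr` is rejected rather than guessed at.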
Finally, the service lets you specify the date of the version that should be identified or retrieved. The utility of this feature will become more apparent as updates are added to the service. By specifying a year and/or month in the URI (http://dewey.info/class/641/2009/08/), you restrict the service to concepts from that period of time, in this case, August 2009. Combining all of these elements, you arrive at a pretty complete description of a Dewey class: http://dewey.info/class/641/2009/08/about.ar.html. (The original plan for Dewey URIs calls for more precision in specifying the “timeslice” of a version, down to minutes and seconds. This should be part of a future release.)
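Putting the pieces together, a URI builder for this pattern might look like the following. `dewey_uri` is a hypothetical helper modeled on the example URIs above; requiring a year whenever a month is given is our own assumption, since a month on its own has no place in the path pattern shown.

```python
BASE = "http://dewey.info/class"


def dewey_uri(notation, year=None, month=None, lang=None, fmt=None, describe=False):
    """Build a dewey.info-style URI (hypothetical helper).

    Without lang/fmt/describe the result names the concept itself; with
    any of them it names a description ('about' plus optional suffixes).
    """
    if month is not None and year is None:
        raise ValueError("a month requires a year")  # our own assumption
    segments = [BASE, notation]
    if year is not None:
        segments.append(f"{year:04d}")
    if month is not None:
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month: {month}")
        segments.append(f"{month:02d}")
    uri = "/".join(segments) + "/"
    if describe or lang or fmt:
        uri += ".".join(part for part in ("about", lang, fmt) if part)
    return uri


print(dewey_uri("641"))                         # http://dewey.info/class/641/
print(dewey_uri("641", lang="fr", fmt="rdf"))   # http://dewey.info/class/641/about.fr.rdf
print(dewey_uri("641", 2009, 8, "ar", "html"))  # http://dewey.info/class/641/2009/08/about.ar.html
```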
Some other features that add value to the service are a little too technical to describe here in full, but they should at least be mentioned in passing. The HTML view is in fact already semantically enriched using a W3C standard called RDFa. Using a browser that is aware of RDFa (or an RDFa extractor) opens up new possibilities for harvesting, collecting, and connecting Dewey data.
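To give a flavor of what an RDFa-aware tool does, here is a deliberately tiny extractor built on Python's standard `html.parser`. The sample markup is hypothetical (it is not copied from dewey.info), and the code only collects `property` values from well-nested markup; a real RDFa processor also resolves prefixes and subjects (`about`, `typeof`, `rel`, and so on), so treat this as an illustration rather than a tool.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML, so they must not be stacked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style 'property' attributes.

    Toy extractor: it assumes well-nested markup and ignores prefixes,
    'about', 'typeof', 'rel' and everything else a real RDFa processor does.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one (property or None, text chunks) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            self.pairs.append((prop, attrs["content"]))  # value given explicitly
            prop = None  # no need to gather element text
        if tag not in VOID:
            self._stack.append((prop, []))

    def handle_data(self, data):
        for prop, chunks in self._stack:
            if prop:
                chunks.append(data)  # text counts for every enclosing property

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        prop, chunks = self._stack.pop()
        if prop:
            self.pairs.append((prop, "".join(chunks).strip()))


# Hypothetical markup in the spirit of an RDFa-enriched class page.
SAMPLE = """
<div about="http://dewey.info/class/641/">
  <span property="skos:notation">641</span>
  <span property="skos:prefLabel" xml:lang="en">Food &amp; drink</span>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.pairs)  # [('skos:notation', '641'), ('skos:prefLabel', 'Food & drink')]
```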
Second, dewey.info exposes a simple API through SPARQL, the standard query language for the semantic Web. The adventurous might be interested in this example query that retrieves the main classes of Dewey in French.
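A query of this kind can be sent over plain HTTP using the SPARQL Protocol, where the query text travels in a `query` parameter of a GET request. The sketch below only builds such a request. Both the endpoint address and the way main classes are picked out (a single-digit `skos:notation`) are assumptions; check the service for the actual endpoint and data model.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dewey.info/sparql"  # hypothetical address, not confirmed

# Assumption: main classes carry a single-digit skos:notation.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?notation ?label
WHERE {
  ?concept skos:notation ?notation ;
           skos:prefLabel ?label .
  FILTER (regex(str(?notation), "^[0-9]$") && langMatches(lang(?label), "fr"))
}
ORDER BY ?notation
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(QUERY)
print(req.full_url[:40])
```

Passing the request to `urllib.request.urlopen` would send it; if the endpoint honors the `Accept` header, the response body would be a SPARQL JSON results document.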
What is it good for?
The main purpose of dewey.info is to contribute to the growing web of Linked Data, so most use cases that apply to Linked Data also apply to dewey.info. Take the way Dewey Summary data is used in the World Digital Library: something comparable could now be accomplished without the added complications of acquiring and massaging the data that were involved before it became available on the Web at dewey.info.
And if there are already Dewey numbers in your metadata, you might want to consider constructing dewey.info URIs and adding them alongside the plain Dewey numbers that may already be there. You would immediately be able to take advantage of all nine languages currently available in the store. In addition, you would benefit from other languages and other updates to the data that will be added in the future. The numbers will come alive and start to speak, literally, allowing you to make fuller use of Dewey in general, as useful and descriptive data is just a hyperlink away. At the same time, by specifying a date of assignment in the URI, you can reliably and persistently reference a specific representation, pinpointing version, language, and content format, even if the Dewey number has since been updated and has changed its meaning.
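As a sketch of that first step, the hypothetical helper below turns a Dewey number from existing metadata into a dewey.info class URI, optionally pinned to a year and month of assignment. Two assumptions are built in: because only the Summaries are published so far, a longer number such as 641.5 is truncated to its three-digit section, and segmentation marks (`/` and `'`) that some records carry are stripped before matching.

```python
import re

BASE = "http://dewey.info/class"

# Three-digit section at the start of a (cleaned) DDC number.
SECTION = re.compile(r"^([0-9]{3})")


def summary_uri(ddc_number, year=None, month=None):
    """Map a DDC number to the dewey.info URI of its Summaries-level class.

    Assumption: only the top three digits are published (the Summaries),
    so '641.5' is mapped to class 641.
    """
    cleaned = ddc_number.strip().replace("/", "").replace("'", "")
    match = SECTION.match(cleaned)
    if match is None:
        raise ValueError(f"not a DDC number: {ddc_number!r}")
    if month is not None and year is None:
        raise ValueError("a month requires a year")  # our own assumption
    parts = [BASE, match.group(1)]
    if year is not None:
        parts.append(f"{year:04d}")
    if month is not None:
        parts.append(f"{month:02d}")
    return "/".join(parts) + "/"


print(summary_uri("641.5/973"))      # http://dewey.info/class/641/
print(summary_uri("641", 2009, 8))   # http://dewey.info/class/641/2009/08/
```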
What's down the road?
What you see now is only the first step. The intention is for dewey.info to be a platform for Dewey data on the Web. The Summaries may not be the most challenging or complex data set to publish in this manner, but more is to come in terms of languages, deeper data, and links to other datasets. If you think this data is too insular (which it currently is), why not start adding some links of your own? In the manner of Linked Data, this can be accomplished simply by using Dewey URIs in your resource data. Remember, links go both ways!
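Using Dewey URIs in your own data can be as simple as one RDF statement per resource. The sketch below writes such a statement in N-Triples syntax; the choice of Dublin Core's `dcterms:subject` property, the `subject_link` helper, and the `example.org` resource URI are illustrative assumptions, not a prescribed vocabulary.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
# Characters N-Triples does not allow inside <...> IRIs (besides controls/space).
FORBIDDEN = set('<>"{}|^`\\')


def subject_link(resource_uri, dewey_uri):
    """Return one N-Triples statement: resource dcterms:subject Dewey class."""
    for uri in (resource_uri, dewey_uri):
        if any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in uri):
            raise ValueError(f"not usable as an N-Triples IRI: {uri!r}")
    return f"<{resource_uri}> <{DCTERMS_SUBJECT}> <{dewey_uri}> ."


print(subject_link("http://example.org/book/1", "http://dewey.info/class/641/"))
```

Published alongside your records, statements like this point from your data to dewey.info, so anyone following the Dewey URIs in the other direction has something to find.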