Learning Linked Data: Some Handy Tools
I’ve been working with Linked Data off and on for a while now, but the last year has been my deepest dive into it. Much of that dive involved writing a PHP library to interact with the WorldCat Discovery API. Since I started seeing how much can be done with Linked Data in discovery, I’ve been readjusting my worldview and acquiring a new skill set to work with it. That meant understanding the whole concept of triples and the subject, predicate, object nomenclature. In our recent blog posts on the WorldCat Discovery API, we touched on some of the basics of Linked Data. We also mentioned some tools for working with Linked Data in Ruby.
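If the subject, predicate, object idea is new to you, it can help to see it as plain data. Here's a minimal sketch in Python (not PHP, and not EasyRDF) that models a tiny graph as a list of three-part tuples. The example.org IRIs and the book title are made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. This sketch stores every term as a plain string and
    does not distinguish IRIs from literals the way a real RDF library would."""
    subject: str
    predicate: str
    obj: str  # the "object" position; named obj to avoid shadowing the builtin


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BOOK = "http://example.org/book/1"  # hypothetical resource IRI

graph = [
    Triple(BOOK, RDF_TYPE, SCHEMA + "Book"),
    Triple(BOOK, SCHEMA + "name", "An Example Title"),
    Triple(BOOK, SCHEMA + "author", "http://example.org/person/42"),
]

# Every fact about the book is just another triple with the same subject.
for t in graph:
    print(t.subject, t.predicate, t.obj)
```

Note that even the resource's type is expressed as a triple (using rdf:type), which is what makes the type mapping described next possible.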
However, I’m a PHP programmer, so what sort of tools have I been using? The biggest thing I’ve added to my toolbox is a robust RDF library. For me, this is EasyRDF. Thanks to this library, I can easily leverage the richness of the data and entities in the WorldCat Discovery API output. Once you understand the basic objects within the library, EasyRDF makes retrieving graphs and accessing the data and entities within those graphs fairly straightforward. One incredibly useful piece of EasyRDF is its type mapping functionality, which allows an rdf:type to be mapped to a specific PHP class with particular methods. This is useful because different types use different properties. The schema:Article type has properties for issue and start page that are not present in schema:Book. Being able to use different classes for these simplifies the code I need to write when I want to create displays. I can even create a specific display for articles that shows their pertinent properties: journal name, volume, issue, and page. The WorldCat Discovery API PHP library uses type mapping extensively. I haven’t mapped all the relevant types yet, but I’ve covered some major ones and will be looking to expand this in the future.
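The type mapping idea boils down to "look up the class for this rdf:type, and fall back to a generic one if there isn't a mapping." The sketch below shows that dispatch in plain Python; it is not EasyRDF's API (in EasyRDF you register the mapping with the library and it hands you instances of your PHP class). The class names, the simplified property keys such as `journal` and `issue`, and the sample article are all hypothetical. Real schema.org data typically spreads journal, volume, and issue across linked resources rather than flat fields.

```python
SCHEMA = "http://schema.org/"


class Resource:
    """Generic wrapper used when an rdf:type has no mapping."""

    def __init__(self, props: dict):
        self.props = props  # hypothetical flat dict of property name -> value

    def display(self) -> str:
        return self.props.get("name", "(untitled)")


class Book(Resource):
    def display(self) -> str:
        return f"{self.props['name']} [Book]"


class Article(Resource):
    # Articles get a display that includes journal, volume, issue and page,
    # properties that a Book simply doesn't have.
    def display(self) -> str:
        p = self.props
        return (f"{p['name']}. {p['journal']}, vol. {p['volume']}, "
                f"no. {p['issue']}, p. {p['pageStart']} [Article]")


# The type map: rdf:type IRI -> class to instantiate.
TYPE_MAP = {
    SCHEMA + "Book": Book,
    SCHEMA + "Article": Article,
}


def wrap(rdf_type: str, props: dict) -> Resource:
    """Instantiate the class mapped to rdf_type, or the generic fallback."""
    return TYPE_MAP.get(rdf_type, Resource)(props)


article = wrap(SCHEMA + "Article", {
    "name": "On Triples", "journal": "Example Journal",
    "volume": 3, "issue": 2, "pageStart": 17,
})
print(article.display())
```

The display code never has to ask "is this an article?"; the type map answers that once, and each class only knows about the properties its type actually uses.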
Another tool I’ve recently added is a triplestore. Because EasyRDF has classes that let you interact with a triplestore, I became interested in running one. At first I was concerned that this would be a lot of work. However, bringing up a triplestore proved to be fairly trivial. I ended up using RedStore, which has a nice, simple installer for Mac. You may be asking yourself why I wanted a triplestore if the data I’m using is already on the web. I had three reasons for exploring this:
- Caching data. Being able to cache data offline, so that I don’t have to make an HTTP request to the wider Web, is a big reason why I wanted to play with a triplestore. This is especially useful when a data set is fairly static and its license gives you the right to cache it.
- Combining data from different data sets. One thing I think would be interesting to explore is taking an existing data set, like the FAST geographic headings, and adding triples to it. Specifically, I want to see if I can enhance FAST with owl:sameAs properties that link to DBpedia. To do this, I would need my own copy of those headings in an editable form. A triplestore seemed like the best solution for this.
- SPARQLing data that doesn’t have a SPARQL endpoint or that spans data sets. Not all Linked Data on the web has a SPARQL endpoint. Also, an endpoint is usually for a particular data set, which makes it difficult to obtain information across data sets.
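Here's a minimal sketch of the second idea: a local, editable copy of some headings, enriched with owl:sameAs links. It uses an in-memory Python set as a stand-in for a triplestore. The heading IRIs, the assumption that labels are stored as skos:prefLabel, and the label-to-DBpedia lookup table are all hypothetical, and matching on exact labels is far too naive for real reconciliation. It just shows where the new triples come from.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical local copy of a few geographic headings (placeholder IRIs).
local_store = {
    ("http://example.org/fast/ohio", SKOS_PREF_LABEL, "Ohio"),
    ("http://example.org/fast/columbus", SKOS_PREF_LABEL, "Ohio--Columbus"),
}

# Hypothetical lookup of preferred label -> DBpedia resource IRI.
candidate_links = {
    "Ohio": "http://dbpedia.org/resource/Ohio",
}


def add_same_as(store: set, links: dict) -> set:
    """Add an owl:sameAs triple for each heading whose preferred label has a
    known match. Mutates store in place and returns the triples added."""
    added = set()
    for s, p, o in list(store):  # iterate over a snapshot while we add
        if p == SKOS_PREF_LABEL and o in links:
            added.add((s, OWL_SAME_AS, links[o]))
    store.update(added)
    return added


new_triples = add_same_as(local_store, candidate_links)
print(new_triples)
```

Because the copy is local, adding triples like these doesn't require write access to anyone else's data; the enhancements live alongside the cached headings.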
Not having a SPARQL endpoint can be a real pain if the question you need to ask spans multiple graphs. One way to get around this is to retrieve the necessary graphs, load them into a triplestore, and then SPARQL that triplestore. I’ll come right out and say this is far from ideal, and it really only works well if you can cache the data ahead of time. It can also send a TON of traffic to the servers of the data set owners, and it may not be allowed under the data license. So think carefully about whether you want to do this. Still, pulling data locally into a triplestore may be the only option for SPARQLing across data sets.
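To make the "load the graphs locally, then query" approach concrete, here's a toy version in Python. It merges three small graphs into one set and evaluates a list of triple patterns with a nested-loop join, which is roughly what a SPARQL engine does for a basic graph pattern (minus indexes and optimization). Apart from the W3C, schema.org, and DBpedia vocabulary terms, the IRIs are hypothetical, and the triples are sample data rather than output from any real service.

```python
SCHEMA = "http://schema.org/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBO = "http://dbpedia.org/ontology/"

# Three small graphs, as if fetched (or cached) from different sources.
catalog = {("http://example.org/book/1", SCHEMA + "about", "http://example.org/fast/ohio")}
headings = {("http://example.org/fast/ohio", OWL_SAME_AS, "http://dbpedia.org/resource/Ohio")}
dbpedia = {("http://dbpedia.org/resource/Ohio", DBO + "capital",
            "http://dbpedia.org/resource/Columbus,_Ohio")}

store = catalog | headings | dbpedia  # "load everything into the triplestore"


def match(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple.
    Terms starting with '?' are variables. Returns a new dict or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in extended and extended[term] != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None  # constant term doesn't match
    return extended


def query(store, patterns):
    """Evaluate a list of triple patterns by joining them one at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly: SELECT * WHERE { ?book schema:about ?place .
#                           ?place owl:sameAs ?dbp . ?dbp dbo:capital ?capital }
results = query(store, [
    ("?book", SCHEMA + "about", "?place"),
    ("?place", OWL_SAME_AS, "?dbp"),
    ("?dbp", DBO + "capital", "?capital"),
])
print(results)
```

No single source could answer this question on its own: the book, the sameAs link, and the capital each live in a different graph. Merging everything into one set also throws away which graph each triple came from; triplestores typically support named graphs so you can keep your own additions apart from the cached data.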
Since I’ve mentioned SPARQL: it’s another thing I’ve added to the toolbox, and probably the most difficult one for me to learn. Now that I’ve finally had some success with it, it has proven super worthwhile. If you’re interested in knowing more about my SPARQLing adventures, check out next week’s “Learning Linked Data” post.
Senior Product Analyst