Learning Linked Data: SPARQL

One thing you realize pretty quickly is that it is very hard to work with Linked Data and confine your explorations to a single site or data set. The links inevitably lead you on a pilgrimage from one data set to another. In the case of the WorldCat Discovery API, my pilgrimage led me from WorldCat to id.loc.gov, FAST and VIAF, and from VIAF on to dbpedia. Dbpedia is an amazingly fun data set to play with, and using it to add richness and context to the discovery experience has been enlightening.

One of the best things about dbpedia is that it has a SPARQL endpoint so you can easily query it. Knowing this, I decided to use SPARQL to find additional connections to an author. I knew that dbpedia had a few properties I wanted to explore:

  • influencedBy
  • author
  • starring

Once I knew the properties I was interested in, I had two options to access this data:

  1. Load the graph for these subjects and then pull the data out that way.
    I decided I didn’t want to do this because the properties I wanted to display are actually not in the initial graph. So I’d have to make several HTTP requests and merge the graphs to get the data I wanted. This would mean complex code and potentially slow performance.
  2. Query dbpedia with SPARQL for the data I wanted.
    I decided to use this option because it only required a single HTTP call and, because dbpedia’s SPARQL endpoint can return JSON, it could be done dynamically via AJAX.

So what does the SPARQL look like to get this information? Let’s look at two SPARQL queries that are intended to find information related to Jane Austen.

First, I want to get all of the people that dbpedia knows were influenced by Jane Austen, that is, everyone whose influences property points at her, and display their names. That query looks like this:

SELECT ?givenName ?surname
WHERE {
  ?s dbpprop:influences <http://dbpedia.org/resource/Jane_Austen> .
  ?s foaf:givenName ?givenName .
  ?s foaf:surname ?surname .
}

What’s going on here?

  • SELECT the givenName and surname variables from the matching triples
  • WHERE a subject (?s) has the predicate influences with the URI http://dbpedia.org/resource/Jane_Austen as its object
  • And the object of that subject’s foaf:givenName predicate is bound to givenName
  • And the object of that subject’s foaf:surname predicate is bound to surname
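The same pattern can be reused for any author by slotting a different resource URI into the query. Here is a minimal sketch, not the code from my project: `buildInfluenceQuery` is a hypothetical helper name, and the PREFIX lines are spelled out on the assumption that the query might be sent somewhere that does not predefine `dbpprop:` and `foaf:` the way dbpedia's endpoint does.

```javascript
// Hypothetical helper (illustration only): builds the "influences" query
// for any dbpedia resource URI.
function buildInfluenceQuery(authorUri) {
  // Reject characters that SPARQL does not allow inside <...>, so the
  // URI cannot break out of the angle brackets.
  if (/[<>"{}|^`\\\s]/.test(authorUri)) {
    throw new Error("Invalid resource URI: " + authorUri);
  }
  return [
    "PREFIX dbpprop: <http://dbpedia.org/property/>",
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
    "SELECT ?givenName ?surname",
    "WHERE {",
    "  ?s dbpprop:influences <" + authorUri + "> .",
    "  ?s foaf:givenName ?givenName .",
    "  ?s foaf:surname ?surname .",
    "}"
  ].join("\n");
}

// Print the query text for Jane Austen.
console.log(buildInfluenceQuery("http://dbpedia.org/resource/Jane_Austen"));
```

Validating the URI before concatenating it is a small safeguard worth keeping if the URI ever comes from user input or from another data set.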

I send this query off to dbpedia via the SPARQL endpoint so that I can get JSON back.
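To make that step concrete, here is a rough sketch of sending a query and asking for JSON. It uses the standard `fetch` API rather than jQuery, so it is not the code from my project. The endpoint URL `https://dbpedia.org/sparql` and the function names `buildSparqlUrl` and `runSparql` are assumptions made for illustration.

```javascript
// Assumed endpoint URL; adjust if your endpoint differs.
const DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql";

// Build a GET URL in the SPARQL protocol style: the query text travels
// URL-encoded in the "query" parameter.
function buildSparqlUrl(endpoint, query) {
  const params = new URLSearchParams({ query: query });
  return endpoint + "?" + params.toString();
}

// Send the query and request SPARQL JSON results via the Accept header.
async function runSparql(endpoint, query) {
  const response = await fetch(buildSparqlUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" }
  });
  if (!response.ok) {
    throw new Error("SPARQL request failed with status " + response.status);
  }
  return response.json();
}
```

A call would look something like `runSparql(DBPEDIA_ENDPOINT, query).then(data => console.log(data.results.bindings))`, where `query` holds the SPARQL text.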

The JSON contains SPARQL Query Results that look like this:

{
    "head": {
        "link": [],
        "vars": [
            "givenName",
            "surname"
        ]
    },
    "results": {
        "distinct": false,
        "ordered": true,
        "bindings": [
            {
                "givenName": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "Shane"
                },
                "surname": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "Bolks"
                }
            },
            {
                "givenName": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "Mary Anne"
                },
                "surname": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "Evans"
                }
            }
        ]
    }
}
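Pulling the names out of a response like this mostly means walking `results.bindings` and reading each variable's `value`. The sketch below shows one way to do that. `bindingsToNames` is a hypothetical helper name, and the sample object is a trimmed copy of the response shown above.

```javascript
// Turn SPARQL JSON results into "givenName surname" display strings.
function bindingsToNames(sparqlJson) {
  return sparqlJson.results.bindings.map(function (row) {
    // Each bound variable is an object; its text lives in "value".
    return row.givenName.value + " " + row.surname.value;
  });
}

// Trimmed copy of the response shown above.
const sampleResponse = {
  head: { link: [], vars: ["givenName", "surname"] },
  results: {
    bindings: [
      {
        givenName: { type: "literal", "xml:lang": "en", value: "Shane" },
        surname: { type: "literal", "xml:lang": "en", value: "Bolks" }
      },
      {
        givenName: { type: "literal", "xml:lang": "en", value: "Mary Anne" },
        surname: { type: "literal", "xml:lang": "en", value: "Evans" }
      }
    ]
  }
};

// Print one name per line.
console.log(bindingsToNames(sampleResponse).join("\n"));
```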

Second, I want to get all of the books that dbpedia knows Jane Austen is the author of and display their titles linked to their Wikipedia pages. That query is fairly similar to our first query.

SELECT ?name ?url
WHERE {
  ?s dbpprop:author <http://dbpedia.org/resource/Jane_Austen> .
  ?s dbpprop:name ?name .
  ?s foaf:isPrimaryTopicOf ?url .
}
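Once the name and url bindings come back, turning them into linked titles is a small mapping step. This is a hedged sketch rather than the project's actual code: `escapeHtml` and `bindingsToLinks` are hypothetical helper names, and the sample response is made up for illustration, not real output from dbpedia.

```javascript
// Escape text so it is safe to place inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build one <a> element (as a string) per result row.
function bindingsToLinks(sparqlJson) {
  return sparqlJson.results.bindings.map(function (row) {
    return '<a href="' + escapeHtml(row.url.value) + '">' +
      escapeHtml(row.name.value) + "</a>";
  });
}

// Hypothetical response for illustration only.
const sampleBooks = {
  results: {
    bindings: [
      {
        name: { type: "literal", "xml:lang": "en", value: "Pride and Prejudice" },
        url: { type: "uri", value: "http://en.wikipedia.org/wiki/Pride_and_Prejudice" }
      }
    ]
  }
};

console.log(bindingsToLinks(sampleBooks).join("\n"));
```

Escaping both the title and the URL matters because the values come from an external data set and end up inside the page's markup.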

I came across two more useful resources while working on this project:  

Once I had these SPARQL queries mastered, I was able to write a jQuery script that ran the queries and parsed the results. The final product enhances a discovery UI with data from dbpedia about an author’s works and whom they influenced. You can check out the JavaScript code in GitHub. There are other properties in dbpedia that might also be of interest, and one could use this code as a model for how to incorporate them.

Our first post in this new series on Learning Linked Data covered some handy tools, and our third post will follow in a couple of weeks. Do you have tips or lessons learned from working with Linked Data? Questions as you are just getting started on your own Linked Data journey? We'd love to hear from you.

 

  • Karen Coombs
    Senior Product Analyst