SPARQL Tips, Tricks and Tools

Getting the Shape of a Dataset or Graph

One of the things I've found most intimidating about working with SPARQL is using datasets that I'm unfamiliar with. A couple of colleagues tried to explain to me how I could use SPARQL to understand a dataset by querying it generically, but the concept was lost on me until recently. So I thought it was worth going over for others who might be confused about the whole process.

One of the most common things I want to know as a developer is what "properties," or predicates, are present in the graph. The easiest way to find out is the following query. I'm using the FROM clause here to specify the URI of the graph I'm interested in.

SELECT DISTINCT ?p
FROM <http://worldcat.org/entity/work/id/67201841>
WHERE
  { ?s ?p ?o.
  }

Another common task is to find all the subjects in the graph. Doing this gives me a better sense of whether or not the graph is a Concise Bounded Description.

SELECT DISTINCT ?s
FROM <http://worldcat.org/entity/work/id/67201841>
WHERE
  { ?s ?p ?o.
  }

You'll notice that each of these SPARQL statements uses the variables ?s, ?p and ?o. These stand for subject, predicate and object. In the query above, what I'm really saying is, “list all the distinct subjects for all the triple statements in this graph.”

The last thing that is useful to do via SPARQL, if you're trying to get a sense of the graph, is to list all the rdf:type values within it. In the query below, the keyword a is SPARQL shorthand for rdf:type.

SELECT DISTINCT ?o
FROM <http://worldcat.org/entity/work/id/67201841>
WHERE
  { ?s a ?o.
  }
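The three queries above differ only in the variable they select and the triple pattern they match, so they can be generated for any graph URI. Here is a minimal Python sketch of that idea. The shape_queries helper and its dictionary keys are names made up for illustration, not part of any SPARQL library.

```python
def shape_queries(graph_uri: str) -> dict[str, str]:
    """Build the three exploratory queries for one graph URI."""
    # (variable to select, triple pattern to match) for each query
    patterns = {
        "predicates": ("?p", "?s ?p ?o ."),
        "subjects": ("?s", "?s ?p ?o ."),
        "types": ("?o", "?s a ?o ."),
    }
    return {
        name: f"SELECT DISTINCT {var}\nFROM <{graph_uri}>\nWHERE\n  {{ {pattern}\n  }}"
        for name, (var, pattern) in patterns.items()
    }


queries = shape_queries("http://worldcat.org/entity/work/id/67201841")
print(queries["types"])
```

Each generated string can be pasted into a SPARQL processor as-is.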

SPARQLing a specific graph using FROM

Another issue I had with SPARQL was finding an endpoint that I could use to practice. Most of the bibliographic datasets I was interested in didn't have SPARQL endpoints, and I thought I needed a SPARQL endpoint for a dataset in order to practice writing SPARQL against it. This isn't the case at all. If you know the URI for something and that URI returns RDF, you can write a SPARQL query against it. How does that work?

SPARQL has a FROM clause that can be used to specify the URI of the graph you want to query. A processor that supports this fetches the graph, runs the query against it, and returns the results. This is very useful if you just need small pieces of a graph or the parts of a graph that match a particular pattern. I have also found it handy when I want to explore the data at a particular URI without having to look at the whole graph. This can be done via a code library or a general purpose SPARQL processor, like the one at http://sparql.org/sparql.html.
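To make the "code library" route a little more concrete, here is a rough Python sketch that prepares a SPARQL Protocol GET request using only the standard library. The endpoint URL is a placeholder, not a real service, and build_request is a hypothetical helper name. The sketch only builds the request; it does not send it, and whether a given processor will fetch the graph named in FROM depends on that processor.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Placeholder endpoint (assumption): substitute the query URL of your processor.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
SELECT DISTINCT ?p
FROM <http://worldcat.org/entity/work/id/67201841>
WHERE
  { ?s ?p ?o .
  }"""


def build_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    # The SPARQL Protocol passes the query text in a "query" URL parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it, assuming the endpoint exists and accepts GET queries.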

In the example below, I'm gathering up all the URIs for the bibliographic records associated with a particular WorldCat work entity.

Example:

PREFIX schema: <http://schema.org/>

SELECT ?bib
FROM <http://worldcat.org/entity/work/id/67201841>
WHERE
  { <http://worldcat.org/entity/work/id/67201841> schema:workExample ?bib .
  }

Depending on the library being used, I can get the results back in a variety of formats. For my purposes, I want a JSON object that I can easily parse and use to write additional code.
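As an illustration of what parsing that JSON can look like, here is a small Python sketch that assumes the processor returns the standard SPARQL JSON results format. The response text is hand-written for this example, the example.org URIs are placeholders rather than real WorldCat data, and column is a made-up helper name.

```python
import json

# Hypothetical response body in the SPARQL JSON results format.
# The example.org URIs are placeholders, not real WorldCat data.
response_text = """
{
  "head": {"vars": ["bib"]},
  "results": {
    "bindings": [
      {"bib": {"type": "uri", "value": "http://example.org/bib/1"}},
      {"bib": {"type": "uri", "value": "http://example.org/bib/2"}}
    ]
  }
}
"""


def column(results: dict, var: str) -> list[str]:
    """Return the values bound to one variable, skipping rows where it is unbound."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


bib_uris = column(json.loads(response_text), "bib")
print(bib_uris)
```

Each row in "bindings" is one query solution, so pulling out a single variable gives a plain list to loop over.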

Simplifying code by using SPARQL

What purpose does SPARQLing a particular URI serve? Sometimes using SPARQL is just a faster way of interacting with the graph. One good example of this is getting the ISBNs for a particular bibliographic graph. Each graph can have more than one schema:workExample property that links to a schema:ProductModel object. Each schema:ProductModel can have multiple ISBNs. The result is that if you try to access this by just navigating the graph, you will end up with a nested structure like this:

{
  "ProductModel_1": {
    "isbn": ["123654789", "98765432"]
  },
  "ProductModel_2": {
    "isbn": ["123654789321", "987654321471"]
  }
}

To create a list of ISBNs, you have to flatten this data structure.
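For comparison, the flattening step might look something like this in Python. This is a sketch that assumes each ProductModel has been loaded as a dictionary holding a list of ISBN strings. The ISBN values are the sample ones from the structure above, and flatten_isbns is a name invented for the example.

```python
# Sample values from the nested structure above, keyed by ProductModel (assumed shape).
product_models = {
    "ProductModel_1": {"isbn": ["123654789", "98765432"]},
    "ProductModel_2": {"isbn": ["123654789321", "987654321471"]},
}


def flatten_isbns(models: dict[str, dict[str, list[str]]]) -> list[str]:
    """Collect every ISBN across all ProductModels into one de-duplicated list."""
    isbns: list[str] = []
    for model in models.values():
        isbns.extend(model.get("isbn", []))
    # dict.fromkeys keeps the first occurrence of each ISBN, in order.
    return list(dict.fromkeys(isbns))


print(flatten_isbns(product_models))
```

This works, but it is extra code to write and maintain for every nested property you want to read.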

A simpler way to get this data is to use SPARQL, which will return the desired flattened data structure. You can even have SPARQL de-duplicate the ISBNs across the schema:ProductModel entities by writing SELECT DISTINCT instead of SELECT.

PREFIX schema: <http://schema.org/>

SELECT ?isbn
FROM <http://www.worldcat.org/oclc/154684429>
WHERE
  { <http://www.worldcat.org/oclc/154684429> schema:workExample ?workExample .
    ?workExample schema:isbn ?isbn .
  }

This returns one row per ISBN, a flat structure that is much simpler to deal with.

From these examples, you can see that SPARQL can be very useful when working with linked data, even if you aren't able to SPARQL a whole dataset. In our next post, I will talk about querying data from a dataset without a SPARQL endpoint.

  • Karen Coombs

    Senior Product Analyst