Server-Side Linked Data Consumption with Ruby

In our Learning Linked Data series, we've covered querying linked data with a range of SPARQL features and techniques, producing linked data using RDFa, and working with JSON-LD. This month, we shift our focus to consuming linked data.

In this post, we'll discuss consuming linked data using the Ruby programming language. There are a couple of reasons why using server-side libraries to interact with linked data is useful. The primary reason is that server-side code is very versatile: it can be used both for creating user interfaces and for processing data "behind the scenes". Ruby is particularly good because its libraries can consume many different serializations, support SPARQL, and include connectors for interacting with triple stores. Typically, Ruby code can be used to interact with linked data in three primary ways:

  1. Reading RDF files on your local computer
  2. Retrieving RDF data from the web via graph URLs or SPARQL
  3. Creating RDF data from scratch to load and parse

In this post we're going to focus on the second approach: retrieving RDF data from the web. We'll focus on interacting with the graph data that lives at a specific URL.

How to do it

So what is necessary to read RDF data from the web and consume it using Ruby?   

The most basic steps to consuming linked data are: 

  1. Load the data.
  2. Parse the data returned into a graph. 
  3. Traverse the graph to display the data.

If you're interested in a greater level of detail beyond the snippets in this post, view the full code for the simple samples (https://github.com/librarywebchic/c4l16_ld_ruby), which display the data in a graph associated with a given bibliographic record.

Loading the Data 

The first step in consuming linked data is to load it, which means getting the data in a format that can be parsed. Linked data comes in several serializations, and not every library can parse every one. Most linked data sets support several serializations, but some do not, which is a problem if the data returned isn't in a serialization that the parsing library understands. Often, to get a particular serialization, a client needs to perform content negotiation by sending an HTTP request with a specific Accept header. In the example below, I'm using the RestClient HTTP client for Ruby to ask WorldCat for the graph for a specific OCLC number in the RDF/XML serialization.

Example 

require 'rest-client'

url = 'http://www.worldcat.org/oclc/82671871'

# Make the HTTP request for the data, asking for RDF/XML via the Accept header
resource = RestClient::Resource.new url
response, result = resource.get(:user_agent => "Example Ruby Linked Data code",
  :accept => 'application/rdf+xml') do |response, request, result|
  [response, result]
end
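
As a rough illustration of content negotiation without any gems, the sketch below builds (but does not send) a request using Ruby's standard Net::HTTP library. The SERIALIZATIONS table and the build_rdf_request helper are names invented for this example; they are not part of RestClient or WorldCat.

```ruby
require 'net/http'
require 'uri'

# Common RDF serializations and their registered media types.
SERIALIZATIONS = {
  rdfxml:   'application/rdf+xml',
  turtle:   'text/turtle',
  ntriples: 'application/n-triples',
  jsonld:   'application/ld+json'
}.freeze

# Hypothetical helper: build a GET request that asks for one serialization.
def build_rdf_request(url, format = :rdfxml)
  media_type = SERIALIZATIONS.fetch(format) # raises KeyError for unknown formats
  request = Net::HTTP::Get.new(URI(url))
  request['Accept'] = media_type
  request['User-Agent'] = 'Example Ruby Linked Data code'
  request
end

request = build_rdf_request('http://www.worldcat.org/oclc/82671871', :turtle)
puts request['Accept']
```

Whether a server honors a given Accept value depends on the data set, so it is worth checking the Content-Type of the response before handing it to a parser.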

Parsing the Data 

Once the data is loaded, it has to be parsed, which requires a linked data library. In Ruby, I'm using the RDF gem. On its own, this gem can load linked data serialized as N-Triples. Parsing RDF in other serializations, such as RDF/XML, Turtle, or RDFa, requires additional gems, so I'm also using the RDF/XML gem (rdf-rdfxml).

In the example below, I'm creating a graph object and then parsing data into it. Note that I have to tell the library the format of the data I've retrieved, in this case RDF/XML, which is what the from_rdfxml call does.

Example 

require 'rdf/rdfxml' # RDF/XML reader; also loads the core RDF gem
graph = RDF::Repository.new.from_rdfxml(response)
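
To make "parsing into a graph" more concrete, here is a deliberately simplified, gem-free sketch that turns a few N-Triples lines into subject/predicate/object arrays. It only handles IRIs and plain string literals, the sample data and the parse_ntriples helper are made up for illustration, and a real application should rely on the RDF gem's readers instead.

```ruby
# Matches one simple N-Triples line: <s> <p> (<o> | "literal") .
# Assumption: no blank nodes, language tags, datatypes or escape sequences.
TRIPLE_LINE = /\A<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*\z/

# Hypothetical helper: returns an array of [subject, predicate, object] arrays.
# Note that this toy version does not record whether an object was an IRI
# or a literal; a real RDF library keeps that distinction.
def parse_ntriples(text)
  lines = text.each_line.map(&:strip)
  lines = lines.reject { |line| line.empty? || line.start_with?('#') }
  lines.map do |line|
    match = TRIPLE_LINE.match(line)
    raise ArgumentError, "Unsupported line: #{line}" unless match
    subject, predicate, object_iri, literal = match.captures
    [subject, predicate, object_iri || literal]
  end
end

sample = <<~NT
  <http://example.org/book/1> <http://schema.org/name> "An Example Book" .
  <http://example.org/book/1> <http://schema.org/creator> <http://example.org/person/1> .
NT

triples = parse_ntriples(sample)
triples.each { |s, p, o| puts "#{s} #{p} #{o}" }
```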

Traversing the Graph 

Once the data is parsed, the Ruby RDF library can be used to access specific properties within the graph. The library uses the linked data concepts of subject, predicate, and object, so a particular property can be selected using the URI for the subject and the URI for the predicate.

The example below starts with a graph and extracts several values from it, including 

  • name,
  • author, and
  • subjects.

Example 

# Title: the schema:name of the bibliographic record
graph.query([RDF::URI(url), RDF::URI("http://schema.org/name"), nil]) do |statement|
  puts statement.object.to_s
end

# Author: follow schema:creator, then read the creator's schema:name
graph.query([RDF::URI(url), RDF::URI("http://schema.org/creator"), nil]) do |statement|
  graph.query([statement.object, RDF::URI("http://schema.org/name"), nil]) do |creator|
    puts creator.object.to_s
  end
end

# Subjects: print the schema:name if there is one, otherwise the subject URI
graph.query([RDF::URI(url), RDF::URI("http://schema.org/about"), nil]) do |statement|
  names = graph.query([statement.object, RDF::URI("http://schema.org/name"), nil])

  if names.count > 0
    puts names.first.object
  else
    puts statement.object
  end
end

I'm using graph.query to get the value of each property I'm interested in. graph.query takes a simple pattern with three positions: subject, predicate, and object. Each of my queries passes the URI of the bibliographic record as the subject and the property I want to select as the predicate. The object is left as nil because it is the value I want returned.

Note that in the section of code that prints the string value for each subject, I am nesting graph.query operations. The outer graph.query gets all the values of the schema:about property. The inner graph.query uses each schema:about object URI to get the schema:name for that URI. For more details, see the full code sample.
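
If the pattern-with-wildcards idea is new to you, the following plain-Ruby sketch may help. It models a graph as an array of triples and treats nil as "match anything", then repeats the nested lookup with a fallback to the URI. The match and subject_labels helpers and the example.org data are invented for this illustration; this is not how the RDF gem implements graph.query.

```ruby
# A tiny in-memory "graph": each triple is [subject, predicate, object].
BOOK  = 'http://example.org/book/1'
TOPIC = 'http://example.org/topic/1'
NAME  = 'http://schema.org/name'
ABOUT = 'http://schema.org/about'

GRAPH = [
  [BOOK,  NAME,  'An Example Book'],
  [BOOK,  ABOUT, TOPIC],
  [BOOK,  ABOUT, 'http://example.org/topic/2'], # this topic has no name
  [TOPIC, NAME,  'Linked data']
].freeze

# Return every triple matching the pattern; nil acts as a wildcard.
def match(graph, subject, predicate, object = nil)
  graph.select do |s, p, o|
    (subject.nil? || subject == s) &&
      (predicate.nil? || predicate == p) &&
      (object.nil? || object == o)
  end
end

# Outer lookup: all schema:about objects. Inner lookup: the name of each
# one, falling back to the URI itself when the graph holds no name.
def subject_labels(graph, resource)
  match(graph, resource, ABOUT).map do |_, _, topic|
    name = match(graph, topic, NAME).first
    name ? name.last : topic
  end
end

puts subject_labels(GRAPH, BOOK)
```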

Working with a Graph in an Object Oriented Fashion 

In the example above, I'm using the basic RDF gem to read the data in the graph. However, this isn't the only way to interact with graph-based data in Ruby. Another Ruby gem, Spira, allows graph data to be used as model objects. While this approach requires a little bit more setup, it gives you the ability to work in a resource-oriented way and can make your code cleaner. The gem can be used to define model classes for your RDF data. Here, I have defined a class called Bib that extends the Spira::Base class. The Bib class has a set of properties which I have defined.

Example 

class Bib < Spira::Base

  property :name, :predicate => RDF::URI.new('http://schema.org/name'), :type => XSD.string
  property :author, :predicate => RDF::URI.new('http://schema.org/creator'), :type => 'Author'
  has_many :subjects, :predicate => RDF::URI.new('http://schema.org/about'), :type => 'Subject'

end

For each property, I define the name, the predicate used for it, and the type of data within that property. Note that the author and subjects properties have types of 'Author' and 'Subject'. These are references to additional classes within my code that also need to be defined.

Example

class Subject < Spira::Base

  property :name, :predicate => RDF::URI.new('http://schema.org/name'), :type => XSD.string
  property :type, :predicate => RDF.type, :type => RDF::URI

end

class Author < Spira::Base

  property :name, :predicate => RDF::URI.new('http://schema.org/name'), :type => XSD.string
  property :type, :predicate => RDF.type, :type => RDF::URI

end
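
For intuition about what a property declaration buys you, here is a toy, plain-Ruby imitation of the idea: a class-level property macro that turns a predicate URI into a reader method over a list of triples. TinyResource, TinyAuthor, TRIPLES and the example.org URIs are all invented for this sketch, and this is not how Spira is actually implemented.

```ruby
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Toy data: [subject, predicate, object] triples (hypothetical URIs).
TRIPLES = [
  ['http://example.org/person/1', 'http://schema.org/name', 'Jane Example'],
  ['http://example.org/person/1', RDF_TYPE, 'http://schema.org/Person']
].freeze

class TinyResource
  attr_reader :uri

  def initialize(uri, triples)
    @uri = uri
    @triples = triples
  end

  # Class macro: define a reader that returns the first object found
  # for this resource's URI and the given predicate, or nil.
  def self.property(name, predicate:)
    define_method(name) do
      triple = @triples.find { |s, p, _| s == @uri && p == predicate }
      triple && triple.last
    end
  end
end

class TinyAuthor < TinyResource
  property :name, predicate: 'http://schema.org/name'
  property :type, predicate: RDF_TYPE
end

author = TinyAuthor.new('http://example.org/person/1', TRIPLES)
puts author.name
puts author.type
```

The point of the model layer is that calling code asks for author.name and never has to spell out a predicate URI or a query pattern.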

Once the model classes are defined, you can easily access different properties of the class. I'm also going to define a method to retrieve my Bib graph within my Bib class. 

Example

def self.find(bib_uri)  
  url = bib_uri  
  resource = RestClient::Resource.new url  
  response, result = resource.get(:user_agent => "Example Ruby Linked Data code",  
      :accept => 'application/rdf+xml') do |response, request, result|  
      [response, result]  
  end 
   
  if result.kind_of? Net::HTTPRedirection 
    resource = RestClient::Resource.new response.headers[:location] 
    response, result = resource.get(:user_agent => "Example Ruby Linked Data code", 
      :accept => 'application/rdf+xml') do |response, request, result| 
      [response, result] 
    end 
  end  
    
  if result.class == Net::HTTPOK  
    # Load the data into an in-memory RDF repository, get the Bib  
    Spira.repository = RDF::Repository.new.from_rdfxml(response)  
    bib = Spira.repository.query(:predicate => RDF.type, :object => RDF::URI.new('http://schema.org/CreativeWork')).first 
    bib = bib.subject.as(Bib)  
    bib.response_body = response  
    bib.response_code = response.code  
    bib.result = result  
    bib  
      
  else  
    client_request_error = ClientRequestError.new  
    client_request_error.response_body = response  
    client_request_error.response_code = response.code  
    client_request_error.result = result  
    client_request_error  
  end  
end  
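
The find method above follows at most one redirect. A more general pattern is to follow redirects up to a fixed limit. The sketch below shows that loop in isolation. To keep it runnable offline, the HTTP call is passed in as a block; fetch_following_redirects, the default limit of 5, and the stubbed responses are assumptions of this example rather than part of RestClient.

```ruby
# Make up to `limit` requests, following redirects between them.
# The block performs one request and must return
# [status_code, headers_hash, body].
def fetch_following_redirects(url, limit: 5)
  limit.times do
    status, headers, body = yield(url)
    redirect = (300..399).cover?(status) && headers['location']
    return [status, body, url] unless redirect
    url = headers['location']
  end
  raise "Too many redirects: gave up after #{limit} requests"
end

# Stubbed "server" so the example runs without network access.
RESPONSES = {
  'http://example.org/oclc/1' => [303, { 'location' => 'http://example.org/oclc/1.rdf' }, ''],
  'http://example.org/oclc/1.rdf' => [200, {}, '<rdf:RDF/>']
}.freeze

status, body, final_url = fetch_following_redirects('http://example.org/oclc/1') do |u|
  RESPONSES.fetch(u)
end
puts "#{status} #{final_url}"
```

In real code the block would perform the HTTP request with the same Accept header on every hop, since the redirect target may also support several serializations.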

Now I have everything I need to retrieve any Bib and access its properties with a few lines of code.

Example 

bib = Bib.find('http://worldcat.org/oclc/7977212')
puts bib.name
puts bib.author.name

bib.subjects.each do |subject|
  if subject.name
    puts subject.name
  else
    puts subject.id
  end
end

For more detailed information you can see the complete code example. You'll want to look at both:

  • print_values_oo_spira.rb
  • And the files within the model directory

Both the samples in this post are overly simplified. If you want to see a more complete implementation of this technique, take a look at the Ruby gem for OCLC's WorldCat Discovery API.

General Observations 

Ruby has some really nice libraries for working with linked data. These libraries allow you to work with the data in both a graph and resource-oriented fashion, allowing a developer to use the techniques that best suit his or her use cases and skills. One example of this is retrieving ISBNs from the graph. This is most easily done via SPARQL.

Example

def isbns
  graph_store = RDF::Repository.new.from_rdfxml(self.response_body)
  # Run the SPARQL query (requires the sparql gem) to get what we want.
  # Assumes the ISBNs are published with the schema:isbn predicate.
  results = SPARQL.execute("SELECT ?isbn WHERE { ?s <http://schema.org/isbn> ?isbn . }", graph_store)
  results.map { |result| result.isbn.value }
end

Additionally, Spira allows you to choose which class you want to use for particular objects. This can be particularly useful for properties such as schema:creator, which can be different types: person or organization.

Example

def author
  # Use an RDF::URI for the predicate so it matches the URI terms stored in the repository
  author_stmt = Spira.repository.query(:subject => self.id, :predicate => RDF::URI.new('http://schema.org/creator')).first

  if author_stmt
    author_type = Spira.repository.query(:subject => author_stmt.object, :predicate => RDF.type).first
    # Pick the model class that matches the creator's rdf:type, if it has one
    case author_type && author_type.object
    when 'http://schema.org/Person' then author_stmt.object.as(Person)
    when 'http://schema.org/Organization' then author_stmt.object.as(Organization)
    else nil
    end
  else
    nil
  end
end
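
When more than two types are possible, a lookup table can be easier to extend than a case statement. The sketch below shows that variation in plain Ruby. The Struct-based Person and Organization classes are stand-ins for the Spira models, and CREATOR_CLASSES and build_creator are hypothetical names used only here.

```ruby
# Stand-in model classes; in the post these would be Spira models.
Person = Struct.new(:uri)
Organization = Struct.new(:uri)

# Lookup table from rdf:type URI to the class used to wrap the resource.
CREATOR_CLASSES = {
  'http://schema.org/Person' => Person,
  'http://schema.org/Organization' => Organization
}.freeze

# Hypothetical helper: wrap a creator URI in the class matching its type,
# or return nil when the type is missing or not recognized.
def build_creator(uri, type_uri)
  klass = CREATOR_CLASSES[type_uri.to_s]
  klass && klass.new(uri)
end

creator = build_creator('http://example.org/org/1', 'http://schema.org/Organization')
puts creator.class
```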

Final Thoughts 

There are significant advantages to working with linked data using Ruby. The community developing the gems is extremely active and constantly making improvements. The various gems allow developers to use different serializations and to employ different programming approaches to interact with graphs. There is also a nice playground (http://rdf.greggkellogg.net/) for developers looking to try out Ruby linked data tools. For these reasons, Ruby has become my language of choice when working with linked data.


Karen Coombs
Senior Product Analyst