Developer House Project: Advanced Typeahead

We are Jason Thomale from the University of North Texas and George Campbell from OCLC, and we created an “Advanced Typeahead” application during the December 1-5, 2014 Developer House event at OCLC headquarters in Columbus, Ohio. The Developer House events provide OCLC Platform Engineers and library developers an opportunity to brainstorm and develop applications against OCLC Web Services. We would like to share our development experience and the application we designed in this blog post.

Our Motivation

First of all, what do we mean by "advanced typeahead?"

Typeahead (or autocomplete) widgets have been commonplace now for years: in a search system, when you begin typing a search string, the system attempts to complete your string for you as you type and often shows you a dropdown containing the best matches for the string you've typed so far. Many library ILSes and discovery systems have implemented such a feature. During the last Developer House event, a team created one that uses the Virtual International Authority File (VIAF) as its data source.

But typeahead widgets have recently begun to include more information than just query string matches. Jason first noticed an example of this about a year ago while using the Internet Movie Database (IMDB). Their typeahead shows you movie titles, actor names, director names, television show titles, and more, and it shows you information allowing you to identify exactly the title or name in question. In this case, you're clearly matching things, not strings. In a similar vein, North Carolina State University (NCSU) within the last 6 months has implemented what they call a "stratified typeahead," which shows you categorized suggestions as you type, such as FAQs and best bets along with search suggestions.

The predominant theme at Developer House was facilitating discovery with linked data. And attempting to create a typeahead widget that was some sort of amalgamation of IMDB's and NCSU's seemed perfect as a use case for library linked data—what could be better than leveraging library authorities in a way that helps users immediately disambiguate even before they've finished typing their query?

What We Built

We think what we ended up with fulfilled our goal given the one-week timeframe we had to work on it, but it could still use further development to fulfill the potential of library linked data. The thing about typeahead widgets is that they have to be responsive. More than a split-second delay and they feel sluggish, even obnoxious. And the thing about linked data is that parsing it is notoriously slow. Because we only had a few days to work on our project, we needed access to data that was already tailor-made for powering a typeahead in order to build what we wanted to build.

Fortunately, as a result of the previous Virtual International Authority File Autocomplete project, the VIAF API now has an endpoint designed specifically for doing typeahead queries. Even more fortunately, calls to the VIAF Auto Suggest endpoint return an entity type along with each matching authority heading. This meant we had easy access to exactly the data necessary for the sort of categorized display that we were after.

We built our widget as a jQuery plugin that uses the typeahead.js library to implement the typeahead functionality. Using our jQuery plugin allows you to create a VIAF-powered advanced typeahead widget on any HTML text input box. In addition to the plugin, we also built a demo front end app using JavaScript and HTML that demonstrates a sample implementation of the widget.

Using typeahead.js and the VIAF Auto Suggest API made fulfilling the original goal of the project—insomuch as we could in a few days—very easy. So easy, in fact, that we had that part done by the second day. So we added an extra component to the project, which was to try sending queries from our VIAF-advanced-typeahead-enabled search box to OCLC's Discovery API to see if we could bring back and display results in some way that might complement the display of the advanced typeahead. This second part of the project was more ad-hoc; the goal was more to experiment with OCLC's Discovery API than to build anything particularly useful. Even so, we were pleased with the results.

In order to build the search component on the front end, we needed a server-side back end to handle querying and authenticating against the Discovery API. We wanted the JavaScript front end app to continue handling the UI display and interaction, so our back end would only need to accept a search query, authenticate, get and parse the data from the Discovery API, and return a JSON object containing the data needed for our display. To do this, we created a Python Django app and used the Django REST Framework; we used the Python rdflib for parsing RDF, and we used the OCLC Python Authentication Library for authentication.

Once we'd built the back end, we simply added the necessary code to our JavaScript front end to handle the UI interaction: a couple of views, a few Handlebars templates, and some controller code to glue it all together.

The final result is a demo application that (in addition to using our advanced typeahead widget) uses OCLC's Discovery API to display a word cloud showing topics related to your query. Topics are linked to the appropriate resource (from VIAF, id.loc.gov, etc.) and, where possible, are color-coded by entity.

Working with the APIs

OCLC's APIs of course drive our application, and next we want to discuss how we used them to accomplish our project's goals. We'll talk about three components in particular: VIAF's API, the Python Authentication Library, and the WorldCat Discovery API.

VIAF API

The VIAF API, of course, provides access to authority data. What we needed to be able to do with that API was to submit a search query and get a lightning-quick response containing data we could use to power the typeahead widget. At a minimum we wanted to get authorized terms and entity types (e.g. personal names, corporate names, work titles, etc.), but we considered displaying additional information if available. As we mentioned, the VIAF Auto Suggest endpoint fit the bill. Hitting the following URL:

http://viaf.org/viaf/AutoSuggest?query=cooking

Returns JSON data that looks like this:

{
   "query":"cooking",
   "result":[
      {
         "term":"Cooking vinyl",
         "nametype":"corporate",
         "bnf":"14000565",
         "viafid":"136240816"
      },
      {
         "term":"Cooking Club of America",
         "lc":"n00006564",
         "nametype":"corporate",
         "viafid":"140682941"
      },
      {
         "term":"Cookingham, George E. , 1912-",
         "lc":"n84123642",
         "nametype":"personal",
         "viafid":"28475520"
      },
      ...
   ]
}

As you can see, the terms we want are included in the "term" elements, and the entity types are included in "nametype." The response also includes identifiers, which we could theoretically use to get more data about each entity—but of course making additional API calls would be untenable due to the speed needed for our typeahead. (As an aside: if it would be feasible to start including a small amount of additional data about each term—for instance, information to help with disambiguation—in the VIAF Auto Suggest API, that would help create an even more robust advanced typeahead.)

In our JavaScript client, our typeahead plugin makes a JSONP call to the VIAF Auto Suggest endpoint each time the user's query in the search box changes. We've created a filter function that groups terms by nametype and then sends the resulting data to the typeahead.js suggestion engine (a "Bloodhound" object). A simple Handlebars template is used to render the final result.
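The grouping step can be illustrated with a short sketch. The plugin itself is JavaScript; the Python below only mirrors the logic described above and is not the plugin's actual code. The function name group_by_nametype and the "unknown" fallback bucket are assumptions made for illustration, and the sample records are trimmed from the response shown earlier.

```python
# Sample records shaped like the VIAF Auto Suggest response shown earlier.
SAMPLE_RESULTS = [
    {"term": "Cooking vinyl", "nametype": "corporate", "viafid": "136240816"},
    {"term": "Cooking Club of America", "nametype": "corporate", "viafid": "140682941"},
    {"term": "Cookingham, George E. , 1912-", "nametype": "personal", "viafid": "28475520"},
]


def group_by_nametype(results):
    """Group suggestion terms by "nametype", keeping the API's order."""
    groups = {}
    for record in results:
        # Hypothetical fallback bucket for records that lack a nametype.
        nametype = record.get("nametype", "unknown")
        groups.setdefault(nametype, []).append(record["term"])
    return groups


grouped = group_by_nametype(SAMPLE_RESULTS)
for nametype, terms in grouped.items():
    print(nametype, terms)
```

Each key of the resulting dictionary would correspond to one category heading in the dropdown, with its terms listed underneath.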

Python Authentication Library

The VIAF API is open to the world, so authentication in that case is not an issue. However, the OCLC WorldCat Discovery API does require authentication. Since we are using Python to access that API, we can use OCLC's Python Authentication Library to handle the nitty-gritty details for us. In our case we wanted to use the Client Credentials Grant method.

The first step, of course, was to acquire a WSKey and appropriate permissions from OCLC to access the API. This gave us the following authentication credentials:

  • A Web Services Key (WSKey)
    • clientID (public key)
    • secret (private key)
  • The institutionID we are authorized for.

We plugged these parameters into an authentication settings file in our project and made use of the OCLC Python Authentication Library to submit our credentials and obtain an access token (via methods on a Wskey class). Our Python application could then pass the access token string in the HTTP Authorization header to the Discovery API when making a request.
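To make the last step concrete, here is a minimal sketch of attaching a token to a request using only the Python standard library. It does not use the OCLC Python Authentication Library, and it builds the request without sending it. The Bearer scheme and the text/turtle Accept value are assumptions, the token string is a placeholder, and build_search_request is a name invented for this example. The URL is the search endpoint described later in this post.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Discovery API search endpoint named in this post.
SEARCH_URL = "https://beta.worldcat.org/discovery/bib/search"


def build_search_request(query, access_token):
    """Build (but do not send) an authenticated search request."""
    url = SEARCH_URL + "?" + urlencode({"q": query})
    headers = {
        # Assumption: the token is passed using the OAuth 2.0 Bearer scheme.
        "Authorization": "Bearer " + access_token,
        # Assumption: RDF Turtle is requested with the standard MIME type.
        "Accept": "text/turtle",
    }
    return Request(url, headers=headers)


# "tk_EXAMPLE" is a placeholder, not a real token.
request = build_search_request("cooking", "tk_EXAMPLE")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the resulting object to urllib.request.urlopen would issue the call; that step is left out here because it needs real credentials.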

Because our implementation was quick and dirty, we did not bother trying to persist an authentication token across multiple requests (e.g., via setting a cookie). Each time our front end app makes a call to the back end, a new access token is requested before the Discovery API call is made. In practice, the tokens are generally valid for 20 minutes, and requesting a new one while the old one is still valid will cause the service to return the same token.
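For anyone who does want to reuse tokens, one possible approach is a small in-memory cache. This is a sketch of something the demo does not do, not a description of it. TokenCache and fetch_token are hypothetical names, the 20-minute lifetime is taken from the observation above rather than from the token itself, and the one-minute safety margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it is assumed to expire."""

    def __init__(self, fetch_token, lifetime_seconds=20 * 60,
                 margin_seconds=60, clock=time.monotonic):
        self._fetch_token = fetch_token    # hypothetical callable returning a token string
        self._lifetime = lifetime_seconds  # assumed lifetime, per the post
        self._margin = margin_seconds      # refresh a little early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


# Demonstration with a fake fetcher and a fake clock (no network involved).
calls = []


def fake_fetch():
    calls.append(1)
    return "token-%d" % len(calls)


fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetches a token
fake_now[0] = 600.0
second = cache.get()   # 10 minutes later: reuses the cached token
fake_now[0] = 1150.0
third = cache.get()    # inside the safety margin: fetches again
print(first, second, third, len(calls))
```

A more careful version would read the expiry time from the token response instead of assuming a fixed lifetime.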

OCLC WorldCat Discovery API

In a nutshell, according to the page about the service on OCLC's website, the Discovery API "exposes your collection data for items in Worldcat" (for customers of WorldCat Discovery Services) using "Linked Data response formats." The API includes a number of queryable resources such as bibliographic resources, database resources, and offer resources; you can query several different fields using a variety of parameters.

Our use of the Discovery API is pretty straightforward. For our application, we only wanted some approximation of what topics were associated with a search query, so we chose to focus on querying bibliographic resources and parsing out all of the subjects attached to those resources. So, our Python Django application takes a query parameter (q) and passes that through to the https://beta.worldcat.org/discovery/bib/search endpoint. It requests data as RDF Turtle. When data is returned, it generates an rdflib Graph from the returned data. A SubjectGetter class navigates the RDF graph to pull triples with bibliographic resource subjects and schema:about predicates. For each of these, the schema:name (subject term) and identifier (subject URI) are put into a data structure that's eventually returned to the client as JSON.
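The subject extraction can be sketched without any dependencies by treating the graph as a list of (subject, predicate, object) tuples. The real back end uses an rdflib Graph and the SubjectGetter class, so this is only an illustration of the idea, not that class. collect_subjects is a hypothetical name, the example.org URIs and the title are made up, and skipping subjects that have no schema:name is an assumption.

```python
import json

SCHEMA_ABOUT = "http://schema.org/about"
SCHEMA_NAME = "http://schema.org/name"

# Subject URI taken from the sample response in this post.
TOPIC = ("http://experiment.worldcat.org/entity/work/data/613616"
         "#Topic/african_american_cooking")

# Hand-written stand-in for a parsed graph; example.org URIs are made up.
TRIPLES = [
    ("http://example.org/bib/1", SCHEMA_NAME, "A made-up cookbook title"),
    ("http://example.org/bib/1", SCHEMA_ABOUT, TOPIC),
    ("http://example.org/bib/2", SCHEMA_ABOUT, TOPIC),
    ("http://example.org/bib/2", SCHEMA_ABOUT, "http://example.org/topic/unnamed"),
    (TOPIC, SCHEMA_NAME, "African American cooking."),
]


def collect_subjects(triples):
    """Return one {"id", "label"} entry per distinct schema:about object."""
    names = {s: o for s, p, o in triples if p == SCHEMA_NAME}
    seen = set()
    subjects = []
    for _, predicate, obj in triples:
        if predicate != SCHEMA_ABOUT or obj in seen:
            continue
        seen.add(obj)
        label = names.get(obj)
        if label is None:
            # Assumption: subjects without a schema:name are left out.
            continue
        subjects.append({"id": obj, "label": label})
    return subjects


subjects = collect_subjects(TRIPLES)
print(json.dumps({"subjects": subjects}, indent=3))
```

With rdflib, the same two lookups would be expressed as queries against the Graph rather than loops over a list.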

For example, a query on "cooking" returns a JSON response that looks like the following:

{
   "subjects":[
      {
         "id":"http://experiment.worldcat.org/entity/work/data/613616#Topic/african_american_cooking_history",
         "label":"African American cooking—History"
      },
      {
         "id":"http://experiment.worldcat.org/entity/work/data/613616#Topic/african_american_cooking",
         "label":"African American cooking."
      },
      {
         "id":"http://experiment.worldcat.org/entity/work/data/613616#Topic/african_american_cookery_history",
         "label":"African American cookery—History."
      },
      ...
   ]
}

The JavaScript client application then uses this data to generate a word cloud. Terms are linked using their respective URIs, and they are color-coded based on their source (id.loc.gov, VIAF, or experiment.worldcat.org) and, if present in the URI, entity type.
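The color-coding decision comes down to classifying each URI. The sketch below shows one way that could work, written in Python for consistency with the other examples even though the demo does this in JavaScript. The short source labels, the "other" fallback, and the function name classify are all assumptions. The fragment handling follows the "#Topic/..." pattern visible in the sample response.

```python
from urllib.parse import urlparse

# Hosts named in this post, mapped to hypothetical short labels.
SOURCES = {
    "id.loc.gov": "loc",
    "viaf.org": "viaf",
    "experiment.worldcat.org": "worldcat",
}


def classify(uri):
    """Return (source, entity_type) for a subject URI."""
    parts = urlparse(uri)
    source = SOURCES.get(parts.hostname, "other")
    entity_type = None
    # Sample URIs carry the type in the fragment, e.g. "#Topic/some_label".
    if "/" in parts.fragment:
        entity_type = parts.fragment.split("/", 1)[0]
    return source, entity_type


sample = ("http://experiment.worldcat.org/entity/work/data/613616"
          "#Topic/african_american_cooking")
print(classify(sample))
print(classify("http://viaf.org/viaf/136240816"))
```

The returned pair could then be mapped to a CSS class or color of your choosing for each word in the cloud.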

Usage

You can clone the repository from GitHub, e.g.:

git clone https://github.com/oclc-developer-house/advanced-typeahead

If you want to run the demo application, follow the instructions in the README to deploy and launch. As stated there, this is by no means intended to run in any sort of production environment—it uses the Django development server and development settings. Note also that you’ll need to get your own credentials from OCLC for accessing the Discovery API and include those as instructed in the README.

If you want to use the Advanced Typeahead jQuery plugin in your own project, then just grab the advanced-typeahead/html/scripts/advanced-viaf-typeahead.js file and include it in your project. (Make sure you also have jQuery, typeahead.js, and handlebars.js included.) Initialize the typeahead on a searchbox using:

$("#searchbox").advancedViafTypeahead(options);

In the above, be sure to replace #searchbox with whatever CSS selector will actually select your search box. Sending options is optional; check out the code to see what options you might want to send. In most cases, the defaults should be reasonable.

Summary

The Developer House was designed to give library coders hands-on experience tackling real-world problems using OCLC's APIs. The advanced typeahead concept was something that Jason had been interested in trying out with actual library data, and this gave him the opportunity to do so. We think the basic concept worked well, and there is still plenty more that could be done in the future! A huge thanks to OCLC for sponsoring the event.

Useful links

We've also added this application to our Gallery. Stay tuned for more projects from Developer House.

The WorldCat Discovery API is currently available as a beta for a select number of libraries using WorldCat Discovery Services. Interested in participating in the beta? Contact us today.

 

 

George Campbell

    Senior Software Engineer

    O: 614-764-6227

Jason Thomale

Resource Discovery Systems Librarian
University of North Texas
jason.thomale@unt.edu