
About Ross Singer's Umlaut

16 August 2007

This page provided information about the Umlaut at the time it won the Second OCLC Research Software Contest. Parts of it may now be outdated. Readers are encouraged to consult http://umlaut.library.gatech.edu/umlaut/wiki for more recent information about the Umlaut.

27 September 2006

Ross Singer's Umlaut won the Second OCLC Research Software Contest, held 1 July - 15 September 2006. This page provides more information about the service.

Overview

The Umlaut is an OpenURL Link Resolver intended to improve access to library collections by contextualizing citations and available holdings more accurately for a given user. It utilizes a host of web services, including many from OCLC, and will take several paths depending on what it finds at various stages.

How it works:

When a user first begins a session (in the traditional way one would use a link resolver, from an abstract database, e-journal list, opac, etc.), the Umlaut takes the user's IP address and asks the OCLC Resolver Registry if there are other link resolver services associated with this IP. If the response is yes, the Umlaut adds those services to the user's collection.

It then puts together the user's 'collection' based on their profile (although only a default profile is currently enabled), which adds the catalogs and link resolvers associated with a particular institution. The goal is to have everything available to that user at hand to draw from, including public libraries or other schools or libraries they have access to.
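The two steps above can be sketched roughly as follows. This is only an illustration: the real lookup goes to the OCLC Resolver Registry, whereas here a small local table (REGISTRY) stands in for it, and every URL, address range and function name is a hypothetical placeholder.

```python
import ipaddress

# Hypothetical stand-in for the registry: (IP range, link resolver URL).
REGISTRY = [
    ("192.0.2.0/24", "http://resolver.example.edu/sfx"),
]

# Hypothetical default profile: services every user starts with.
DEFAULT_PROFILE = ["http://catalog.example.edu/sru"]


def resolvers_for_ip(ip, registry=REGISTRY):
    """Return the resolver URLs whose registered range contains this IP."""
    addr = ipaddress.ip_address(ip)
    return [url for cidr, url in registry
            if addr in ipaddress.ip_network(cidr)]


def build_collection(ip, profile=DEFAULT_PROFILE):
    """Start from the profile, then add any resolvers matched by IP."""
    collection = list(profile)
    for url in resolvers_for_ip(ip):
        if url not in collection:
            collection.append(url)
    return collection
```

A user inside the registered range gets the profile services plus the matched resolver; anyone else gets the profile services only.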

Analyzing a citation

If there's a standard identifier (currently just DOIs and PMIDs, based on the needs of our population, but there are stubs for OCLC numbers, bibcodes and handles), the Umlaut looks it up against the relevant ID authority (CrossRef, PubMed, etc.) and grabs all of the metadata about the citation from those sources.
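A minimal sketch of that dispatch step, assuming identifiers arrive as info URIs (for example "info:doi/..." or "info:pmid/...") as they commonly do in OpenURL requests. The function names and the AUTHORITIES table are made up for illustration and are not taken from the Umlaut's code.

```python
# Assumed mapping from info-URI namespace to the authority to query.
# Only the two identifier types described as live are listed.
AUTHORITIES = {
    "doi": "crossref",
    "pmid": "pubmed",
}


def parse_identifier(rft_id):
    """Split 'info:<namespace>/<value>' into (namespace, value), else None."""
    if not rft_id.startswith("info:"):
        return None
    namespace, sep, value = rft_id[len("info:"):].partition("/")
    if not sep or not value:
        return None
    return namespace.lower(), value


def choose_authority(rft_id):
    """Name the authority to look the identifier up in, or None if unsupported."""
    parsed = parse_identifier(rft_id)
    if parsed is None:
        return None
    return AUTHORITIES.get(parsed[0])
```

Identifier types that are only stubbed (bibcodes, handles and so on) simply fall through to None here, which would let the caller carry on with whatever metadata the citation already has.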

It then takes the metadata and submits requests to SFX (other link resolvers could be supported, but that's all I have access to), our catalog (which is exported to a Zebra database) and our state union catalog (both catalogs are searched via SRU).
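For readers unfamiliar with SRU, a catalog search is just an HTTP GET with a CQL query in the URL. The sketch below builds such a URL using the standard SRU 1.1 searchRetrieve parameters; the base URL and the choice of the dc.title index are assumptions for illustration, not the actual catalog configuration.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; a real catalog would publish its own base URL.
example = build_sru_url("http://catalog.example.edu/sru",
                        'dc.title = "nature"', max_records=5)
```

The same function could be pointed at both the local catalog and the union catalog, since only the base URL differs.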

If the item has an ISBN, the Umlaut uses xISBN to get all editions and searches on those.

In the case of conference proceedings, the Umlaut does a series of searches. Incoming citation metadata is usually fairly spotty, so it's not always obvious that a citation refers to a conference; a whitelist of keywords ('papers', 'transactions', 'spie', 'ieee', etc.) is used to decide. The Umlaut then uses the draft 'bib' OpenURL context set to find the conference and volume in our OPAC.
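The keyword check could look something like this. Only the four keywords quoted above come from this page (the real list is longer), and the function name and whole-word matching rule are assumptions.

```python
import re

# The four keywords quoted in the text; the actual whitelist has more ("etc.").
CONFERENCE_KEYWORDS = ("papers", "transactions", "spie", "ieee")


def looks_like_conference(*fields):
    """True if any citation field contains a whitelisted keyword as a whole word."""
    words = set()
    for field in fields:
        if field:  # spotty metadata: fields may be missing or empty
            words.update(re.findall(r"[a-z0-9]+", field.lower()))
    return any(keyword in words for keyword in CONFERENCE_KEYWORDS)
```

Matching on whole words rather than substrings is a design choice in this sketch: it keeps a title such as "Newspapers Weekly" from tripping the 'papers' keyword.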

Searching relevant databases

At this point, if the item has an ISSN or ISBN, the Umlaut searches WorldCat.org and, if there's a record, presents a link to view the item.

Next, it searches Amazon (if there's an ISBN), Google and Yahoo (using their APIs).

It takes all the metadata from Amazon (description, similar items, etc.).

For Google and Yahoo, it loops through the results and checks whether any appear in a 'relevant sites' whitelist (items in ROAR, arXiv, Citeseer, etc.) or, alternately, in a blacklist (Amazon.*, other online booksellers). It then checks every link to see whether its host is configured in our proxy server (EZProxy).
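A rough sketch of that filtering pass follows. The host lists are short made-up examples, the blacklist is checked first (the order is an assumption), and the set of proxied hosts is passed in as a plain set rather than read from a real EZProxy configuration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Illustrative entries only; the real lists are not given on this page.
RELEVANT_HOSTS = ["arxiv.org", "citeseer.ist.psu.edu"]
BLOCKED_PATTERNS = ["amazon.*"]


def classify(url, proxied_hosts=frozenset()):
    """Return (label, is_proxied) for one search result URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if any(fnmatchcase(host, pattern) for pattern in BLOCKED_PATTERNS):
        label = "blocked"
    elif any(host == h or host.endswith("." + h) for h in RELEVANT_HOSTS):
        label = "relevant"
    else:
        label = "other"
    return label, host in proxied_hosts
```

The wildcard pattern is what lets a single entry such as Amazon.* cover amazon.com, amazon.co.uk and so on.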

Some URLs have special handlers (arXiv, Citeseer, Citebase, CiteULike, etc.) and are processed differently: arXiv, Citeseer and Citebase links go into the Fulltext holdings bin, while CiteULike is mined for descriptions, tags and tables of contents. The rest become "Closest Web Results".
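One simple way to picture the special handlers is a lookup table from host to bin, with everything unmatched falling into "Closest Web Results". The hostnames, bin names and function names below are assumptions for illustration only.

```python
from urllib.parse import urlsplit

FULLTEXT = "fulltext"
MINED = "mined_for_metadata"
CLOSEST = "closest_web_results"

# Assumed hostnames for the services named in the text.
HANDLERS = {
    "arxiv.org": FULLTEXT,
    "citeseer.ist.psu.edu": FULLTEXT,
    "citebase.org": FULLTEXT,
    "citeulike.org": MINED,
}


def bin_for(url):
    """Pick the bin for one URL; unknown hosts fall through to CLOSEST."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return HANDLERS.get(host, CLOSEST)


def sort_results(urls):
    """Group result URLs by bin, preserving their original order."""
    bins = {FULLTEXT: [], MINED: [], CLOSEST: []}
    for url in urls:
        bins[bin_for(url)].append(url)
    return bins
```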

Display

At this point, we display the results to the user. When the page renders, an AJAX call is made back to the server and the server handles a series of background requests.

If any OAI providers were identified in the Google/Yahoo results, the Umlaut will make OAI requests for the record being viewed (this currently works only for Citebase and Citeseer, with mixed results due to the quality of the metadata).

It also takes the FullText links (if there were any) and queries Connotea, Yahoo's MyWeb and Unalog to see if anybody bookmarked these links and, if so, gets the tags and any records that share those tags. It does the same thing for CiteULike, although it treats it like an OAI provider.

It stores all of the subjects that it has collected from all over (MeSH from Pubmed, SFX Subjects, LCSH from the OPACs, tags, Amazon) with the referent to help inform the upcoming recommender service (which should be available sometime this fall).
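As an illustration of what "storing subjects with the referent" might involve, the sketch below merges (source, subject) pairs into one normalized map that records which sources supplied each term. The data shape and the normalization rule (lowercasing and collapsing whitespace) are assumptions, not a description of the Umlaut's actual storage.

```python
from collections import defaultdict


def merge_subjects(harvested):
    """Map each normalized subject to the sorted list of sources that supplied it."""
    merged = defaultdict(set)
    for source, subject in harvested:
        key = " ".join(subject.lower().split())  # lowercase, collapse whitespace
        if key:
            merged[key].add(source)
    return {key: sorted(sources) for key, sources in merged.items()}
```

A term that shows up from several sources (say, both MeSH and LCSH) would be a natural candidate for heavier weighting in a recommender, though how the planned service will weigh them is not described here.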

The Umlaut is also COinS enabled and has an unAPI interface (which gives you the ContextObject and whatever data it found in either JSON or XML).
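For context, a COinS is an empty HTML span with class "Z3988" whose title attribute carries an OpenURL ContextObject as URL-encoded key/value pairs. The sketch below builds one for a journal citation; it is a generic illustration of the convention, not the Umlaut's own output, and the function name is made up.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(metadata):
    """Build a COinS span for a journal citation from rft.* metadata."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    pairs += [("rft." + key, value) for key, value in metadata.items()]
    # URL-encode the pairs, then HTML-escape the result for use in an attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

Any COinS-aware browser tool can then pick the span out of the page and hand the ContextObject to the user's own link resolver.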

—Ross Singer, 15 September 2006 [/rcb 27 September 2006; /rcb 16 August 2007]


