OCLC Web Services Strategy: Existing Standards and REST
For over forty years, OCLC has acted as a trusted steward of high-quality cataloging and other data assets that support the mission of libraries. Our mission is to connect the world’s libraries to each other, to their partners and to the broader world through the web. In a 21st-century cloud computing environment, we are opening access to data entities in WorldCat and other OCLC systems that have historically been mediated through a limited number of OCLC and partner interfaces.
OCLC has been releasing web services on our WorldShare Platform for the better part of two years now. We have learned a lot along the way and want to share the background for how we design the specifications for new web services. At the core, when we release a web service, we are providing clients with access to resources in the form of data or business functionality available on OCLC’s servers. In general, the design and implementation of a service follows one of two strategies.
Our first strategy is clear-cut: identify existing standards that match the data entities or business functionality we want to expose. In many cases, however, no such standard exists, and OCLC relies on our second strategy: implement according to an internal data protocol. The protocol is intended to act as an internal standard for OCLC web services, providing cohesion and coherence among the diverse range of web services that we offer. The protocol defines standard REST interactions, data formats, and API components.
Strategy 1: Implementing Services Using Existing Standards
When evaluating the options for how to implement a new service, our first pass tries to identify any standards that already exist. For example, the NCIP standard defines interoperability for ILS functions, which makes it a perfect fit for web services that provide access to the WMS Circulation functionality, such as checking out library items or placing patron requests for items.
This is an example of the first strategy. A standard is defined and our goal is to implement a service according to its specification so that our integration partners and library membership can take advantage of any existing knowledge of the standard as they work with our APIs. Our goal is to reinforce the hard work that has been put into the standards bodies and communities both inside and outside of the library sphere.
Strategy 2: Implementing Services Using a Pragmatic REST & AtomPub Model
As mentioned above, in many cases, OCLC is implementing a new web service for a data collection or set of business functionality for which there is no existing standard. In this second scenario, we have goals that are similar to the case in which a standard like NCIP exists. Namely, we want to provide a level of consistency across our web services that enables developers to become familiar with the style of OCLC web services. Certain tasks and elements are common across our web services, such as:
- general request/response mechanics
- searching and querying
- parameters that represent certain identifiers
- error messaging
A standard enables developers to bring their existing knowledge working with a previous system to a new programming task. Similarly, by coordinating consistently implemented features across the diverse range of web services at OCLC, we hope to enable developers implementing clients of our services to gain momentum with their projects by seeing patterns as they move from one API to the next.
Our strategy has changed over time as we have paid attention to the dominant trends on the web. Our current locally designed services are based on the REST model and geared towards the implementation guidelines defined by AtomPub. Our motivations for choosing this strategy are based on the following principles.
For starters, our service exposure is based on HTTP because it is the protocol of the web. HTTP defines request verbs that map onto the four “CRUD” operations common to most programming paradigms:
- Create: POST
- Read: GET
- Update: PUT
- Delete: DELETE
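As a minimal sketch of this mapping, the Python below builds (but does not send) one request per operation using only the standard library. The URLs and the `build_request` helper are hypothetical placeholders for illustration, not actual OCLC endpoints or client code.

```python
from urllib.request import Request

# Hypothetical URLs for illustration only; these are not real OCLC endpoints.
COLLECTION_URL = "https://example.org/api/items"
RESOURCE_URL = "https://example.org/api/items/123"

# CRUD operation -> HTTP verb, following the list above.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, body=None):
    """Return an unsent Request that uses the verb for the given CRUD operation."""
    return Request(url, data=body, method=CRUD_TO_HTTP[operation])


if __name__ == "__main__":
    examples = [
        ("create", COLLECTION_URL, b"<item/>"),  # new resources go to the collection
        ("read", RESOURCE_URL, None),
        ("update", RESOURCE_URL, b"<item/>"),
        ("delete", RESOURCE_URL, None),
    ]
    for operation, url, body in examples:
        request = build_request(operation, url, body)
        print(operation, "->", request.get_method(), request.full_url)
```

Sending the create request to the collection URL rather than to an individual resource is a common REST convention and is assumed here.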
By defining data assets in terms of HTTP resources, we can enable the greatest possible flexibility for how people use the cooperative’s data. By removing privileged mediation to OCLC data and business functionality through exposure of these basic operations, we can be agnostic with respect to the various creative ways a client can interact with our data.
4 Basic Operations, Plus One More
Using the HyperText Transfer Protocol as our foundation, our web services define resources on the web that can be accessed by HTTP clients. These resources come in the form of data entities or functional endpoints against which some action can be taken in an OCLC system. Our intent is to provide access to the core objects in OCLC systems throughout the objects’ life cycles.
Resources and entities can be created, read/accessed, updated and deleted/destroyed. In addition to the four life cycle operations, most web services need to provide a means for discovering the resources they define. Searching, therefore, becomes the honorary fifth member of the suite of basic life cycle operations that our web service model is based on.
Flexibility for Resource Representations (Formats)
Our REST services will typically define at least two serialization formats: XML and JSON. These are the basic data structures commonly supported throughout the web. They have a high rate of adoption and are well understood by most developers. Furthermore, many library data formats and standards are based on these technologies or define an implementation in XML or JSON (such as MARCXML). Using XML and JSON as the foundation, OCLC web services support standard and custom media type formats.
Additionally, our REST-based architecture supports HTTP-style content negotiation through file extensions and HTTP Accept headers. Given that data is one of our core businesses at OCLC, we know that there are many different formats that we will need to support. HTTP-based content negotiation gives us a mechanism to handle the diversity of data needed to conduct the business of libraries, and to use the power of the web to connect libraries and partners through the OCLC WorldShare Platform.
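To make the two negotiation routes concrete, here is a minimal sketch of how a service might pick a representation. Everything in it is an assumption for illustration: the extension-to-media-type table, the rule that a file extension takes precedence over the Accept header, and the choice of XML as the default are not taken from any OCLC specification.

```python
import posixpath

# Assumed mappings for illustration; actual services may support other media types.
EXTENSION_TYPES = {".xml": "application/xml", ".json": "application/json"}
SUPPORTED = ["application/xml", "application/json"]  # first entry is the assumed default


def parse_accept(header):
    """Return acceptable media types from an Accept header, best q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(path, accept_header=""):
    """Choose a media type from the path extension, else from the Accept header."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in EXTENSION_TYPES:
        return EXTENSION_TYPES[extension]
    for media_type in parse_accept(accept_header):
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return SUPPORTED[0]
    # No Accept header: fall back to the default; otherwise nothing acceptable.
    return SUPPORTED[0] if not accept_header.strip() else None


if __name__ == "__main__":
    print(negotiate("/items/123.json"))
    print(negotiate("/items/123", "application/json;q=0.9, application/xml;q=0.5"))
```

A `None` result would correspond to a 406 Not Acceptable response in a real service. The sketch ignores partial wildcards such as `application/*` to stay short.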
An AtomPub Implementation
As mentioned above, our default implementation of our REST model is based on the Atom Publishing Protocol. Our efforts in this area extend beyond merely using the Atom serializations for lists of resources and individual resource entries. AtomPub also specifies an appropriate use of HTTP verbs and response codes for different kinds of requests that can be made against our web services. Because there is no formal specification for REST, the AtomPub specification provides more concrete guidelines for how to define the service interaction and introduce consistency across our web services.
Additionally, Atom serializations are widely implemented, for example in blogging software that produces search results or syndication feeds. Many of our web services implement search functionality, and when they use the Atom serializations, they use well-defined element names to describe the common features of a search result set. This includes elements like:
- the total number of items retrieved,
- the number of items returned in the current response, and
- the offset of the first item in the current response from the first item retrieved.
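A client can read these paging values with a few lines of standard-library Python. The post does not name the elements, so this sketch assumes they are carried as OpenSearch 1.1 response elements (`totalResults`, `itemsPerPage`, `startIndex`), a common convention for search results in Atom feeds. The sample feed is invented for illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",  # assumed paging vocabulary
}

# Invented sample response; not output from a real OCLC service.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <title>Sample search results</title>
  <os:totalResults>42</os:totalResults>
  <os:startIndex>11</os:startIndex>
  <os:itemsPerPage>10</os:itemsPerPage>
  <entry><id>urn:example:11</id><title>First hit on this page</title></entry>
  <entry><id>urn:example:12</id><title>Second hit on this page</title></entry>
</feed>"""


def paging_info(feed_xml):
    """Extract the result-set counters and the number of entries in the feed."""
    feed = ET.fromstring(feed_xml)
    return {
        "total": int(feed.findtext("os:totalResults", namespaces=NAMESPACES)),
        "start": int(feed.findtext("os:startIndex", namespaces=NAMESPACES)),
        "per_page": int(feed.findtext("os:itemsPerPage", namespaces=NAMESPACES)),
        "entries": len(feed.findall("atom:entry", NAMESPACES)),
    }


if __name__ == "__main__":
    print(paging_info(SAMPLE_FEED))
```

Because the element names are the same from one feed to the next, a helper like the hypothetical `paging_info` could be reused across services without changes.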
Defining these data elements consistently across all of our services reduces the time a developer writing a client needs to spend when moving from one OCLC web service to another.
In addition to specifying an appropriate use of HTTP verbs and common data elements in serializations, AtomPub also provides a standardized way to implement resource versioning through its specification of the use of ETags and HTTP If-Match headers. While the details are beyond the scope of this post, the AtomPub ETag mechanism allows us to implement rules that avoid concurrency issues for updates to data.
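For readers who want the gist, the following is a minimal in-memory sketch of the general ETag and If-Match pattern, not OCLC's implementation. The `ResourceStore` class, the hash-based ETag, and the returned status codes are illustrative assumptions: an update succeeds only when the client presents the ETag of the version it last read, and otherwise fails with 412 Precondition Failed.

```python
import hashlib


class ResourceStore:
    """Toy resource store that enforces If-Match on updates (illustrative only)."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def _etag(body):
        # Assumed scheme: derive the ETag from a hash of the representation.
        return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

    def create(self, uri, body):
        self._bodies[uri] = body
        return 201, self._etag(body)

    def read(self, uri):
        body = self._bodies[uri]
        return 200, self._etag(body), body

    def update(self, uri, body, if_match):
        current_etag = self._etag(self._bodies[uri])
        if if_match != current_etag:
            # The client edited a stale version: reject instead of overwriting.
            return 412, current_etag
        self._bodies[uri] = body
        return 200, self._etag(body)


if __name__ == "__main__":
    store = ResourceStore()
    store.create("/items/1", b"version 1")
    _, etag, _ = store.read("/items/1")  # two clients read the same version
    print(store.update("/items/1", b"edit from client A", etag)[0])
    print(store.update("/items/1", b"edit from client B", etag)[0])
```

In the usage above, the second client would need to re-read the resource, merge its change, and retry with the new ETag.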
There are also benefits to using Atom serializations of individual data resources, such as a single bibliographic manifestation in the WorldCat Metadata API. We expect most of the clients working with this API to use MARCXML as the primary serialization. However, in addition to the core data resource, other information may need to be transmitted back to a client of the API. After a new bibliographic record has been created, for example, it is helpful to return the URI that identifies the new resource. And when a malformed request tries to update a resource, the service needs to communicate error messages to the client.
By wrapping the MARCXML document in an Atom serialization, we can add this information to the responses that a web service sends back to clients. Furthermore, beyond the data elements required by Atom, the format can be extended with custom elements to pass error/exception messages about missing parameters or required permissions that a client lacks. Placing this information in an Atom wrapper means that we can leave the MARCXML data intact. We are therefore able to return to the client:
- data about a particular resource, and
- extra information pertaining to the current web service request.
With the Atom wrapped representation of the resource, we can accomplish this without adding, for example, error messaging into the core MARCXML resource representation in a way that would make the MARC data invalid and break the parsing rules of a MARC code library. The Atom wrapped data representations add padding to a resource, but give us a nice separation of the data that pertains to a web service request and data that pertains to a particular data entity.
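The separation described above can be sketched with Python's standard XML library. The Atom and MARCXML namespaces are the published ones, but the error-extension namespace, the `message` element, the resource URI, and the sample record are all hypothetical. The sketch also omits elements that a valid Atom entry requires (such as `title` and `updated`) to stay short.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
MARC = "http://www.loc.gov/MARC21/slim"
ERR = "http://example.org/ns/error"  # hypothetical extension namespace

# Placeholder MARCXML record for illustration.
SAMPLE_MARCXML = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    "<leader>00000nam a2200000 a 4500</leader>"
    '<controlfield tag="001">123</controlfield>'
    "</record>"
)


def wrap_record(marcxml, resource_uri, error=None):
    """Wrap a MARCXML record in an Atom entry, with an optional error message."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = resource_uri
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="self", href=resource_uri)
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="application/xml")
    content.append(ET.fromstring(marcxml))  # the record itself is not modified
    if error is not None:
        ET.SubElement(entry, f"{{{ERR}}}message").text = error
    return ET.tostring(entry, encoding="unicode")


def unwrap_entry(entry_xml):
    """Return (MARCXML record element, resource URI, error text or None)."""
    entry = ET.fromstring(entry_xml)
    record = entry.find(f"{{{ATOM}}}content/{{{MARC}}}record")
    resource_uri = entry.findtext(f"{{{ATOM}}}id")
    error = entry.findtext(f"{{{ERR}}}message")
    return record, resource_uri, error


if __name__ == "__main__":
    wrapped = wrap_record(SAMPLE_MARCXML, "https://example.org/bib/123",
                          error="Missing required parameter")
    record, uri, error = unwrap_entry(wrapped)
    print(uri, error, record.findtext(f"{{{MARC}}}controlfield"))
```

A client that only understands MARC can hand the extracted `record` element to its MARC library, while request-level details such as the URI and any error stay in the wrapper.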
Finally, the last piece of the Atom Publishing Protocol I will mention here is that it requires us to define unique IDs for every resource in our web services. Atom “entries” must have globally unique IDs defined and links that specify the location on the web of the current resource. This reinforces the resource orientation of our web services. It also aligns the web services with our efforts within OCLC to support the semantic web for bibliographic and related entities. We can satisfy the globally unique requirement by using our corner of the web of data: we simply have to define our identifiers in terms of our WorldCat.org HTTP domain.
Technical Product Manager