What’s in a name: URL patterns on the WorldShare Platform
This is the second post in our series on the design of web services at OCLC. In the first post in the series, we discussed the overall strategy at a very broad level. This time we will move to the other end of the spectrum of granularity and talk about how we mint our HTTP URLs. Our thinking has evolved over the past few years, but it is settling into a stable place. As we craft new APIs and refactor existing ones on the WorldShare Platform, we will use the patterns below. We hope that by sharing this information, we can help developers see the patterns emerge and gain a deeper understanding of how to integrate with the WorldShare Platform.
What is the purpose of a URL pattern?
The first question to answer about our URL pattern is what purpose it serves. A URL, or uniform resource locator, is a reference to a specific resource on the World Wide Web. For our suite of web services, those resources define endpoints at which data and/or functionality are available on OCLC servers. Because a URL is a reference, each one gives an addressable name to something in our suite of services. Understanding this use of URLs as names is critical to understanding the logic behind the pattern.
Domains: routing across the globe
OCLC is a global company serving libraries all over the world. To provide the best possible service, we have server environments (“data centers”) on 3 continents and in 4 countries:
- North America: U.S. and Canada
- Europe: England
- Australia
We deploy our software in multiple geographic regions primarily for two reasons:
- Data Security Compliance: many countries have laws governing where data must reside, especially sensitive data such as personally identifiable information
- Performance: clients making requests to OCLC applications should not have to literally cross oceans to retrieve data if it is not necessary
The first thing a WorldShare Platform URL needs to do is get a web service client's request close to a server that can fulfill it. In other words, it needs to send the request to the correct regional data center. The domain names we use in our URL patterns handle this routing. Not all of our applications and data are deployed globally, so we have two primary patterns for our domains.
Routing to the Mothership
When routing a request to a service that only exists in a single location, we use the following pattern for the domain name portion of a URL:
{<library>.}worldcat.org
Today, URLs with domain names following this pattern typically route to the original OCLC headquarters data center in Dublin, OH. The optional library prefix identifies a particular library, which localizes the web resource being accessed.
Routing globally
When clients make a request using a URL containing a domain name like the following:
<library>.share.worldcat.org
it is routed to the regional data center nearest to the library whose data is being accessed. The “library” token in the domain name will usually be a string chosen by the library to distinguish its resources from others’.
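The two domain patterns above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not part of any OCLC library, and the library token "ocpsb" is borrowed from the example URL later in this post.

```python
from typing import Optional

# Base domains taken from the two patterns described above.
SINGLE_SITE_BASE = "worldcat.org"
GLOBAL_BASE = "share.worldcat.org"


def single_site_host(library: Optional[str] = None) -> str:
    """Build a host for the {<library>.}worldcat.org pattern.

    The library prefix is optional in this pattern.
    """
    return f"{library}.{SINGLE_SITE_BASE}" if library else SINGLE_SITE_BASE


def global_host(library: str) -> str:
    """Build a host for the <library>.share.worldcat.org pattern.

    The library token is required here, because it determines which
    regional data center the request is routed to.
    """
    if not library:
        raise ValueError("the global pattern requires a library token")
    return f"{library}.{GLOBAL_BASE}"


print(single_site_host())         # worldcat.org
print(single_site_host("ocpsb"))  # ocpsb.worldcat.org
print(global_host("ocpsb"))       # ocpsb.share.worldcat.org
```

Note that the client never chooses a region explicitly. The routing decision is carried entirely by the domain name.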
Walking the paths within our servers
Once the global Domain Name System has routed your request to the right regional data center, the next piece of routing happens through the path portion of the URL. The design of the URL path is where application- and domain-specific semantics come into play. Each portion of the path, that is, each string of characters between forward slashes, becomes a token with a particular meaning. Our pattern divides the URL path into the following tokens:
- Context: the context establishes a high-level domain; for example, “ILL” or “acquisitions”
- Class: for our RESTful, resource-oriented services, the class identifies the type of entity being accessed; for example, a payment token or a purchase order
- Controller: the controller is either a “non-CRUD” action taken on a particular class (for example, “search”) or one of the special controllers “id”, “resource” or “data”, for which the “CRUD” (Create, Read, Update or Delete) action is determined by the HTTP method (POST corresponds to Create, GET to Read, PUT to Update and DELETE to Delete)
- UID: the UID is a unique identifier for a particular instance of the class in question; CRUD actions typically require a UID to identify the resource that is the target of the action
- Model: the optional model token denotes the desired semantic model for the resource being accessed
Finally, after the path, we use an extension (such as .xml) for URL-based content negotiation. Where possible, we base the content types on IANA-registered media types (MIME types), such as application/atom+xml.
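To make the controller rule concrete, here is a minimal Python sketch of how the action for a request could be resolved. The function and constant names are hypothetical, chosen only for illustration. The mapping itself follows the description of the controller token above.

```python
# Controllers whose action is determined by the HTTP method.
CRUD_CONTROLLERS = {"id", "resource", "data"}

# HTTP method to CRUD action, as described for the controller token.
METHOD_TO_CRUD = {
    "POST": "Create",
    "GET": "Read",
    "PUT": "Update",
    "DELETE": "Delete",
}


def resolve_action(controller: str, http_method: str) -> str:
    """Return the action implied by a controller token and an HTTP method.

    For the special CRUD controllers, the HTTP method decides the action.
    For any other controller (for example "search"), the controller name
    itself is the action.
    """
    if controller in CRUD_CONTROLLERS:
        try:
            return METHOD_TO_CRUD[http_method.upper()]
        except KeyError:
            raise ValueError(
                f"no CRUD action for HTTP method {http_method!r}"
            ) from None
    return controller


print(resolve_action("data", "GET"))    # Read
print(resolve_action("id", "DELETE"))   # Delete
print(resolve_action("search", "GET"))  # search
```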
Example: translating a URL into human-speak
Putting these pieces together, if we see the following URL:
https://ocpsb.share.worldcat.org/sample-service/work/data/1234.xml
it can be read as the following:
The XML representation of the work identified by #1234 from our sample service localized for the OCLC Sandbox Institution, the institution whose OCLC symbol is “OCPSB”.
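The same reading can be done mechanically. The Python sketch below splits a URL into the tokens described in this post, using only the standard library. The class and function names are hypothetical. The sketch also assumes that the tokens appear in the order context, class, controller, UID, and that an optional model token, if present, follows the UID. This post does not specify where the model token sits, so treat that position as a guess.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class WorldShareUrl:
    library: Optional[str]
    context: str
    resource_class: str
    controller: str
    uid: Optional[str]
    model: Optional[str]      # position after the UID is an assumption
    extension: Optional[str]


def parse_worldshare_url(url: str) -> WorldShareUrl:
    """Split a URL into the tokens described in this post."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Check the more specific (global) base domain first.
    for base in ("share.worldcat.org", "worldcat.org"):
        if host == base:
            library = None
            break
        if host.endswith("." + base):
            library = host[: -len(base) - 1]
            break
    else:
        raise ValueError(f"unrecognized domain: {host!r}")

    # The extension after the path drives content negotiation.
    path, ext = posixpath.splitext(parts.path)
    tokens = [t for t in path.split("/") if t]
    if not 3 <= len(tokens) <= 5:
        raise ValueError(f"expected 3 to 5 path tokens, got {len(tokens)}")
    tokens += [None] * (5 - len(tokens))  # pad the optional UID and model

    context, resource_class, controller, uid, model = tokens
    return WorldShareUrl(library, context, resource_class, controller,
                         uid, model, ext.lstrip(".") or None)


parsed = parse_worldshare_url(
    "https://ocpsb.share.worldcat.org/sample-service/work/data/1234.xml"
)
print(parsed.library, parsed.context, parsed.resource_class,
      parsed.controller, parsed.uid, parsed.extension)
# ocpsb sample-service work data 1234 xml
```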