Using the Open Library API
As one of the developers of the WorldCat Search API, I’ve been a fan of APIs and web services for quite a while now. And of course I’m a big fan of WorldCat.org and its links, both inbound and outbound. So I was glad to be involved in a recent small project to supplement existing Open Library records with OCLC numbers. Some Open Library records already had OCLC numbers, but many did not. Adding more OCLC numbers would make the links from Open Library records to WorldCat.org, which help people find physical copies, more reliable. And for anyone with an OCLC number in hand, it would be great to be able to find the matching record in Open Library. OCLC and Open Library agreed to test adding OCLC numbers to records with ISBNs using the Open Library API, and I learned quite a bit along the way.
As noted in George Oates’ recent post on the Open Library blog, the initial plan was to use the Open Library API directly: traverse a list of ISBNs and their corresponding OCLC numbers, look up each ISBN in Open Library, and, where needed, update the matching record. We found that this process would work, but it would take far too long to complete; I think my original estimate was something like 8 years. So we worked out a hybrid approach. I preprocessed the OCLC list against a download of Open Library records to identify just the records that needed updating. The Open Library API has a nifty batch facility that helped the oclcBot march through a list of close to 4 million attempted updates, a thousand records at a time. Start to finish, with pauses between updates to be nice to the API, the update process took about a week.
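The preprocessing and batching steps can be sketched in Python, the language oclcBot was written in. This is a minimal illustration under stated assumptions, not the actual oclcBot code: it assumes the OCLC list has been loaded into a dictionary mapping normalized ISBNs to OCLC numbers, that the Open Library download yields edition records as dictionaries with `isbn_10`, `isbn_13`, and `oclc_numbers` list fields, and it uses a hypothetical `submit_batch` callback in place of the real Open Library batch-save call. The batch size of 1,000 follows the description above; the pause length is an arbitrary placeholder.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces, and uppercase an 'x' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()


def find_updates(oclc_by_isbn: Dict[str, str],
                 editions: Iterable[dict]) -> Iterator[dict]:
    """Yield updated copies of editions whose ISBNs match the OCLC list
    but which do not yet carry the matching OCLC number(s).

    Assumes (hypothetically) that editions store ISBNs in 'isbn_10' and
    'isbn_13' lists and OCLC numbers as strings in 'oclc_numbers'.
    """
    for edition in editions:
        existing = edition.get("oclc_numbers", [])
        isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
        new = []
        for isbn in isbns:
            oclc = oclc_by_isbn.get(normalize_isbn(isbn))
            if oclc is not None and oclc not in existing and oclc not in new:
                new.append(oclc)
        if new:
            # Copy the record so the original dump data is left untouched.
            updated = dict(edition)
            updated["oclc_numbers"] = existing + new
            yield updated


def batches(items: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Group items into lists of at most `size` elements."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run(updates: Iterable[dict],
        submit_batch: Callable[[List[dict]], None],
        size: int = 1000,
        pause_seconds: float = 1.0) -> int:
    """Send updates in batches, pausing between batches; return the count sent."""
    sent = 0
    for i, batch in enumerate(batches(updates, size)):
        if i > 0:
            time.sleep(pause_seconds)  # be nice to the API
        submit_batch(batch)  # hypothetical: would wrap the real batch-save call
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Toy data; the record key and numbers are made up for illustration.
    oclc_by_isbn = {"9780140449136": "12345"}
    editions = [{"key": "/books/OL1M", "isbn_13": ["978-0-14-044913-6"]}]
    print(run(find_updates(oclc_by_isbn, editions), print, pause_seconds=0))
```

Doing the matching offline means only records that actually need a new OCLC number ever reach the API. Because `find_updates` and `batches` are generators, the edition download can be streamed record by record rather than loaded into memory all at once.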
Most of my experience with APIs has been read-only, so it was good to have a chance to work with a real-life example of a writable API for Open Library data. The API and oclcBot are written in Python. Although it sometimes seems I’ve written at least a little code in just about every language, Python was new to me, and something of an adventure. Even a little thing like Python’s whitespace indentation was enough to trip up a newbie like me. But I was greatly helped along the way by access to code examples from other Open Library bots and by consultation with the Open Library developers. That kind of support is something we’ve aspired to for the WorldCat APIs as well, and it was especially helpful to me in this instance.