As one of the developers of the WorldCat Search API, I’ve been a fan of APIs and web services for quite a while now. And of course I’m a big fan of WorldCat.org and its links, both inbound and outbound. So I was glad to be involved in a recent small project to supplement existing Open Library records with OCLC numbers, using the Open Library API. Some Open Library records already had OCLC numbers, but many did not. Adding more would make the links from Open Library records to WorldCat.org, where users go to find physical copies, more reliable. And anyone with an OCLC number in hand would be able to find the matching record in Open Library. OCLC and Open Library agreed to test adding OCLC numbers to records with ISBNs through the Open Library API, and I learned quite a bit along the way.
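For the second use case, looking up an Open Library record by OCLC number, here is a minimal sketch using Open Library’s public Books API, which as far as I know accepts `OCLC:` bibkeys alongside `ISBN:` ones. The endpoint and its `bibkeys`, `format`, and `jscmd` parameters are my understanding of that API rather than anything built in this project, so check them against Open Library’s current documentation; the helper names are hypothetical and the OCLC number is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Open Library's Books API (verify against current docs).
BOOKS_API = "https://openlibrary.org/api/books"


def books_api_url(oclc_number: str) -> str:
    """Build a Books API URL that looks up an edition by OCLC number."""
    params = {"bibkeys": f"OCLC:{oclc_number}", "format": "json", "jscmd": "data"}
    # Keep ':' unescaped so the bibkey reads as OCLC:<number>.
    return f"{BOOKS_API}?{urlencode(params, safe=':')}"


def lookup_by_oclc(oclc_number: str) -> dict:
    """Fetch and decode the JSON response (network access required)."""
    with urlopen(books_api_url(oclc_number), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(books_api_url("12345"))  # placeholder OCLC number
```

If the API behaves as I understand it, the response is a JSON object keyed by the bibkey (for example `"OCLC:12345"`), and an empty object means no Open Library record carries that OCLC number.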
As noted in George Oates’ recent post on the Open Library blog, the initial plan was to use the Open Library API directly: traverse a list of ISBNs and corresponding OCLC numbers, look up each ISBN, and, where needed, update the matching Open Library record. We found that this process would work, but it would take far too long to complete; I think my original estimate was something like 8 years. So we worked out a hybrid approach. I preprocessed the OCLC list against a download of Open Library records to identify just the records that needed updating. The Open Library API has a nifty batch facility that let the oclcBot march through a list of close to 4 million attempted updates, a thousand records at a time. Start to finish, with pauses between updates to be nice to the API, the updating process took about a week.
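The preprocessing and batching steps might look something like the sketch below. It is not the actual oclcBot code. It assumes the OCLC list is a sequence of (ISBN, OCLC number) pairs and that each edition record from the Open Library download is a dict with `isbn_10`, `isbn_13`, and `oclc_numbers` fields. `save_batch` is a hypothetical stand-in for whatever call writes a batch through the Open Library API, and the one-second pause is an arbitrary placeholder.

```python
import time
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens/spaces and uppercase a trailing 'x' so ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def plan_updates(
    oclc_pairs: Iterable[Tuple[str, str]], editions: Iterable[dict]
) -> Iterator[dict]:
    """Yield only the editions that would gain at least one new OCLC number."""
    # Index the OCLC list by normalized ISBN (one ISBN may map to several OCLC numbers).
    by_isbn: Dict[str, Set[str]] = {}
    for isbn, oclc in oclc_pairs:
        by_isbn.setdefault(normalize_isbn(isbn), set()).add(oclc)

    for edition in editions:
        isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
        existing = edition.get("oclc_numbers", [])
        found: Set[str] = set()
        for isbn in isbns:
            found |= by_isbn.get(normalize_isbn(isbn), set())
        new = found - set(existing)
        if new:  # editions needing no change are skipped entirely
            yield {**edition, "oclc_numbers": existing + sorted(new)}


def batches(items: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Group items into lists of at most `size`."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def run(
    updates: Iterable[dict],
    save_batch: Callable[[List[dict]], None],
    batch_size: int = 1000,
    pause_seconds: float = 1.0,
) -> int:
    """Send updates in batches, pausing after each; return the number of batches."""
    count = 0
    for chunk in batches(updates, batch_size):
        save_batch(chunk)  # hypothetical hook for the Open Library batch write
        count += 1
        time.sleep(pause_seconds)  # be nice to the API
    return count
```

Because `plan_updates` takes any iterable of editions, the download could be streamed record by record, so only the ISBN index has to fit in memory. Filtering before writing is what turns the job from one API round trip per ISBN into a much shorter list of real updates.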
Most of my experience with APIs has been read-only, so it was good to have a chance to see a real-life example of a writable API, in this case for Open Library data. The API and the oclcBot are both written in Python. Although it sometimes seems I’ve written at least a little code in just about every language, Python was new to me, and an adventure. Something as small as whitespace indentation was enough to trip up a newbie like me. But I was greatly helped along the way by access to other Open Library bot code and by consultations with the Open Library developers. That kind of support is what we’ve aspired to offer for the WorldCat APIs too, and it was especially helpful to me here.
The OCLC Developer Network supports the use of OCLC Web Services: a set of tools and APIs that expose data and services for WorldCat and our member libraries and partner institutions or companies.
© 2010 OCLC. Domestic and international trademarks and/or service marks of OCLC Online Computer Library Center, Inc. and its affiliates.
Comments
Very pleased to see OCLC willing to enhance third-party databases with OCLCnums. OCLCnums are very useful identifiers, and they become more useful to all of us the more places they are used.
Of course, the hard part is OCLCnum matching for records that don't already have matching ISBNs in WorldCat! That's very difficult and expensive to do, which is precisely why it would be valuable. I write software that tries to match records in our local catalog to records in external databases like OpenLibrary and WorldCat. If the record already has an ISBN at both ends of the attempted match, matching already works without the OCLCnum, so adding the OCLCnum to OL is, sadly, of limited additional value. Adding the OCLCnum for all those records without ISBNs, now that would be keen.