Please note: This experimental research project has concluded.
The research prototype application is no longer supported or maintained by OCLC services, and information on this page is provided for historical purposes only. Some portion of this content may be out-of-date and include broken links. Please visit the OCLC Research website to learn more about our current research.


"Amazing! Simply, bloody amazing!!" –Art Rhyno, University of Windsor

These Python scripts demonstrate how short a compliant OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) implementation can be. The following files are available:

  • The harvester is a one-page OAI client that can download metadata from an OAI repository and write it to a file suitable for the repository program below.
  • The repository is a two-page OAI server that can read in an XML file, such as one written by the harvester above, and make its records harvestable in turn.
  • There is a short readme file explaining how to use the harvester and repository programs.
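
As a rough illustration of what the harvester does, the following sketch builds an OAI-PMH request URL and pulls the record identifiers and the resumption token out of a ListRecords response. It is not the distributed code: it is written for Python 3 rather than the Python 2.2 these programs target, and the function names build_request_url and parse_list_records and the example.org base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(base_url, verb, **params):
    """Return an OAI-PMH GET URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response.

    Only header identifiers in the OAI namespace are collected;
    dc:identifier elements live in a different namespace and are skipped.
    """
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    # An empty resumptionToken element marks the end of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical base URL, used only to show the shape of the request.
url = build_request_url("http://example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc")
```

A harvester built on this would fetch the URL, parse the response, and repeat the request with only the verb and the returned resumptionToken until no token comes back.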


These programs have been tested with Python 2.2.2 and 2.2.3, but should work with any Python 2.2 or later. They are completely self-contained: they use only the XML and HTTP libraries included in the standard Python distribution. The Web server used by the repository is 'built-in'.
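
To give a sense of how little is needed to serve OAI requests from the standard library alone, here is a minimal sketch using Python 3's http.server module. It is not the distributed repository: the names check_request, OAIHandler, and serve are hypothetical, and the response body is a bare placeholder rather than a complete OAI-PMH response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# The six verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def check_request(query_string):
    """Return an OAI-PMH error code for the query, or None if it passes."""
    args = parse_qs(query_string)
    verbs = args.get("verb", [])
    # A missing, repeated, or unknown verb is reported as badVerb.
    if len(verbs) != 1 or verbs[0] not in VERBS:
        return "badVerb"
    # Any other repeated argument is reported as badArgument.
    if any(len(values) > 1 for values in args.values()):
        return "badArgument"
    return None


class OAIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        error = check_request(urlparse(self.path).query)
        # Placeholder bodies only; a real server wraps these in <OAI-PMH>.
        body = '<error code="%s"/>' % error if error else "<ok/>"
        data = body.encode("utf-8")
        self.send_response(200)  # OAI-PMH errors are still sent with HTTP 200
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the sketch server until interrupted (port is an arbitrary choice)."""
    HTTPServer(("localhost", port), OAIHandler).serve_forever()
```

Calling serve() starts the server on localhost; check_request can also be exercised on its own, without any network access.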

The primary limitations are that only oai_dc records are supported and that the whole database is parsed at initialization. The header information for each record is kept in a list, each entry occupying approximately 1K bytes. Parsing the database up front both takes some time and precludes dynamic changes to the records. As far as we know, the code is completely compliant with OAI-PMH (approximately 19 of the program's 106 lines are dedicated to catching OAI error conditions). Resumption tokens are stateless.
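
One way to keep resumption tokens stateless is to encode everything needed to continue the list in the token itself, so the server remembers nothing between requests. The sketch below pages through an in-memory list of headers this way. It is written for Python 3, and the token layout (offset|metadataPrefix) and the names make_token, parse_token, and list_page are assumptions for illustration, not the format used by the distributed repository.

```python
def make_token(offset, metadata_prefix="oai_dc"):
    """Pack the resume position and the metadata prefix into the token."""
    return "%d|%s" % (offset, metadata_prefix)


def parse_token(token):
    """Unpack a token into (offset, metadata_prefix).

    Raises ValueError carrying the OAI-PMH error code if it is malformed.
    """
    parts = token.split("|")
    try:
        offset = int(parts[0])
    except ValueError:
        offset = -1
    if len(parts) != 2 or offset < 0:
        raise ValueError("badResumptionToken")
    return offset, parts[1]


def list_page(headers, token=None, page_size=2):
    """Return one page of headers plus the token for the next page (or None)."""
    offset, prefix = parse_token(token) if token else (0, "oai_dc")
    page = headers[offset:offset + page_size]
    next_offset = offset + page_size
    # No token is issued once the end of the list has been reached.
    next_token = make_token(next_offset, prefix) if next_offset < len(headers) else None
    return page, next_token
```

Because each token carries its own offset, any request can be answered from the header list alone, at the cost of tokens becoming inaccurate if the list were to change between requests.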

Some of the coding methods used in these programs might qualify as 'clever programming,' something generally avoided in Python as a matter of style. At any rate, these programs are not presented as exemplars of good Python style -- a number of tricks are used to keep them short. As formatted here, the repository prints on two pages from my XEmacs without any lines wrapping, other than the long XML and copyright strings at the end of the program. In defense of this style, we have found that debugging a two-page program is very easy.


This software may be used without charge in accord with the terms of the OCLC Research Public License. A PDF version of the license is also available (PDF: 130K/3pp.).

As of 2006 we are issuing software under the Apache License, Version 2.0.

If you would like to use this software under the Apache license, please contact us and we may be able to update the software to use the Apache license.


View: Readme    
Download: Distribution Classes  

Most recent updates: Page content: 2009-08-11; Prototype: 2009-08-11


Thom Hickey