Please note: This experimental research project has concluded.
The research prototype application is no longer supported or maintained by OCLC services, and information on this page is provided for historical purposes only. Some of this content may be out of date and may include broken links. Please visit the OCLC Research website to learn more about our current research.


The Scorpion Open Source project offers software that implements a system for automatically classifying Web-accessible text documents. Scorpion is intended for use by investigators who have a machine-readable subject classification scheme or thesaurus and wish to incorporate it into an automatic classification system.

The following pages have many links to articles that describe the development and evaluation of OCLC's Scorpion project.


This software may be used without charge in accordance with the terms of the OCLC Research Public License. A PDF version of the license is also available (PDF: 130K, 3 pp.).

As of 2006, we are issuing software under the Apache License, Version 2.0.

If you would like to use this software under the Apache license, please contact us; we may be able to update the software to use that license.


You may download the complete Scorpion code for use or evaluation without using CVS. This download is Release 1.1 of the software.

Scorpion is an application of Pears and Gwen. A complete installation requires both of them, along with the Dbutils support classes. The CVS repository for Scorpion contains the software and documentation required for designing Scorpion databases, custom-handling the results of a database search, and implementing a Web demo.

View: Readme Documentation
Download: Scorpion-1.0.tar.gz