Scorpion documentation

Data flow overview

The name of the input document is given on the command line as a Java system property. The contents of that document are used as a query against a Pears database, and the Pears library code returns the list of records that match the query. The Gwen routines then rank those records according to a ranking scheme. Scorpion passes the ranked record set to the handler class named in its initialization file. The handler class returns a String object, whose contents are written to an output file that is also specified on the command line.
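The flow described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Scorpion's code: the class DataFlowSketch, the Match and Handler types, and the property names sketch.input and sketch.output are all invented for the example, and the query and rank methods are placeholders for the work that Pears and Gwen do.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DataFlowSketch {

    /** Hypothetical stand-in for one record matched by the database query. */
    static final class Match {
        final String id;
        final double score;

        Match(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Hypothetical handler: turns the ranked records into the output text. */
    interface Handler {
        String handle(List<Match> ranked);
    }

    /** Placeholder for the database query; returns fixed sample matches. */
    static List<Match> query(String documentText) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match("rec-1", 0.2));
        matches.add(new Match("rec-2", 0.9));
        return matches;
    }

    /** Placeholder for the ranking step: highest score first. */
    static List<Match> rank(List<Match> matches) {
        List<Match> ranked = new ArrayList<>(matches);
        ranked.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        return ranked;
    }

    /** A trivial handler that writes one record id per line. */
    static String idsOnePerLine(List<Match> ranked) {
        StringBuilder out = new StringBuilder();
        for (Match m : ranked) {
            out.append(m.id).append('\n');
        }
        return out.toString();
    }

    /** Query, rank, then hand the ranked set to the handler. */
    static String run(String documentText, Handler handler) {
        return handler.handle(rank(query(documentText)));
    }

    public static void main(String[] args) throws IOException {
        // Invented property names; pass them with -Dsketch.input=... -Dsketch.output=...
        String input = System.getProperty("sketch.input");
        String output = System.getProperty("sketch.output");
        if (input == null || output == null) {
            System.err.println("Set -Dsketch.input and -Dsketch.output");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(input)), StandardCharsets.UTF_8);
        String result = run(text, DataFlowSketch::idsOnePerLine);
        Files.write(Paths.get(output), result.getBytes(StandardCharsets.UTF_8));
    }
}
```

System properties are passed to the JVM with -D options placed before the class name, which is why both file names can be supplied without any argument parsing in main.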

Scorpion dependencies

  • The Pears Open Source Package. This code builds and indexes a Pears database.
  • The Gwen Search Engine, which retrieves and ranks records from a Pears database.
  • The Dbutils package, which offers utilities to support database programming.

Getting the sample application working

  1. Make sure the shebang line of each of the following scripts points to bash on your system. Running 'which bash' will tell you where bash is installed. The defaults are shown in parentheses.
    scorpion/setup.sh (#!/bin/bash)
    scorpion/PDB/LCC/test.sh (#!/bin/bash)
    scorpion/PDB/LCC/buildPDB.sh (#!/bin/bash)
    scorpion/PDB/LCC/correlatePDB.sh (#!/bin/bash)
    scorpion/PDB/LCC/makeScorpionPDB.sh (#!/bin/bash)
  2. Run setup.sh. This script updates pathnames in the configuration files to match the location where you installed Scorpion. It also creates a file that sets some common shell variables; some of the other scripts use this file.
  3. Either copy pears.jar, gwen.jar and Dbutils.jar to the scorpion/lib directory, or create links to those jars.
  4. cd into the scorpion/PDB/LCC directory and run './makeScorpionPDB.sh lccSample'. This program is fairly CPU-intensive, so you may want to shut down other large applications first.
  5. Run test.sh to test the demo database. It classifies the file scorpion/demo/scorpion.input and places an HTML fragment with the results of the classification in scorpion/demo/scorpion.output.html.

Making Scorpion work with your database

  1. Design and create your database as an SGML file.
  2. Transform the SGML file into a Pears database.
  3. Create a Pears initialization file and a Gwen properties file.
  4. Write a Java class implementing the ORG.oclc.scorpion.RecordSetHandler interface.
  5. Create a Scorpion initialization file.
  6. If you want a web interface, modify the Perl script to meet your needs.
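
As a rough illustration of step 4, the sketch below shows the general shape a handler might take: it receives ranked records and returns a String, here an HTML fragment like the one the sample application produces. The methods of the real ORG.oclc.scorpion.RecordSetHandler interface are not described in this document, so the example defines its own stand-in interface. The names HtmlHandlerSketch, RankedRecordHandler, HtmlListHandler, and the sample record strings are all hypothetical, and records are represented as plain strings for simplicity.

```java
import java.util.Arrays;
import java.util.List;

final class HtmlHandlerSketch {

    /** Hypothetical stand-in; the real RecordSetHandler methods may differ. */
    interface RankedRecordHandler {
        String handle(List<String> rankedRecords);
    }

    /** Renders the ranked records as an ordered HTML list, best match first. */
    static final class HtmlListHandler implements RankedRecordHandler {
        @Override
        public String handle(List<String> rankedRecords) {
            StringBuilder html = new StringBuilder("<ol>\n");
            for (String record : rankedRecords) {
                html.append("  <li>").append(escape(record)).append("</li>\n");
            }
            return html.append("</ol>\n").toString();
        }
    }

    /** Escapes the characters that would otherwise break the HTML fragment. */
    static String escape(String text) {
        // Replace '&' first so the entities added afterwards are not re-escaped.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        RankedRecordHandler handler = new HtmlListHandler();
        // Invented sample records, already in ranked order.
        System.out.print(handler.handle(Arrays.asList("First record", "Second & third")));
    }
}
```

A real handler would implement the Scorpion interface directly and would be named in the Scorpion initialization file created in step 5.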

Scorpion

Database

  • Designing
  • Building

Record handler

  • Writing
  • Compiling