Building a Pears database
The Scorpion demo requires a Pears database that contains your subject classification scheme. For illustration, the Scorpion Open Source package includes our experimental adaptation of a fragment of the Library of Congress Classification, which is found in the file lccSample.sgml.
To create a Scorpion Pears database, you need to perform the following steps:
- Create a file containing your subject classification scheme, following the suggestions in Scorpion database design. The file must be syntactically correct SGML with explicit end tags, and its name must have the extension .sgml.
- The files file.tags and filedesc.ini contain the information needed to translate the SGML data in the input file into the BER encoding required by Pears. The file.tags file maps each SGML tag to a BER fldid, and every SGML tag must have a fldid. filedesc.ini uses these fldids to create the database indexes. For example, in the [BasicIndex] section of filedesc.ini, the statement tagpath*=25/1 creates a BER-encoded index from the ScorpionCaption tag, because file.tags maps ScorpionCaption to the fldid 25. The line tagpath*=26/11/1 creates an index from each of the terms in the tags that have the fldid 26. This tagpath has an additional level because, in lccSample.sgml, each term in those tags is enclosed in its own SGML tags, which have been assigned the fldid 11 in file.tags.
The stopword list in filedesc.ini can be modified to fit the needs of your application. Learn more about filedesc.ini, which is a sample Pears database description file.
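To see which fldids an existing filedesc.ini actually indexes, a small shell helper can print the tagpath entries of the [BasicIndex] section. This is only a sketch: the function name list_tagpaths is hypothetical, and it assumes that section headers sit on their own lines in square brackets and that index entries begin with tagpath, as in the statements quoted above.

```shell
# Hypothetical helper: print the tagpath entries found in the
# [BasicIndex] section of a Pears database description file.
# Assumes INI-style "[Section]" headers on their own lines.
list_tagpaths() {
    awk '
        /^\[BasicIndex\]/ { in_section = 1; next }  # entering the section
        /^\[/             { in_section = 0 }        # any other header ends it
        in_section && /^tagpath/ { print }          # keep only tagpath lines
    ' "$1"
}
```

Calling list_tagpaths filedesc.ini in the build directory lists the index definitions, so each leading number can be checked against the fldids assigned in file.tags.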
- Copy the files buildPDB.sh, correlatePDB.sh and makeScorpionPDB.sh from PDB/LCC into the directory where your SGML file, filedesc.ini and file.tags are located. If you created the test database and ran the test script, you can also copy the file PDB/LCC/sh.conf into your build directory. If not, you'll have to create that file and set the following variables:
- full path to the java binary
- full path to the Scorpion installation directory
- full path to the directory where pears.jar and Dbutils.jar are located (this should be $BASE/lib)
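Before running the build, it may help to confirm that the build directory holds every file named in the steps so far. The following is a minimal sketch, not part of the Scorpion package: the function name check_build_dir and its two arguments are hypothetical, and it only tests that the files exist, not that their contents are valid.

```shell
# Hypothetical pre-flight check: report any file that the build
# steps expect to find in the build directory.
# Usage: check_build_dir DIR PREFIX   (PREFIX.sgml is the input file)
check_build_dir() {
    dir=$1
    prefix=$2
    missing=0
    for f in "$prefix.sgml" filedesc.ini file.tags \
             buildPDB.sh correlatePDB.sh makeScorpionPDB.sh sh.conf
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2   # name each absent file
            missing=1
        fi
    done
    return "$missing"                     # 0 only if everything is present
}
```

For instance, check_build_dir . lccSample returns a non-zero status and names each absent file on standard error if anything still needs to be copied or created.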
- Run the script makeScorpionPDB.sh filename-prefix, where filename-prefix is the name of your SGML file minus the .sgml extension. The result of this process is a Pears database named filename-prefix.pdb.
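The relationship between the SGML file name, the filename prefix and the resulting database name can be sketched in shell. The helper name pdb_name_for is hypothetical. It returns only the file name and makes no claim about the directory in which the build scripts write the database.

```shell
# Hypothetical helper: derive the expected Pears database name
# from the SGML input file name (filename prefix + ".pdb").
pdb_name_for() {
    case $1 in
        *.sgml) ;;   # extension is as required
        *) echo "input file must end in .sgml: $1" >&2; return 1 ;;
    esac
    base=$(basename "$1" .sgml)   # strip directory and .sgml extension
    echo "$base.pdb"
}
```

After the build script finishes, a test such as [ -e "$(pdb_name_for lccSample.sgml)" ] is a quick way to confirm that the database was written, assuming it lands in the current directory.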