FAST as a knowledge base for automatic classification
Evaluate FAST as a database to support automatic classification.
FAST is based on LCSH and is designed for ease of use in the online environment. This project will assess the suitability of FAST authority records as a knowledge base for automatic classification, employing techniques that were developed for creating, testing, and evaluating Scorpion databases derived from the Dewey Decimal Classification and the Library of Congress Classification.
A successful outcome of this investigation would be a knowledge base for automatic classification that is scalable, publicly accessible, compatible with open source software, and derived from a standard already in widespread use in digital libraries.
The approach to this project will include system building and empirical evaluation.
This project rates high on four OCLC Research project-selection criteria because it has the potential to:
- enhance existing assets
- promote collaboration and consensus
- leverage the value of the cooperative
- contribute to scholarship.
The main outcome anticipated for this project is the creation of automated systems of use to the library community. More specific results include:
- publicly accessible Scorpion databases
- a research paper
- algorithms and software for recovering concept hierarchies
- algorithms and software for mapping terms to records from an external source.
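The last two results can be illustrated with a minimal sketch. It is not the project's software. It assumes that headings are plain strings whose subdivisions are separated by `--`, that a heading's parent is the same string with its final subdivision removed, and that a term matches a record when it equals one of the heading's components after case and whitespace normalization. The sample headings and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample headings; "--" is assumed to mark a subdivision.
HEADINGS = [
    "Cats",
    "Cats--Diseases",
    "Cats--Diseases--Treatment",
    "Dogs--Training",
]


def recover_hierarchy(headings, sep="--"):
    """Map each heading (and each of its prefixes) to its parent heading.

    Top-level headings map to None. Prefixes that never occur as headings
    in their own right (such as "Dogs" above) are still registered so the
    resulting tree has no gaps.
    """
    parents = {}
    for heading in headings:
        parts = heading.split(sep)
        for depth in range(1, len(parts) + 1):
            node = sep.join(parts[:depth])
            parent = sep.join(parts[:depth - 1]) or None
            parents[node] = parent
    return parents


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_term_index(headings, sep="--"):
    """Index every heading under each of its normalized components."""
    index = defaultdict(set)
    for heading in headings:
        for part in heading.split(sep):
            index[normalize(part)].add(heading)
    return index


def map_terms(terms, index):
    """Return, for each external term, the sorted headings it matches."""
    return {term: sorted(index.get(normalize(term), ())) for term in terms}
```

With the sample data, `recover_hierarchy(HEADINGS)` assigns `"Cats--Diseases"` as the parent of `"Cats--Diseases--Treatment"`, and `map_terms(["diseases"], build_term_index(HEADINGS))` returns both headings that contain the component "Diseases". Real authority records would need richer matching (variant forms, cross-references) than this exact-match lookup.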
- March 2003
  - Create baseline Scorpion databases.
  - Collect baseline statistics.
  - Distribute final draft of white paper for internal review.
  - Verify self-evaluation software.
  - Code algorithms for mapping terms to records.
- April/May 2003
  - Test algorithms for term mapping and concept hierarchy extraction.
  - Conduct self-tests with enhanced Scorpion databases.
  - Define criteria for an evaluation that is not based on self-tests.
  - Identify a set of test records.
- June 2003
  - Code algorithms for recovering concept hierarchies and applying them to concept records.
  - Finish March/April tasks.
  - Conduct an evaluation with the test records.
- July 2003
  - Write up technical report.
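The timeline does not define a self-test. One plausible reading, assumed here, is that each record's own text is submitted as a query and the test counts how often the record's own class is ranked first. The sketch below illustrates that reading with a simple token-overlap score standing in for Scorpion's actual ranking, which it does not reproduce. The toy database and all names are hypothetical.

```python
from collections import Counter

# Hypothetical toy database: class identifier -> descriptive text.
DATABASE = {
    "cats": "cats domestic animals pets felines",
    "dogs": "dogs domestic animals pets canines training",
    "cooking": "cooking recipes food kitchen",
}


def tokenize(text):
    """Split text into lowercase whitespace-delimited tokens."""
    return text.lower().split()


def score(query_tokens, record_tokens):
    """Count the tokens shared by query and record, with multiplicity."""
    overlap = Counter(query_tokens) & Counter(record_tokens)
    return sum(overlap.values())


def classify(text, database):
    """Rank class identifiers by descending overlap; ties break by name."""
    query = tokenize(text)
    return sorted(
        database,
        key=lambda cid: (-score(query, tokenize(database[cid])), cid),
    )


def self_test(database):
    """Return the fraction of records that retrieve themselves at rank 1."""
    hits = sum(
        1 for cid, text in database.items()
        if classify(text, database)[0] == cid
    )
    return hits / len(database)
```

Because every record trivially matches all of its own tokens, a self-test of this kind mainly reveals records that are indistinguishable from one another. That limitation is one reason an evaluation with separate test records is a useful complement.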
- Hierarchies in FAST records may have internal inconsistencies or may not be easily recovered.
- Methods for creating subsets of FAST records may produce ad-hoc results.
- Automatic classification requires editorial support.
- Godby, Carol Jean, and Jay Stuler. 2001. "The Library of Congress Classification as a Knowledge Base for Automatic Classification." IFLA Preconference [presentation paper]. Accessible at: http://staff.oclc.org/~godby/auto_class/godby-ifla.html.
- O'Neill, Edward T., et al. 2001. "FAST: Faceted Application of Subject Terminology." IFLA Preconference [presentation paper]. Accessible at: http://www.oclc.org/research/projects/fast/dc-fast.doc (Word: 63K/7 pp.).
- Jean Godby, Consulting Research Scientist (Lead)
- Jay Stuler, Technical Intern