FAST as a knowledge base for automatic classification

This activity is now closed. The information on this page is provided for historical purposes only.

Goal

Evaluate FAST as a database to support automatic classification.

Description

FAST (Faceted Application of Subject Terminology) is based on the Library of Congress Subject Headings (LCSH) and is designed for ease of use in the online environment. This project will assess the suitability of FAST authority records as a knowledge base for automatic classification, employing techniques that were developed for creating, testing, and evaluating Scorpion databases derived from the Dewey Decimal Classification and the Library of Congress Classification.
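As a rough illustration of what using authority records as a knowledge base for automatic classification can mean, the sketch below treats the text of a document as a query against a small set of subject records and ranks the records by weighted term overlap. It is a minimal sketch under stated assumptions, not the Scorpion implementation: the record identifiers, the record texts, and the function names `tokenize`, `build_index`, and `classify` are all hypothetical, and the scoring is a plain term-frequency and inverse-document-frequency product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records):
    """Build term counts per record and an IDF weight per term.

    `records` maps a (hypothetical) record id to the text of its heading.
    """
    term_counts = {rid: Counter(tokenize(text)) for rid, text in records.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n = len(records)
    idf = {term: math.log(1 + n / df) for term, df in doc_freq.items()}
    return term_counts, idf


def classify(document, term_counts, idf, top_k=3):
    """Rank records by weighted overlap with the document's terms."""
    query = Counter(tokenize(document))
    scores = {}
    for rid, counts in term_counts.items():
        score = sum(query[t] * counts[t] * idf[t] for t in query if t in counts)
        if score > 0:
            scores[rid] = score
    # Highest score first; ties broken by record id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Illustrative records only; these are not taken from FAST data.
records = {
    "rec-cats": "Cats Behavior",
    "rec-dogs": "Dogs Training",
    "rec-astronomy": "Astronomy Observations",
}
term_counts, idf = build_index(records)
ranked = classify("A guide to training dogs at home", term_counts, idf)
print(ranked[0][0])
```

The sketch matches exact word forms only. A real system would need normalization of word variants and a way to evaluate whether the top-ranked records are actually appropriate.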

A successful outcome of this investigation would be a knowledge base for automatic classification that is derived from a standard already in widespread use in digital libraries and that is scalable, publicly accessible, and compatible with open source software.

Research methodology

The approach to this project will include system building and empirical evaluation.

Why OCLC is conducting this research and how it helps libraries

This project rates high on four OCLC Research project-selection criteria because it has the potential to:

  1. enhance existing assets
  2. promote collaboration and consensus
  3. leverage the value of the cooperative
  4. contribute to scholarship.

Anticipated deliverables

The main outcome anticipated for this project is the creation of automated systems of use to the library community. More specific results include:

  1. publicly accessible Scorpion databases
  2. a research paper
  3. algorithms and software for recovering concept hierarchies
  4. algorithms and software for mapping terms to records from an external source.
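One way to picture the recovery of concept hierarchies named in the list above is to derive parent and child links from the structure of the headings themselves. The sketch below assumes that each heading is a string whose subdivisions are separated by `--`, which is an assumption about the display form and not a description of the project's algorithm. The function names `build_hierarchy` and `find_orphans` and the sample headings are hypothetical.

```python
from collections import defaultdict


def build_hierarchy(headings, sep="--"):
    """Map each heading prefix to the sorted list of its immediate children."""
    children = defaultdict(set)
    for heading in headings:
        parts = [p.strip() for p in heading.split(sep) if p.strip()]
        for depth in range(1, len(parts)):
            parent = sep.join(parts[:depth])
            child = sep.join(parts[:depth + 1])
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def find_orphans(headings, sep="--"):
    """Return subdivided headings whose immediate parent is not in the input."""
    present = {sep.join(p.strip() for p in h.split(sep)) for h in headings}
    orphans = []
    for heading in sorted(present):
        parts = heading.split(sep)
        if len(parts) > 1 and sep.join(parts[:-1]) not in present:
            orphans.append(heading)
    return orphans


# Illustrative strings only; these are not taken from FAST data.
headings = [
    "Science",
    "Science--History",
    "Science--History--Sources",
    "Music--Instruction and study",
]
print(build_hierarchy(headings))
print(find_orphans(headings))
```

The second function reports headings whose parent has no record of its own, which is one simple form of the internal inconsistency that can make a hierarchy hard to recover.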

Schedule

  • March 2003
    • Create baseline Scorpion databases.
    • Collect baseline statistics.
    • Distribute final draft of white paper for internal review.
    • Verify self-evaluation software.
    • Code algorithms for mapping terms to records.
  • April/May 2003
    • Test algorithms for term mapping and concept hierarchy extraction.
    • Conduct self-tests with enhanced Scorpion databases.
    • Define criteria for an evaluation that is not based on self-tests.
    • Identify a set of test records.
  • June 2003
    • Code algorithms for recovering concept hierarchies and applying them to concept records.
    • Finish March/April tasks.
    • Conduct an evaluation with the test records.
  • July 2003
    • Write up technical report.

Risks/Assumptions

  1. Hierarchies in FAST records may have internal inconsistencies or may not be easily recovered.
  2. Methods for creating subsets of FAST records may produce ad hoc results.
  3. Automatic classification requires editorial support.

Project team

  • Jean Godby, Consulting Research Scientist (Lead)
  • Jay Stuler, Technical Intern
