Identify the attributes lacking in bibliographic metadata required to provide better user access to the collective collection

The work of OCLC Research staff in determining the extent to which unique records in WorldCat represent "last copies" rather than "unique cataloging," and in data mining WorldCat for the FictionFinder and WorldCat Identities prototypes, has uncovered categories of content variations. These variations impair the ability to match records representing the same title (manifestation) as well as the ability to aggregate titles representing the same work. Presenting users with an overview of available works drawn from thousands of possible manifestations depends on reliable bibliographic data elements being present.

The Program for Cooperative Cataloging (PCC) has established "core record standards" that represent what the community judges to be the optimal set of attributes required for libraries to support discovery, retrieval, and inventory management. Yet even these required attributes may be missing or, if present, incorrectly encoded. The data elements required to give users access to published works in a Web 2.0 context may be lacking. This project synthesizes the work already done that could lead to requirements for grid-service tools for correcting the most common errors and validating the elements most needed for discovery. The analysis will also provide evidence of the MARC fields that are least used and could be dropped without impairing access.

This analysis of WorldCat records will report:

  • occurrence and co-occurrence of data elements in records
  • occurrence of required and recommended fields or subfields from one or more widely adopted standards (e.g., PCC core)
  • occurrence of elements needed for work-level (FRBR) presentation, and which of those elements are most often missing
  • categories of the most common content and encoding errors in critical fields by bibliographic format
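
The occurrence and co-occurrence counts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the project: each record is reduced to the set of MARC tags it contains, the sample records are invented, and REQUIRED_TAGS is a hypothetical stand-in rather than the actual PCC core element set.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each record is reduced to the set of MARC tags present.
records = [
    {"100", "245", "260", "650"},
    {"245", "260"},
    {"100", "245", "650"},
]

# Illustrative assumption only; NOT the actual PCC core element set.
REQUIRED_TAGS = {"100", "245", "260"}


def tag_occurrence(records):
    """Count how many records contain each tag."""
    counts = Counter()
    for tags in records:
        counts.update(tags)
    return counts


def tag_cooccurrence(records):
    """Count how many records contain each unordered pair of tags."""
    counts = Counter()
    for tags in records:
        # Sorting gives every pair a canonical (low, high) order.
        counts.update(combinations(sorted(tags), 2))
    return counts


def missing_required(records, required):
    """Count how many records lack each required tag."""
    counts = Counter()
    for tags in records:
        counts.update(required - tags)
    return counts


if __name__ == "__main__":
    print(tag_occurrence(records).most_common())
    print(tag_cooccurrence(records).most_common(3))
    print(missing_required(records, REQUIRED_TAGS))
```

Reducing each record to a set of tags ignores repeated fields and subfields. A fuller analysis would need to parse actual MARC records and count at the subfield level.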

The report will assess the utility of automated Web-based techniques to:

  • correct common content and encoding errors in critical fields by bibliographic format
  • enrich records with useful data (e.g., data known to be needed to support specific use cases)

The report will also provide evidence-based recommendations that could significantly increase cataloging productivity.

A working group of RLG Programs Partners will be convened to review the preliminary report and assess which recommendations would have the most impact on their own operations.

For more information

Karen Smith-Yoshimura
Program Officer
smithyok@oclc.org