Metadata Management

OCLC Research has over thirty years of leadership in metadata management, organized around the classic library-science categories of bibliographic description, name authorities, classification, and controlled subject vocabularies. Outcomes include the first drafts of Dublin Core, the Resource Description Framework (RDF), the International Standard Name Identifier (ISNI), the Faceted Application of Subject Headings (FAST), the Virtual International Authority File (VIAF), as well as the first machine-processable versions of the Dewey Decimal Classification. OCLC researchers have also worked with thought leaders in the library community to shape the development and application of many other important standards, including MARC, the Functional Requirements for Bibliographic Records (FRBR), Resource Description and Access (RDA), and ONIX. This work has led to many improvements and demonstrations of value, as represented in research prototypes and described in numerous presentations, research articles, reports, white papers, surveys, patents, and books.

Given this rich history, OCLC Research is in an excellent position to continue its leadership role as the library community strives to convert its collective investment in metadata into a form that is more Web-friendly, more easily processed, and more readily consumed by other communities with a need to access the world's knowledge and answer questions about how it came into being.

This is an ambitious goal and it generates a long list of tasks. OCLC Research staff are assessing what existing metadata descriptions "say" and whether they can be algorithmically processed. We are also importing the best ideas from previous successes into state-of-the-art Semantic Web standards, proposing improvements where necessary. A longer-term goal is to devise methods for extracting knowledge from metadata that has relatively little explicit structure or reveals new insights only when it is aggregated or mashed up. As this work progresses, we need to define formal data models and create data sets that conform to them; to enhance the descriptions, especially for non-English languages and non-Latin character sets; to validate and check for inconsistencies; and to demonstrate the use of the new forms of metadata in applications, collaborating where possible with non-library partners such as publishers, distributors, and general-purpose search engines.

How we advance thinking

Understanding what has been said. Before implementing major changes to the design and delivery of metadata, we need to find out how existing standards are being used to create descriptions. For example, the library community has used the MARC standard for many decades, but how, exactly? Which elements and subfields have actually been used? We expect this work to inform deliberations at the Library of Congress and in the profession at large regarding the future of our bibliographic infrastructure.
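
The question of which MARC elements and subfields are actually in use can be approached as a simple frequency count. The following is a minimal sketch, not OCLC's method: it assumes records have already been parsed into a simplified in-memory form (a list of tag and subfield pairs), and the sample records and the `usage_counts` helper are hypothetical illustrations.

```python
from collections import Counter

# Hypothetical, simplified records for illustration only. Each record is a list
# of (tag, subfields) pairs; subfields maps a subfield code to its value.
# Real MARC data would first be parsed from its binary or XML serialization.
records = [
    [
        ("100", {"a": "Austen, Jane"}),
        ("245", {"a": "Emma", "c": "Jane Austen"}),
        ("650", {"a": "England", "v": "Fiction"}),
    ],
    [
        ("245", {"a": "Moby Dick"}),
        ("650", {"a": "Whaling"}),
        ("650", {"a": "Sea stories"}),
    ],
]


def usage_counts(records):
    """Count the records that use each tag, and each (tag, subfield) pair,
    at least once. Repeated fields within one record are counted once."""
    tag_counts = Counter()
    subfield_counts = Counter()
    for record in records:
        tags_seen = set()
        subfields_seen = set()
        for tag, subfields in record:
            tags_seen.add(tag)
            for code in subfields:
                subfields_seen.add((tag, code))
        tag_counts.update(tags_seen)
        subfield_counts.update(subfields_seen)
    return tag_counts, subfield_counts


tag_counts, subfield_counts = usage_counts(records)

# Report tags from most to least widely used.
for tag, count in tag_counts.most_common():
    print(f"{tag}: used in {count} of {len(records)} records")
```

Counting per record rather than per occurrence keeps a heavily repeated field, such as a subject heading, from overstating how widely an element is adopted.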

 

Developing new standards and data models. This thread of work addresses the main goal defined in the Library of Congress Bibliographic Framework Initiative: to reconfigure aging library metadata standards using Semantic Web protocols and data models. Results include linked-data versions of FAST, VIAF, and the Dewey Decimal Classification, as well as the initial release of WorldCat with linked-data markup. Current work is focused on defining a post-MARC standard that expresses the true richness of the library community's legacy metadata.

Building new aggregations; developing new applications. This research demonstrates new ways to organize, visualize, and facilitate access to resources managed by libraries, placing them in the context of the broader Web. The outcomes also advance thinking about new models for metadata management by elevating concepts such as Work, Creator, Publisher, Genre, and Subject, which transcend MARC and other legacy standards and are important in Semantic Web representations of the library information space.

  • Example prototypes:
    • Classify
    • FictionFinder
    • Kindred Works
    • mapFAST
    • Publisher Name Server
    • Terminology Services
    • VIAF
    • WorldCat Genres
    • WorldCat Identities
    • WorldCat Identities Network

 

Unlocking knowledge. These projects acknowledge the inconvenient fact that existing library metadata contains much valuable information in free text, full text, or semi-structured text that is noisy and not easily processed algorithmically. They share the goal of extracting this information and representing it in Semantic Web-compatible data models.

 

Engaging the community. We develop new metadata models by enlisting the expertise of the broader community to identify and describe culturally significant resources.

 

Managing metadata flow. Since library metadata is semi-structured, traditional methods for processing it are only semi-automated. We develop research prototypes that push automation to the next level by algorithmically performing tasks that previously required expert human guidance.

 

Related work

Library Linked Data
The Library Linked Data projects aim to create next-generation bibliographic descriptions using Semantic Web technology. OCLC researchers have recently created linked data versions of FAST and of the full twenty-third edition of the Dewey Decimal Classification. They also took the first step toward a linked data representation of bibliographic data by releasing Schema.org markup for WorldCat, producing the largest collection of linked bibliographic data in the world. Now that the complete public version of WorldCat is available to intelligent Web crawlers, search engines such as Google and Bing can use this metadata in search indexes and other applications, raising the visibility of library resources on the open Web.
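
To give a sense of what a Schema.org description of a bibliographic resource can look like, here is a minimal sketch that builds one as JSON-LD. It is an illustration only and does not reproduce the actual WorldCat markup: the `book_description` helper, the sample values, and the `example.org` identifier are all assumptions, and only the Schema.org terms themselves (`Book`, `Person`, `Thing`, `name`, `author`, `about`) come from the published vocabulary.

```python
import json


def book_description(title, author_name, subjects, work_uri):
    """Build a small Schema.org description of a book as a JSON-LD dictionary.

    Hypothetical helper for illustration; not an OCLC or WorldCat API.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": work_uri,
        "name": title,
        "author": {"@type": "Person", "name": author_name},
        "about": [{"@type": "Thing", "name": subject} for subject in subjects],
    }


# Sample values are invented; the identifier uses a reserved example domain.
description = book_description(
    title="Emma",
    author_name="Jane Austen",
    subjects=["England", "Fiction"],
    work_uri="http://example.org/work/1",
)

# Serialize for embedding in a page or for exchange with other systems.
print(json.dumps(description, indent=2))
```

Expressing the author and subjects as typed entities rather than plain strings is what lets a crawler connect the description to other data about the same person or topic.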

OCLC Research on YouTube

Watch OCLC Research YouTube Channel videos that feature some of our current work or recent findings.