CONTENTdm Linked Data Pilot

Introduction

Digital collections in libraries represent unique materials that illuminate our understanding of the world’s cultures, histories, and innovations. Traditional models of item description have rendered these materials largely invisible on the internet and thus hidden from researchers. Collaborating with library partners, OCLC is working to shift this paradigm and connect people with the unique digital resources that libraries hold.

[Collection images courtesy of Cleveland Public Library; the Huntington Library, Art Museum, and Botanical Gardens; and Minnesota Digital Library and Northfield Historical Society]

In this project, OCLC is partnering with libraries to increase end-users’ ability to discover, evaluate, and use the unique digital resources in CONTENTdm repositories. The project is focused on developing the scalable methods and approaches needed to produce richer, state-of-the-art machine representations of entities and relationships to make visible connections that were formerly invisible. The application being developed will assist library staff to:

  • convert existing record-based metadata into linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies (see the reconciliation sketch after this list)
  • manage the resulting entities and relationships
  • publish the graph of entities and relationships
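
A minimal sketch of the first of these steps, reconciling a text string to an identifier, is shown below. It uses the public Wikidata search API (action=wbsearchentities) as one example authority source; the sample record, field names, and matching logic are illustrative assumptions, not details from the project.

    # Sketch: replace descriptive strings with identifiers from an authority source.
    # Here the authority is Wikidata's public search API; a production workflow
    # would also consult other authority files and locally defined vocabularies.
    import requests

    WIKIDATA_API = "https://www.wikidata.org/w/api.php"

    def reconcile_string(label, language="en"):
        """Return the best-matching Wikidata identifier (QID) for a text string."""
        params = {
            "action": "wbsearchentities",
            "search": label,
            "language": language,
            "format": "json",
            "limit": 1,
        }
        response = requests.get(WIKIDATA_API, params=params, timeout=10)
        response.raise_for_status()
        matches = response.json().get("search", [])
        return matches[0]["id"] if matches else None

    # Hypothetical record: free-text strings become entity identifiers.
    record = {"subject": "Public libraries", "place": "Cleveland, Ohio"}
    linked = {field: reconcile_string(value) for field, value in record.items()}
    print(linked)  # e.g. {'subject': 'Q...', 'place': 'Q...'}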

Overview

CONTENTdm is a service for building, preserving, and showcasing a library's unique digital collections.

The large volume of unique digital content stored in CONTENTdm offers an excellent opportunity for a transition to linked data. With linked data, the wide variety of data models and descriptive practices across CONTENTdm will become significantly easier for library staff to manage and will support rich discovery for library end-users and the wider web. Immediate benefits of this project will be the ability to do structured searching across all CONTENTdm repositories and to search and facet based on authority files and library-staff-defined vocabularies.
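
As a rough illustration of what that structured searching could look like, the sketch below runs a SPARQL query against a Wikibase query service from Python. The endpoint URL and the property and item identifiers are placeholders rather than the pilot's actual data model, and the query assumes prefixes and a label service configured as they are on the Wikidata Query Service.

    # Sketch: an entity-based search, e.g. "items depicting a given place, with
    # their creators," independent of how the original records spelled the place name.
    import requests

    ENDPOINT = "https://query.wikibase.example.org/sparql"  # placeholder endpoint

    QUERY = """
    SELECT ?item ?itemLabel ?creatorLabel WHERE {
      ?item wdt:P2 wd:Q42 .                  # P2 = "depicts place", Q42 = a reconciled place (placeholders)
      OPTIONAL { ?item wdt:P5 ?creator . }   # P5 = "creator" (placeholder)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 25
    """

    response = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    for row in response.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"], "|", row.get("creatorLabel", {}).get("value", ""))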

Project objectives include:

  • Increase end-users’ ability to discover, evaluate, and use unique digital content.
    • improved search and faceting features made possible by an entity-driven back-end system
    • relevant contextual information drawn from relationships to web utilities like GeoNames and Wikidata
  • Improve library staff efficiency with a descriptive environment that is significantly easier to manage.
    • library staff workflow tools for these new metadata descriptive practices—these tools are initially intended for users of CONTENTdm
    • cleaned-up and enhanced descriptive metadata for the digital collections contributed by the pilot participants
    • entity-based descriptions that have been reconciled to known authority services or to library-staff-defined vocabularies

The outcomes and findings from the Metadata Refinery project completed in 2016 and the linked data Wikibase prototype (Project Passage) completed in 2018 have provided great insight into how to implement a system to facilitate the mapping, reconciliation, storage, and retrieval of structured data for unique digital materials.

In addition to providing participants with the capability to clean up existing metadata, the system will have hooks to feed the structured data back into each pilot participant’s production CONTENTdm end-user website. As with Project Passage, this project will use Wikibase as the foundational staff application, aided by OpenRefine for data cleanup and reconciliation of character strings to authoritative identifiers. The project will also use IIIF APIs as a connector between structured descriptive data and the digital objects.
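
The sketch below illustrates that IIIF connection in its simplest form: fetching a IIIF Presentation API manifest for a digital object and reading out its label, descriptive metadata pairs, and image services. The manifest URL is a placeholder, and the code assumes a Presentation API 2.x manifest layout.

    # Sketch: IIIF manifests tie structured descriptions to the digital objects.
    import requests

    MANIFEST_URL = "https://example.org/iiif/manifest.json"  # placeholder URL

    manifest = requests.get(MANIFEST_URL, timeout=10).json()

    # Descriptive side: the label and metadata pairs the staff application enriches.
    print("Label:", manifest.get("label"))
    for pair in manifest.get("metadata", []):
        print(f"  {pair.get('label')}: {pair.get('value')}")

    # Object side: the image services behind each canvas (Presentation 2.x layout).
    for canvas in manifest.get("sequences", [{}])[0].get("canvases", []):
        for image in canvas.get("images", []):
            service = image.get("resource", {}).get("service", {})
            print("Image service:", service.get("@id"))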

Three organizations joined the pilot in its initial phase: The Huntington Library, Art Museum, and Botanical Gardens; the Cleveland Public Library; and the Minnesota Digital Library. Temple University Libraries and University of Miami Libraries are participating beginning with the project’s second phase.

Project Phases

This project is running in three phases and will continue through August 2020. The first phase, completed in December 2019, involved importing existing record descriptions into a local Wikibase system; the project team focused on metadata cleanup and reconciliation with authorities. The output (enriched, entity-based descriptions) is informing the second phase, which involves improving the search and discovery experience for digital materials and enhancing their context by bringing in relationships to other data services. In the third phase, enrichment and reconciliation tools will be refined, and the project team will evaluate Wikibase as a platform on which to build CONTENTdm discovery interfaces.
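
As a rough illustration of the phase-one import, the sketch below loads one converted record into a local Wikibase as a new item through the MediaWiki action API (action=wbeditentity). The endpoint URL, bot credentials, and the property and item identifiers are placeholders for a hypothetical local installation, not values from the pilot.

    # Sketch: create a Wikibase item whose statements point at reconciled
    # entities rather than free-text strings.
    import json
    import requests

    API = "https://wikibase.example.org/w/api.php"  # placeholder local Wikibase
    session = requests.Session()

    def get_token(token_type):
        """Fetch a login or csrf token from the MediaWiki action API."""
        params = {"action": "query", "meta": "tokens", "type": token_type, "format": "json"}
        return session.get(API, params=params).json()["query"]["tokens"][token_type + "token"]

    # Log in with a bot account (placeholder credentials).
    session.post(API, data={
        "action": "login", "lgname": "LoaderBot", "lgpassword": "secret",
        "lgtoken": get_token("login"), "format": "json",
    })

    # One converted record: a label plus a statement linking to another item.
    entity = {
        "labels": {"en": {"language": "en", "value": "Main Street, looking north"}},
        "claims": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P2",  # placeholder "depicts place" property
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": 42},  # placeholder item
                    "type": "wikibase-entityid",
                },
            },
            "type": "statement",
            "rank": "normal",
        }],
    }

    result = session.post(API, data={
        "action": "wbeditentity", "new": "item", "data": json.dumps(entity),
        "token": get_token("csrf"), "format": "json",
    })
    print(result.json().get("entity", {}).get("id"))  # identifier of the new item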

Team

OCLC Membership & Research
Eric Childress
Jeff Mixter
Mercy Procaccini
Bruce Washburn

Global Product Management & Technology
Hanning Chen
Dave Collins
Shane Huddleston

Linked Data and IIIF-Related Outputs

Publications

    Archives and Special Collections Linked Data: Navigating between Notes and Nodes

    21 July 2020

    OCLC Research Archives and Special Collections Linked Data Review Group

    This publication shares the findings from the Archives and Special Collections Linked Data Review Group, which explored key areas of concern and opportunities for archives and special collections in transitioning to a linked data environment.

    Utilisation des données liées dans les bibliothèques : de la désillusion à la productivité [Linked data in libraries: from disillusionment to productivity]

    9 July 2020

    Andrew Pace

    OCLC has been researching the use of linked data within libraries for more than a decade. It can be difficult to know exactly where the value of linked data lies and what benefits it offers, so it is wise to consider its usefulness from the point of view of library staff. What does "linked data productivity" mean? What would cataloging with linked data change for library staff and end users? This article responds to these questions and offers some perspective on the linked data landscape for libraries.

    Exploring Models for Shared Identity Management at a Global Scale: The Work of the PCC Task Group on Identity Management in NACO

    9 December 2019

    Erin Stalberg, John Riemer, Andrew MacEwan, Jennifer A. Liss, Violeta Ilik, Stephen Hearn, Jean Godby, Paul Frank, Michelle Durocher, Amber Billey

    This paper discusses the efforts of the PCC Task Group on Identity Management in NACO to explore and advance identity management activities.

    Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage

    5 August 2019

    Jean Godby, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, Holly Tomren

    “Project Passage” is an OCLC Research Wikibase prototype that explores using linked data in library cataloging workflows. The report provides an overview of the prototype’s development, its adaptation for library use, and eight librarians’ experiences using the editing interface to create metadata for resources.

    WikiCite 2018-2019: Citations for the sum of all human knowledge

    17 July 2019

    Phoebe Ayers, Daniel Mietchen, Jake Orlowitz, Merrilee Proffitt, Sarah Rodlund, Elizabeth Seiver, Dario Taraborelli, Ben Vershbow

    WikiCite is an initiative uniting the Wikidata, linked data, and library communities to create an open repository of bibliographic data. This WikiCite 2018 conference overview examines the future of open bibliographic data and the impact that WikiCite achieved over the past year.
