Linked Data

Linked Data is about communities agreeing on the meaning of their data and sharing it in a massively networked information space. This vision is taking shape in many sectors, including e-commerce, medicine, scientific research, and government services. OCLC Research is a leader in driving this transformation in the library community.

Linked Data Overview

Introduction

Among librarians, linked data research explores how our most important data can be re-engineered from databases of records comprehensible only to libraries into more broadly understandable collections of facts about entities: culturally important people, places, organizations, events, and concepts with names such as Albert Einstein, Paris, Universiteit van Amsterdam, World War II, and black holes. In this form, our data can be linked with that of other professions, boosting the visibility and relevance of libraries while conferring the library's authority on the work of others.

Linked data is a major research activity at OCLC. It is also a component of the OCLC enterprise WorldCat data strategy, which reinforces an important conclusion from our research: when the library community's resource descriptions are represented on the web as "entities" with interconnected relationships, the data can be incorporated into websites and online tools more easily than traditionally formatted bibliographic data. Though we recognize that MARC and other library standards will be part of the library community's data landscape for the foreseeable future, we believe that linked data will eventually become the de facto standard.

OCLC Research and Linked Data

"Why Linked Data" by OCLC Research, CC BY 4.0

OCLC researchers have been experimenting with linked data since its inception in the early 2000s. In fact, our engagement reaches back to the 1990s because of the seminal contributions of former colleagues Stuart Weibel and Eric Miller, who led efforts to develop foundational standards of the linked data paradigm.

Our monograph Library Linked Data in the Cloud: OCLC's Experiments with Next-Generation Resource Description, published in 2015, recounts some of this history. Initial activity contributed to the global argument that librarians should adopt web-friendly standards. As linked data, library data had to be re-imagined as collections of Things, not Strings. And a technical proof-of-concept for the publication of linked data at a commercial scale required an assessment of the maturity of RDF and associated technologies for storage and delivery.
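The "Things, not Strings" idea can be illustrated with a minimal sketch. It assumes hypothetical identifiers under example.org (not real WorldCat or VIAF URIs), an illustrative sample record, and a plain Python tuple in place of a real RDF store. The property names come from the schema.org vocabulary; the helper functions `facts_about` and `linked_to` are invented for this example.

```python
from collections import namedtuple

# A triple states one fact: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers for illustration only; not real OCLC URIs.
EINSTEIN = "http://example.org/person/albert-einstein"
BOOK = "http://example.org/work/relativity"

# "String" style: the author is only text inside a record,
# so nothing else can point at the person it names.
record = {"title": "Relativity", "author": "Einstein, Albert, 1879-1955"}

# "Thing" style: the author is an entity with its own identifier,
# and each statement is a separate fact about an entity.
triples = [
    Triple(BOOK, "http://schema.org/name", "Relativity"),
    Triple(BOOK, "http://schema.org/author", EINSTEIN),
    Triple(EINSTEIN, "http://schema.org/name", "Albert Einstein"),
]


def facts_about(entity, graph):
    """Return every triple whose subject is the given entity."""
    return [t for t in graph if t.subject == entity]


def linked_to(entity, graph):
    """Return the subjects of triples that point at the entity."""
    return [t.subject for t in graph if t.object == entity]


if __name__ == "__main__":
    for fact in facts_about(BOOK, triples):
        print(fact.predicate, "->", fact.object)
    print("Linked to Einstein:", linked_to(EINSTEIN, triples))
```

Because the author is an identifier and not a text label, any other dataset that uses the same identifier can link to the same person, which is the property the string-based record lacks.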

The most tangible outcome of this early work was the publication of over 20 billion RDF triples from some of the most familiar and widely used data resources in the library community: WorldCat, VIAF, FAST, ISNI, and the Dewey Decimal Classification. The Library of Congress and many national libraries in Europe and the Pacific Rim also published high-profile RDF datasets during the same period. But no other library or library services organization has contributed or shared more linked data than OCLC.

Since 2015, we have been working to build pathways from research to production. Check out our current research and our participation in community discussion and standards initiatives. The underlying goal of this work is to foster wider involvement in the modernization of the library community's data infrastructure, which was established in the 1960s and is now ripe for an upgrade.

The Big Picture

"Changing Resource Description Workflows" by OCLC Research, CC BY 4.0

OCLC's linked data research is being pursued in the context of a generational and global evolution in the design of the library community's data infrastructure. Seminal reports such as On the Record, published by the Library of Congress in 2008, identified the drivers. Key among them is the realization that the internet, not the library, is where the quest for information now typically begins. To be more effective in this new world, libraries must be more visible on the internet. Library data must be published to web-friendly standards such as ontologies that can be understood outside our community, and in formats such as RDF and JSON that can be consumed by third-party systems. These are among the goals that linked data conventions aim to accomplish.
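As a rough illustration of data that third-party systems can consume, the sketch below serializes a resource description as JSON using JSON-LD keywords (`@context`, `@id`, `@type`) and schema.org terms. The text above names only RDF and JSON, so the choice of JSON-LD is an assumption. The `describe_book` helper and the example.org identifiers are likewise hypothetical and do not reflect an OCLC API or dataset.

```python
import json


def describe_book(book_id, title, author_id, author_name):
    """Build a JSON-LD-style description of a book using schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@id": book_id,
        "@type": "Book",
        "name": title,
        # The author is a linked entity with its own identifier,
        # not a bare text string.
        "author": {
            "@id": author_id,
            "@type": "Person",
            "name": author_name,
        },
    }


# Hypothetical identifiers for illustration only.
doc = describe_book(
    "http://example.org/work/relativity",
    "Relativity",
    "http://example.org/person/albert-einstein",
    "Albert Einstein",
)

# Any system with a JSON parser can read this output.
serialized = json.dumps(doc, indent=2)

if __name__ == "__main__":
    print(serialized)
```

A consumer outside the library community needs only a JSON parser and the public schema.org vocabulary to interpret the result, with no knowledge of library-specific record formats.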

The OCLC Research Linked Data program keeps the big picture in view. The outcomes of our work include publications, presentations, webinars, datasets, and software demos. 

Explore More Linked Data Research

Getting Started

New to linked data? No problem. We've pulled together some resources to help you get started with linked data. Check them out:

Get Started

Join the Linked Data Conversation

Find OCLC Research's latest Linked Data ideas, thoughts, data, and projects on our blog Hanging Together.

Read Linked Data Posts

Linked Data Team

Team Lead

Jean Godby

Team Members

Ralph LeVan
Jeff Mixter
Stephan Schindehette
Karen Smith-Yoshimura
Roy Tennant
Shenghui Wang
Bruce Washburn
Jeff Young

OCLC and Linked Data Strategic Approach

Download the PDF