Linked Data Wikibase Prototype

In 2017 and 2018, OCLC partnered with libraries on a prototype to demonstrate the value of linked data for improving resource-description workflows in libraries. Work is now complete on this initiative, and a report on the work will be published in 2019.

Working closely with colleagues in OCLC's Global Product Management and Global Technologies, the OCLC Research team launched a service built on the Wikibase platform to provide two critical services:

A reconciliation service – to connect legacy bibliographic information to linked data entities

An editor service – to view, create and edit linked data descriptions and relationships

Creation of the ecosystem involved a two-step filtering process:

  1. Seeding the system with the Controlled Vocabulary identifiers that existed in the OCLC Research enhanced version of WorldCat
  2. Adding to the prototype ecosystem any Wikidata entity that contained one of those Controlled Vocabulary identifiers
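
The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the prototype's actual code: the source does not say which identifier properties were checked, so the list here (VIAF, Library of Congress, and FAST identifiers, which are real Wikidata properties) is a guess, and the sample entities and identifier values are made up. The entity layout follows the standard Wikidata JSON format.

```python
from typing import Iterable, Iterator, Set, Tuple

# Assumption: which identifier properties to inspect. P214 (VIAF ID),
# P244 (Library of Congress authority ID) and P2163 (FAST ID) are real
# Wikidata properties, but the source does not name the ones it used.
ID_PROPERTIES = ("P214", "P244", "P2163")


def external_ids(entity: dict) -> Set[Tuple[str, str]]:
    """Collect (property, value) pairs for the identifier properties of an entity."""
    found = set()
    claims = entity.get("claims", {})
    for prop in ID_PROPERTIES:
        for statement in claims.get(prop, []):
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue and isinstance(datavalue.get("value"), str):
                found.add((prop, datavalue["value"]))
    return found


def select_entities(entities: Iterable[dict], seed_ids: Set[Tuple[str, str]]) -> Iterator[dict]:
    """Step 2: yield every entity that carries at least one seeded identifier."""
    for entity in entities:
        if external_ids(entity) & seed_ids:
            yield entity


# Step 1: identifiers seeded from the bibliographic data (made-up sample values).
seed_ids = {("P214", "0000001"), ("P2163", "0000002")}

# Made-up entities in the Wikidata JSON layout.
sample_dump = [
    {"id": "Q1", "claims": {"P214": [{"mainsnak": {"datavalue": {"value": "0000001", "type": "string"}}}]}},
    {"id": "Q2", "claims": {"P244": [{"mainsnak": {"datavalue": {"value": "n0000003", "type": "string"}}}]}},
    {"id": "Q3", "claims": {}},
]

selected_ids = [entity["id"] for entity in select_entities(sample_dump, seed_ids)]
print(selected_ids)
```

Only the first sample entity is kept, because it is the only one that shares an identifier with the seed set.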

OCLC then partnered with 16 academic, research, public, and national libraries to prototype both services: the reconciliation service, to connect legacy bibliographic information to linked data entities, and the editor service, to view, create, and edit linked data descriptions and relationships.
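
To make the idea of reconciliation concrete, here is a minimal Python sketch of connecting a text heading to an entity identifier. It is a simplified assumption of how such matching could work (exact match on normalized labels, refusing ambiguous matches), not a description of the OCLC service; the entity IDs and labels are made up.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(heading: str) -> str:
    """Fold case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(kept.casefold().split())


def build_index(entities: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each normalized label or alias to the entity IDs that use it."""
    index: Dict[str, List[str]] = {}
    for entity_id, labels in entities.items():
        for label in labels:
            ids = index.setdefault(normalize(label), [])
            if entity_id not in ids:
                ids.append(entity_id)
    return index


def reconcile(heading: str, index: Dict[str, List[str]]) -> Optional[str]:
    """Return an entity ID only when exactly one entity matches the heading."""
    candidates = index.get(normalize(heading), [])
    return candidates[0] if len(candidates) == 1 else None


# Made-up entity IDs with their labels and aliases.
entities = {
    "Q1": ["Heidegger, Martin", "Martin Heidegger"],
    "Q2": ["Sein und Zeit", "Being and Time"],
    "Q3": ["Being and time"],
    "Q4": ["Bronte, Charlotte"],
}
index = build_index(entities)

print(reconcile("HEIDEGGER, MARTIN.", index))
print(reconcile("Brontë, Charlotte", index))
print(reconcile("Being and Time", index))
```

In this sketch an ambiguous heading such as "Being and Time", which matches two entities, is left unreconciled so that a person can review it rather than risk a wrong link.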

Wikibase

To accelerate the availability of a feature-rich user interface, an out-of-the-box Wikibase instance was implemented for search, display, and APIs. One of the goals was to have library partners give OCLC feedback on what works in this environment and what doesn’t before building and designing something from scratch. Initial reaction to the Wikibase feature set was very positive.

Current Partners

  • American University
  • Brigham Young University
  • Cleveland Public Library
  • Cornell University Library
  • Harvard University
  • Michigan State University
  • National Library of Medicine
  • North Carolina State University
  • Northwestern University
  • Princeton University
  • Smithsonian Library
  • Temple University
  • UC Davis Library
  • University of Minnesota
  • University of New Hampshire
  • Yale University

This great group of partners and OCLC worked together to refine the needs assessment for services. Partners provided feedback by commenting on their use of the prototype systems, responding to engagement activities, and participating in partner meetings. This collaboration built upon past efforts, such as the Person Lookup Pilot and the Metadata Refinery effort, to demonstrate the production value of linked data services.

Partners tested the system, gave feedback, and attended regular partner feedback sessions set up by OCLC.

Methodology and Timeline

Partners were given access to a live prototype system. The features and functionality were fully documented and supported by a team of product managers, analysts, engineers, and architects. The goal of the project was to inform the Global Product Management roadmap for metadata applications and services.

November 2017: Project kickoff; Discuss partnership and services with Phase 1 Partners

December 2017: Gather use cases from Phase 1 Partners

January 2018: Reconcile strings to identifiers

February 2018: Launch entity editor

March 2018: Gather enhancements and provide SPARQL endpoint; add five to ten new library partners

April 2018: Discussions with libraries resulted in a total of 16 institutions participating in the project going forward

May 2018: Launch the experimental “Explorer” UI to view entities and their relationships to other items; launch the OpenRefine API; gather feedback on the creation of creative works and prioritization of enhancements from partners

Figure: Explorer view of Being and Time with multiple translations

June-July 2018: Implement top enhancements suggested by library partners

  1. Improve indexing 
  2. Use the Wikibase UI to search by a non-prototype identifier
  3. Include dates for disambiguation in autosuggest results
  4. Offer property-based constraints
  5. Provide gadget-based taxonomy navigation

August-September 2018: Explore additional top enhancements

  1. Provide a data import tool
  2. Include WorldCat data in the Explorer
  3. Offer an input form for descriptive data
  4. Batchload entities provided by partner libraries
  5. Document when reference sources are required for statements
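
The timeline mentions a SPARQL endpoint but does not describe it. As a rough sketch only, a client might look up entities by an external identifier along the following lines. The endpoint URL, the property namespace, and the property ID P1 are placeholders invented for illustration, not details of the OCLC prototype; the request shape and the JSON result layout follow the standard SPARQL 1.1 Protocol and results format. The code builds the request and parses a sample response without contacting any server.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders: neither URL is taken from the source.
ENDPOINT = "https://example.org/sparql"
PROP_NS = "https://example.org/prop/direct/"


def build_query(property_id: str, identifier: str, limit: int = 10) -> str:
    """Build a SELECT query for entities holding the given identifier value."""
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError("property_id must look like 'P123'")
    # Escape backslashes and double quotes inside the SPARQL string literal.
    escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX p: <{PROP_NS}>\n"
        f'SELECT ?entity WHERE {{ ?entity p:{property_id} "{escaped}" }}\n'
        f"LIMIT {int(limit)}"
    )


def build_request(query: str) -> Request:
    """Wrap the query in a GET request that asks for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def entity_uris(response_text: str) -> list:
    """Extract the ?entity URIs from a SPARQL JSON results document."""
    data = json.loads(response_text)
    return [row["entity"]["value"] for row in data["results"]["bindings"]]


query = build_query("P1", "12345", limit=5)  # P1 is a hypothetical property
request = build_request(query)

# A made-up response in the standard SPARQL JSON results layout.
sample_response = json.dumps({
    "head": {"vars": ["entity"]},
    "results": {"bindings": [
        {"entity": {"type": "uri", "value": "https://example.org/entity/Q1"}},
    ]},
})

print(query)
print(entity_uris(sample_response))
```

Against a real endpoint, the prepared request could be sent with `urllib.request.urlopen(request)` and the response body passed to `entity_uris`.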

Summary

The project achieved goals in three major areas.

  1. Collaboration: the team of OCLC staff and dozens of librarians from 16 institutions created use cases, created entities and made edits in the linked data ecosystem, used the OCLC Community Center to discuss workflows and ask questions, and participated in 28 monthly meetings and weekly “Office Hours” sessions.
  2. Reconciliation Services: experimented with cataloging workflows for entity reconciliation, using both a SPARQL endpoint and a user interface dubbed “The Explorer.”
  3. Editing: managed entities in the native Wikibase user interface, the Explorer, and another experimental application, “The Retriever.” 

The simple prototype described at the beginning of the project matured over time into a robust set of third-party tools and home-grown applications to manage over a million Wikidata entities. The evolution of the project to this more comprehensive set of tools and applications was driven by project participants’ new ideas, requested features, and feedback on applications and prototype use guidelines.

A recording of the final meeting with library partners is available online.

Presentations

Works in Progress Webinar: Lessons Learned from a Linked Data Prototype for Managing Bibliographic Data
Video | 30 October 2018
by Bruce Washburn, Stephen Hearn, Marc McGee, and John Chapman
This webinar highlights the latest developments with linked data and the future of bibliographic services utilizing Wikibase, FAST, VIAF, and Wikidata.

Final Partner Meeting
Presentation (.pptx) | 9-10 October 2018 
by Sara Newell, Taylor Surface, Jeff Mixter, and John Chapman 
Presented online to the Prototype partner libraries to wrap up the project. View recording above.

Prototyping a Linked Data Platform for Production Cataloging Workflows
Presentation (.pptx) | 2 May 2018
by Bruce Washburn co-presenting with Carl Stahmer (UC Davis)
An OCLC linked data prototype using Wikibase, Wikidata, VIAF and WorldCat Works to experiment with non-MARC cataloging workflows
Presented at LD4 Workshop 2018, 2 May 2018, Palo Alto, CA

Linked Data Reality Check
Presentation (.pptx) | 18 April 2018
by Andrew K. Pace
A linked data primer and overview of the OCLC Wikibase and Wikidata prototype
Presented at Computers in Libraries, Arlington, VA

Prototyping a Linked Data Platform for Production Cataloging Workflows
Presentation (.pptx) | 13 April 2018
by Andrew K. Pace with Jason Kovari (Cornell University)
An OCLC linked data prototype using Wikibase, Wikidata, VIAF and WorldCat Works to experiment with non-MARC cataloging workflows
Presented at the CNI Spring 2018 Membership Meeting, San Diego, CA

Team Lead

Andrew K. Pace
Executive Director, Technical Research
For more information, email pacea@oclc.org.

Project Team

John Chapman

Eric Childress

Jean Godby

Melissa Hess

Marti Heyman

Tod Matola

Jeff Mixter

Sara Newell

Stephan Schindehette

Taylor Surface

Diane Vizine-Goetz

Bruce Washburn

Jeff Young