
Data Science & Metadata Research

To be discoverable by today’s online users, traditional library data must be transformed. OCLC Research analyzes bibliographic data to derive new meaning, insights, and services for use by libraries and information seekers. This work includes special projects in metadata enrichment, authorities & identities, linked data, subjects & classification, and data analysis.


    BolVis: Visualization for Text-based Research in Philosophy

    30 June 2018

    Pauline van Wierst, Steven Hofstede, Yvette Oortwijn, Thom Castermans, Rob Koopman, Shenghui Wang, Michel A. Westenberg, Arianna Betti

    BolVis is a visualization tool for text-based research in philosophy. By performing a semantic similarity search on words, sentences, and passages, BolVis helps researchers quickly determine which parts of a text corpus are most relevant, enabling in-depth analysis of texts at a significantly greater scale.

    National Strategy for Shareable Local Name Authorities National Forum: a White Paper

    29 March 2018

    Michele Casalini, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, Simeon Warner

    The National Strategy for Shareable Local Name Authorities National Forum (SLNA-NF) outlines key challenges in sharing local name authority data. This white paper considers ways to modernize the practice of library authority control to make authorities more shareable.

    Descriptive Metadata for Web Archiving: Literature Review of User Needs

    7 February 2018

    Jessica Ventlet, Karen Stoll Farrell, Tammy Kim, Allison Jai O’Dell, Jackie Dooley

    The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) has created recommendations for descriptive metadata best practices for archived web content to meet end-user needs, enhance discovery, and improve metadata consistency. This literature review considers the needs of both end users and metadata practitioners working with web archives.

    Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group

    7 February 2018

    Jackie Dooley, Kate Bowers

    The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) developed recommended practices for creating consistent metadata that addresses the unique characteristics of websites and collections for web archiving. These practices can be used by anyone who needs to describe web content.

    Descriptive Metadata for Web Archiving: Review of Harvesting Tools

    7 February 2018

    Jackie Dooley, Mary Samouelian

    In this report, the OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) reviews 11 web harvesting tools to assess their descriptive metadata functionality.

    Name Authority Control

    13 November 2017

    Janifer Gatenby, Karen Smith-Yoshimura

    From Records to Things: Managing the Transition from Legacy Library Metadata to Linked Data

    1 January 2017

    Carol Jean Godby, Karen Smith-Yoshimura

    To maximize the value of linked data built from library content, important entities and relationships must be defined and made available, machine-understandable codings must be adapted for linked data purposes, and persistent identifiers must be substituted for text strings.

    RAMP – the Repository Analytics and Metrics Portal: A Prototype Web Service that Accurately Counts Item Downloads from Institutional Repositories

    1 January 2017

    Patrick Obrien, Kenning Arlitsch, Jeff Mixter, Jonathan Wheeler, Leila Belle Sterman

    Mining MARC's Hidden Treasures: Initial Investigations Into How Notes of the Past Might Shape Our Future

    16 December 2016

    Jay Weitz, Jenny Toves, Diane Vizine-Goetz, Nannette Naught, Robert Bremer

    Finding, interpreting, and manipulating the rich trove of data already present in MARC bibliographic records to produce systematized forms is an invaluable step in moving MARC toward a post-MARC, Linked Data future. Name access points, especially those in controlled form, are the obvious place to find relationship information, but bibliographic notes and statements of responsibility are relatively overlooked sources of that information, waiting to be parsed and used. The Online Computer Library Center has been investigating ways to find names and their associated role phrases, to match those names to authorized forms, and to match role terms and phrases to controlled vocabularies.

    Undercounting File Downloads from Institutional Repositories

    11 October 2016

    Patrick Obrien, Kenning Arlitsch, Leila Sterman, Jeff Mixter, Jonathan Wheeler, Susan Borda