Data Science

The internet is the native environment of information seekers. OCLC Research recognizes that to be integrated into the internet, traditional library data must be transformed in various ways. We are analyzing the data in WorldCat and other sources to derive new meaning, insights, and services for use by libraries and others on the internet. Our work includes:

Publications

    Archives and Special Collections Linked Data: Navigating between Notes and Nodes

    21 July 2020

    OCLC Research Archives and Special Collections Linked Data Review Group

    This publication shares the findings from the Archives and Special Collections Linked Data Review Group, which explored key areas of concern and opportunities for archives and special collections in transitioning to a linked data environment.

    Utilisation des données liées dans les bibliothèques : de la désillusion à la productivité (Using Linked Data in Libraries: From Disillusionment to Productivity)

    9 July 2020

    Andrew Pace

    OCLC has been researching the use of linked data within libraries for more than a decade. It is sometimes difficult to know exactly where the value of linked data lies and what benefits we can derive from it. It is wise, therefore, to consider its usefulness from the point of view of library staff. What does "linked data productivity" mean? What would cataloging with linked data change for library staff and end users? This article responds to these questions and provides some perspective on the linked data landscape for libraries.

    Exploring Models for Shared Identity Management at a Global Scale: The Work of the PCC Task Group on Identity Management in NACO

    9 December 2019

    Erin Stalberg, John Riemer, Andrew MacEwan, Jennifer A. Liss, Violeta Ilik, Stephen Hearn, Jean Godby, Paul Frank, Michelle Durocher, Amber Billey

    This paper discusses the efforts of the PCC Task Group on Identity Management in NACO to explore and advance identity management activities.

    Responsible Operations: Data Science, Machine Learning, and AI in Libraries

    8 December 2019

    Thomas Padilla

    Responsible Operations is intended to help chart library community engagement with data science, machine learning, and artificial intelligence (AI). It was developed in partnership with an advisory group and a landscape group comprising more than 70 librarians and professionals from universities, libraries, museums, archives, and other organizations.

    Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage

    5 August 2019

    Jean Godby, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, Holly Tomren

    “Project Passage” is an OCLC Research Wikibase prototype that explores using linked data in library cataloging workflows. The report gives an overview of the prototype’s development, its adaptation for library use, and eight librarians’ experiences using the editing interface to create metadata for resources.

    Analysis of 2018 International Linked Data Survey for Implementers

    8 November 2018

    Karen Smith-Yoshimura

    Using the 2018 International Linked Data Survey results, this article gives an overview of the linked data projects or services that institutions have implemented, what data they publish or consume, why they implemented linked data, the challenges they faced, and their advice for institutions considering a linked data project or service.

    A Philosophical Perspective on Visualization for Digital Humanities

    21 October 2018

    Hein van den Berg, Arianna Betti, Thom Castermans, Rob Koopman, Bettina Speckmann, Kevin Verbeek, Titia van der Werf, Shenghui Wang, Michel A. Westenberg

    CatVis is an interdisciplinary digital humanities project that provides resources for librarians to manage vast collections of bibliographic records as well as visualization tools for philosophical research. This paper describes the challenges encountered during the project.

    SolarView: Low Distortion Radial Embeddings with a Focus

    13 August 2018

    Thom Castermans, Kevin Verbeek, Bettina Speckmann, Michel A. Westenberg, Rob Koopman, Shenghui Wang, Hein van den Berg, Arianna Betti

    This research proposes a novel type of low distortion radial embedding that preserves near-exact distances to the focus entity and minimizes distortion between other entities. This data visualization method adapts SolarView to explore the high-dimensional metric space of bibliographic entity similarities.

    BolVis: Visualization for Text-based Research in Philosophy

    30 June 2018

    Pauline van Wierst, Steven Hofstede, Yvette Oortwijn, Thom Castermans, Rob Koopman, Shenghui Wang, Michel A. Westenberg, Arianna Betti

    BolVis is a visualization tool for text-based research in philosophy. BolVis helps researchers determine quickly which parts of a text corpus are most relevant by performing a semantic similarity search on words, sentences, and passages, enabling in-depth analysis of texts at a significantly greater scale.

    National Strategy for Shareable Local Name Authorities National Forum: a White Paper

    29 March 2018

    Michele Casalini, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, Simeon Warner

    SLNA-NF outlines key challenges in sharing local name authority data. This white paper considers ways to modernize the practice of library authority control to make authorities more shareable.