Data Science & Metadata Research

To be discoverable by today’s online users, traditional library data must be transformed. OCLC Research analyzes bibliographic data to derive new meaning, insights, and services for use by libraries and information seekers. This work includes special projects, data science research, engagement with metadata communities, publications and presentations, and the creation of illustrative experimental applications.

Presentations

Case Studies of US Research Information Management

By Rebecca Bryant

VIVO 2021 International Conference
virtual

This session shares findings from a forthcoming OCLC Research report on Research Information Management Practices in the United States (http://oc.lc/us-rim-project), scheduled for early fall 2021. The report collects evidence from in-depth case studies of RIM practices at five US research universities: Penn State University, Texas A&M University, Virginia Tech, UCLA, and University of Miami. The case studies represent open source, proprietary, and homegrown RIM solutions at the five institutions and highlight the proliferation of use cases such as public portals, faculty activity reporting, and strategic reporting.

By synthesizing information from the five case studies, we offer a comprehensive definition of Research Information Management and also document the multiple use cases that proliferate in decentralized US research universities. We will also offer a new RIM System Framework, which describes the required and optional functional and technical elements that comprise the architecture of US RIM systems, regardless of use case. We believe that this framework will help demystify RIM infrastructure and also help practitioners better understand the array of campus stakeholders required for successful RIM implementation.

This research is based upon interviews with 39 participants engaged in RIM activities at the five case study institutions and builds upon the significant body of work on RIM practices already produced by OCLC Research (oc.lc/rim). We believe this research is of considerable utility to the university community, offering a more comprehensive and strategic view of RIM practices, along with recommendations for institutions. We will conclude the presentation by demonstrating the value of the case studies and the framework through examples drawn from the report.

Topics: Research Information Management

Bringing IIIF Manifests to Life in Wikidata with Mirador 3 - 2021 IIIF Annual Conference

By Jeff Mixter, Gina Solares

IIIF Annual Conference
virtual

Wikidata is an open knowledge base of structured data that describes any type of entity, including people, organizations, concepts, events, places, and works. Some works described in Wikidata now include a IIIF Presentation Manifest URL. In Wikidata’s default user interface, that URL appears as a link to the Manifest JSON. But Wikidata can be customized to alter the user interface and add new features.

In this presentation we will discuss and demonstrate a Wikidata user script that, for items that include a IIIF Presentation Manifest URL, embeds the Mirador viewer (Project Mirador) and loads the Manifest JSON, so that the images referenced in the Manifest can be viewed in the context of the other Wikidata statements about the work.
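
The decision the user script makes ("does this item have a manifest?") can be sketched in Python. This is not the presenters' script, which runs as JavaScript in the browser; it is a minimal sketch of the same check, assuming the manifest link is stored under Wikidata property P6108 ("IIIF manifest URL") and that the entity JSON has the shape served by Special:EntityData. The function names and the example entity are hypothetical.

```python
from typing import Any, Optional

# Assumption: the IIIF manifest link is recorded with Wikidata property P6108.
IIIF_MANIFEST_PROPERTY = "P6108"


def find_manifest_url(entity: dict[str, Any]) -> Optional[str]:
    """Return the first IIIF manifest URL in a Wikidata entity's JSON, or None."""
    claims = entity.get("claims", {})
    for statement in claims.get(IIIF_MANIFEST_PROPERTY, []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, str):
            return value
    return None


def should_embed_viewer(entity: dict[str, Any]) -> bool:
    """A user script would only inject the viewer when this returns True."""
    return find_manifest_url(entity) is not None


# Trimmed, made-up entity in the Wikibase JSON shape (URL values are strings).
example_entity = {
    "id": "Q0",
    "claims": {
        "P6108": [
            {"mainsnak": {"snaktype": "value", "property": "P6108",
                          "datavalue": {"type": "string",
                                        "value": "https://example.org/iiif/manifest.json"}}}
        ]
    },
}

print(find_manifest_url(example_entity))  # https://example.org/iiif/manifest.json
print(should_embed_viewer({"id": "Q1", "claims": {}}))  # False
```

Keying the check on the presence of a single property keeps the script inert on the vast majority of item pages, which carry no manifest at all.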

The discussion will cover how the user script detects that the viewer should be added and how it embeds the Mirador 3 viewer within a Wikidata item page. We will also illustrate how one library is including IIIF manifests in Wikidata, discussing lessons learned from that work and how the user script has contributed to the library's understanding of IIIF metadata and Wikidata. The demonstration will show how Wikidata user scripts are created and shared, and will look at ways in which Wikidata queries can uncover IIIF manifests.
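
Uncovering manifests with a query can be sketched against the Wikidata Query Service SPARQL endpoint. This sketch assumes manifests are recorded with property P6108 ("IIIF manifest URL"); it only builds the request URL and parses a made-up response in the standard SPARQL 1.1 JSON results format, so no network call is made. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def manifest_query(limit: int = 10) -> str:
    """SPARQL listing items that have a IIIF manifest URL (assumed to be P6108)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?item ?manifest WHERE {\n"
        "  ?item wdt:P6108 ?manifest .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET URL; format=json requests SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def parse_bindings(response: dict) -> list[tuple[str, str]]:
    """Turn a SPARQL JSON results document into (item URI, manifest URL) pairs."""
    bindings = response.get("results", {}).get("bindings", [])
    return [(b["item"]["value"], b["manifest"]["value"]) for b in bindings]


# Made-up response in the SPARQL 1.1 JSON results shape.
sample = {
    "head": {"vars": ["item", "manifest"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
         "manifest": {"type": "uri", "value": "https://example.org/iiif/manifest.json"}},
    ]},
}

print(query_url(manifest_query(5)))
print(parse_bindings(sample))
```

A real client would fetch that URL with a descriptive User-Agent header, as the query service asks of automated clients, and pass the decoded JSON to `parse_bindings`.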

Topics: Wikimedia, IIIF

Open for All, Reusable for Whom?: A Review of What Data Reusers Want and How Data Repositories Can Deliver

By Ixchel M. Faniel, Lisa Johnston, Katie Wissel

Open Repositories 2021
virtual

Understanding how data reusers seek and evaluate potential data for reuse will aid data curators, data managers, and developers in the open repository field. We will review past studies of data reusers, specifically a qualitative study of 105 researchers from three disciplinary communities: quantitative social science, archaeology, and zoology. The study identified 12 types of context information that data reusers mention needing when deciding whether to reuse data. Next, we will use these context types to create a feature set and assess how well data repositories provide the needed context information to users. Finally, drawing on the findings of that assessment, we will showcase desirable features already in use and use them to prototype the design of a reuser-oriented data repository that developers can draw on to improve their own repository interfaces.
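
One way to picture the feature-set assessment is a coverage check: each context type becomes a feature, and each repository is scored on how many of those features its interface provides. This is a hypothetical sketch, not the authors' assessment instrument; the context-type names below are illustrative placeholders rather than the study's list of 12, and `RepositoryAssessment` is an invented name.

```python
from dataclasses import dataclass, field

# Illustrative placeholders only: the study identified 12 context types,
# which are not reproduced here.
CONTEXT_TYPES = ["data collection", "provenance", "terms of use", "prior reuse"]


@dataclass
class RepositoryAssessment:
    name: str
    # Context types for which the repository interface offers a feature.
    features: set[str] = field(default_factory=set)

    def coverage(self, context_types: list[str]) -> float:
        """Fraction of the context types this repository provides (0.0 to 1.0)."""
        if not context_types:
            return 0.0
        provided = sum(1 for c in context_types if c in self.features)
        return provided / len(context_types)

    def gaps(self, context_types: list[str]) -> list[str]:
        """Context types a reuser would need but cannot find in this repository."""
        return [c for c in context_types if c not in self.features]


repo = RepositoryAssessment("Example Repository", {"provenance", "terms of use"})
print(repo.coverage(CONTEXT_TYPES))  # 0.5
print(repo.gaps(CONTEXT_TYPES))  # ['data collection', 'prior reuse']
```

The `gaps` list is the part that feeds design work: it names the context a reuser would have to find elsewhere.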

Topics: Open Access, Research Data Management

OCLC Linked Data: Research, experimental applications, and shared infrastructure

By Andrew Pace, John Chapman

LD4 Conference 2020
virtual

This presentation summarized OCLC's findings on the impact of new workflows in the ground-shifting transition from traditional cataloging to linked data platforms, highlighted the integral engagement, participation, and feedback from OCLC members, and attempted to chart a linked data research path for the decade to come.

A recording is available on the LD4 YouTube channel.

Topics: Linked Data

What are the entities that matter, and how much should we say about them?

By Jean Godby

NISO Webinar: Implementing Library Linked Data
virtual

This presentation discusses the work of catalogers who participated in OCLC's Project Passage in 2018. It develops the theme of identification of "the entities that matter" and concludes with a brief update on OCLC's post-Passage activities involving resource description in Wikibase.

Topics: Linked Data, Wikimedia

How IIIF standards improve search and discovery for Cultural Heritage collections

By Jeff Mixter

DLF Forum
Tampa, Florida, USA

IIIF is an emerging standard for sharing digital structural metadata. OCLC is an active member of the IIIF community and has been working to integrate the standard into its services and products. This talk discusses the experimental IIIF work being done by OCLC Research to test evolving IIIF standards and help integrate them into production services.

Topics: IIIF, Linked Data

Introducing the CONTENTdm Linked Data Pilot Project

By Jeff Mixter, Bruce Washburn

CONTENTdm User Group Meeting
Indianapolis, IN, USA

The CONTENTdm Linked Data pilot explores how to convert CONTENTdm data into linked data, how to curate the data in the Wikibase infrastructure, and how to use the data to improve end-user experiences in CONTENTdm. This presentation covers the background research that led to the development of the pilot, the plans for the three phases of the pilot, and some early feedback from one of the pilot participants.

Topics: Linked Data, IIIF

IIIF Change Discovery in Action: Findings from an OCLC Research Experiment

By Jeff Mixter

IIIF Annual Conference
Göttingen, Germany

OCLC Research is participating in the IIIF Discovery Working Group's ongoing effort to develop a "Change Discovery API". The Change Discovery API will provide the information needed to discover and subsequently make use of IIIF resources.
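
A consumer of change discovery can be sketched as follows. The sketch assumes the structure of the published IIIF Change Discovery API 1.0 (an Activity Streams `OrderedCollection` whose pages list `Create`, `Update`, and `Delete` activities, each with an `endTime`) and reads from the last page backwards, so only the most recent activity per resource counts. It is not OCLC's implementation; pages are held in a dict instead of fetched over HTTP, and `discover_changes` and the example data are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_time(ts: str) -> datetime:
    """Parse an xsd:dateTime such as '2021-02-01T00:00:00Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def discover_changes(
    collection: dict[str, Any],
    pages: dict[str, dict[str, Any]],
    last_crawl: Optional[datetime] = None,
) -> tuple[set[str], set[str]]:
    """Walk an activity stream newest-first; return (ids to refresh, ids to delete).

    `pages` stands in for HTTP fetches of OrderedCollectionPage documents.
    `last_crawl`, if given, must be timezone-aware; older activities are ignored.
    """
    to_refresh: set[str] = set()
    to_delete: set[str] = set()
    seen: set[str] = set()
    page_id: Optional[str] = collection["last"]["id"]
    while page_id is not None:
        page = pages[page_id]
        # Items within a page are oldest-first, so iterate in reverse.
        for activity in reversed(page.get("orderedItems", [])):
            if last_crawl is not None and parse_time(activity["endTime"]) < last_crawl:
                return to_refresh, to_delete  # the rest was handled by an earlier crawl
            obj_id = activity["object"]["id"]
            if obj_id in seen:
                continue  # a newer activity already decided this resource's fate
            seen.add(obj_id)
            if activity["type"] == "Delete":
                to_delete.add(obj_id)
            else:  # Create, Update, etc.: (re)fetch the resource
                to_refresh.add(obj_id)
        prev = page.get("prev")
        page_id = prev["id"] if prev else None
    return to_refresh, to_delete


def act(kind: str, obj: str, when: str) -> dict[str, Any]:
    return {"type": kind, "object": {"id": obj, "type": "Manifest"}, "endTime": when}


M1, M2 = "https://example.org/m1", "https://example.org/m2"
collection = {"type": "OrderedCollection", "last": {"id": "page-2"}}
pages = {
    "page-1": {"orderedItems": [act("Create", M1, "2021-01-01T00:00:00Z"),
                                act("Create", M2, "2021-01-02T00:00:00Z")]},
    "page-2": {"prev": {"id": "page-1"},
               "orderedItems": [act("Update", M1, "2021-02-01T00:00:00Z"),
                                act("Delete", M2, "2021-02-02T00:00:00Z")]},
}
print(discover_changes(collection, pages))
print(discover_changes(collection, pages, datetime(2021, 2, 1, 12, tzinfo=timezone.utc)))
```

Reading newest-first means the older `Create` of M2 is never acted on, because its later `Delete` has already been recorded.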

Topics: IIIF, Linked Data

Fast and Discriminative Semantic Embedding

By Rob Koopman, Shenghui Wang, and Gwenn Englebienne

13th International Conference on Computational Semantics
Gothenburg, Sweden

We present a novel, effective, and efficient method for term and document embedding. Our experiments show that it outperforms state-of-the-art methods on the STS benchmark and on subject prediction when trained on the same datasets, while being orders of magnitude cheaper to compute.
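
The abstract does not describe the embedding method itself, so the sketch below does not implement it. It only illustrates how STS-style results are commonly scored: cosine similarity between the two sentence vectors, compared against human gold scores with a correlation coefficient (Pearson's or Spearman's are both common; this sketch uses Spearman's). The vectors and gold scores are made-up toy inputs, and the helper names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sts_score(pairs: list[tuple[list[float], list[float]]], gold: list[float]) -> float:
    """Correlate predicted cosine similarities with gold similarity judgments."""
    predicted = [cosine(u, v) for u, v in pairs]
    return spearman(predicted, gold)


# Toy vectors and gold scores on a 0 to 5 scale, made up for illustration.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 2.5, 0.0]
print(sts_score(pairs, gold))
```

Because only ranks enter the final score, any embedding whose cosine similarities order the pairs the same way as the human judgments scores close to 1.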

Topics: Semantic Embedding