Data Science & Metadata Research
To be discoverable by today’s online users, traditional library data must be transformed. OCLC Research analyzes bibliographic data to derive new meaning, insights, and services for use by libraries and information seekers. This work includes special projects, data science research, engagement with metadata communities, publications and presentations, and the creation of illustrative experimental applications.
This session shares findings from a forthcoming OCLC Research report on Research Information Management Practices in the United States (http://oc.lc/us-rim-project), scheduled for release in early fall 2021. The report collects evidence from in-depth case studies of RIM practices at five US research universities: Penn State University, Texas A&M University, Virginia Tech, UCLA, and University of Miami. The case studies represent open-source, proprietary, and homegrown RIM solutions at the five institutions and highlight the proliferation of use cases such as public portals, faculty activity reporting, and strategic reporting.
By synthesizing information from the five case studies, we offer a comprehensive definition of Research Information Management and also document the multiple use cases that proliferate in decentralized US research universities. We will also offer a new RIM System Framework, which describes the required and optional functional and technical elements that comprise the architecture of US RIM systems, regardless of use case. We believe that this framework will help demystify RIM infrastructure and also help practitioners better understand the array of campus stakeholders required for successful RIM implementation.
This research is based upon interviews with 39 participants engaged in RIM activities at the five case study institutions and builds upon the significant body of work on RIM practices already produced by OCLC Research (oc.lc/rim). We believe this research is of considerable utility to the university community, offering a more comprehensive and strategic view of RIM practices, along with recommendations for institutions. We will conclude the presentation by demonstrating the value of the case studies and framework through examples pulled from the report’s case studies.
Topics: Research Information Management
Wikidata is an open knowledge base of structured data that describes any type of entity, including people, organizations, concepts, events, places, and works. Some works described in Wikidata now include a IIIF Presentation Manifest URL. In Wikidata’s default user interface, that URL appears as a link to the Manifest JSON. But Wikidata can be customized to alter the user interface and add new features.
In this presentation we will discuss and demonstrate a Wikidata user script that, for items that include a IIIF Presentation Manifest URL, will embed the ProjectMirador viewer and load the Manifest JSON so that the images referenced in the Manifest can be viewed in the context of other Wikidata statements about the work.
The discussion will cover how the user script embeds the Mirador3 viewer within a Wikidata item page and how it detects that the viewer should be added. We will also illustrate how one library is including IIIF manifests in Wikidata, with a conversation about learnings from that work, and about how the user script has contributed to the library's understanding of IIIF metadata and Wikidata. The demonstration will show how Wikidata user scripts are created and shared and look at ways in which Wikidata queries can uncover IIIF manifests.
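As a rough illustration of how a Wikidata query might uncover IIIF manifests, the sketch below builds a SPARQL query and reads manifest URLs out of a result document. It assumes that P6108 is the Wikidata property holding the IIIF manifest URL and that results arrive in the standard SPARQL JSON results format; the function names and the sample response are hypothetical and are not taken from the presentation.

```python
import json

# Assumption: P6108 is the Wikidata property used for IIIF manifest URLs.
IIIF_MANIFEST_PROPERTY = "P6108"


def build_manifest_query(limit=10):
    """Return a SPARQL query listing items that carry a IIIF manifest URL."""
    return (
        "SELECT ?item ?manifest WHERE { "
        f"?item wdt:{IIIF_MANIFEST_PROPERTY} ?manifest . "
        "} "
        f"LIMIT {int(limit)}"
    )


def extract_manifests(sparql_json):
    """Map item URIs to manifest URLs from a SPARQL JSON result document."""
    bindings = sparql_json.get("results", {}).get("bindings", [])
    return {
        row["item"]["value"]: row["manifest"]["value"]
        for row in bindings
        if "item" in row and "manifest" in row
    }


# Hypothetical sample response in the SPARQL JSON results format.
# The item and manifest URLs are placeholders, not real records.
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["item", "manifest"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000001"},
     "manifest": {"type": "uri", "value": "https://example.org/iiif/1/manifest.json"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000002"},
     "manifest": {"type": "uri", "value": "https://example.org/iiif/2/manifest.json"}}
  ]}
}
""")

if __name__ == "__main__":
    print(build_manifest_query(5))
    for item, manifest in extract_manifests(SAMPLE_RESPONSE).items():
        print(item, "->", manifest)
```

The query string could be pasted into the Wikidata Query Service; the parsing step is kept offline here so the sketch runs without network access.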
Topics: Wikimedia, IIIF
Understanding how data reusers seek and evaluate potential data for reuse will aid data curators, data managers, and developers in the open repository field. We will review past studies of data reusers, specifically a qualitative study of 105 researchers from three disciplinary communities: quantitative social science, archaeology, and zoology. The study identified 12 types of context information that data reusers mention needing when deciding whether to reuse data. Next, we will use the context types to create a feature set and assess how data repositories provide the needed context information to users. Finally, using findings from our assessment, we will showcase desirable features in use to prototype the design of a reuser-oriented data repository that developers can use to improve their data repository interface.
Topics: Open Access, Research Data Management
This presentation summarized OCLC's findings on the impact of new workflows in the ground-shifting transition from traditional cataloging to linked data platforms, highlighted the integral engagement, participation, and feedback from OCLC members, and attempted to chart a linked data research path for the decade to come.
Recording available on LD4 on YouTube.
Topics: Linked Data
This presentation highlights key lessons from OCLC Research’s Linked Data Wikibase Prototype (“Project Passage”), a 10-month pilot done in 2018 in collaboration with metadata specialists in 16 US libraries.
PowerPoint Slides (11MB)
Topics: Linked Data
This presentation discusses the work of catalogers who participated in OCLC's Project Passage in 2018. It develops the theme of identification of "the entities that matter" and concludes with a brief update on OCLC's post-Passage activities involving resource description in Wikibase.
Topics: Linked Data, Wikimedia
Tampa, Florida, USA
IIIF is an emerging standard for sharing digital structural metadata. OCLC is an active member of the IIIF community and has been working to integrate the standard into its services and products. This talk discusses the experimental IIIF work being done by OCLC Research to help test evolving IIIF standards and help integrate them into production services.
Topics: IIIF, Linked Data
Indianapolis, IN, USA
The CONTENTdm Linked Data pilot explores how to convert CONTENTdm data into linked data, how to curate the data in the Wikibase infrastructure, and how to use the data to improve end-user experiences in CONTENTdm. This presentation covers the background research that led to the development of the pilot, the plans for the 3 phases of the pilot, and some early feedback from one of the pilot participants.
Topics: Linked Data, IIIF
OCLC Research is participating in the IIIF Discovery Working Group's ongoing effort to develop a "Change Discovery API". The Change Discovery API will provide the information needed to discover and subsequently make use of IIIF resources.
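To make the idea of change discovery concrete, here is a minimal sketch of how a consumer might apply one page of change activities to its local list of known IIIF resources. It assumes an Activity Streams-style page with an `orderedItems` list, ordered oldest first, whose entries have a `type` (such as Create, Update, or Delete) and an `object` with an `id`. These field names, the function name, and the sample data are assumptions for illustration and do not describe the working group's final design.

```python
def apply_activities(page, known_resources):
    """Apply one page of change activities to a set of known resource ids.

    Activities are assumed to be ordered oldest first, so the page is walked
    in reverse and only the newest activity for each resource is applied.
    """
    seen = set()
    updated = set(known_resources)
    for activity in reversed(page.get("orderedItems", [])):
        resource_id = activity.get("object", {}).get("id")
        if resource_id is None or resource_id in seen:
            continue  # skip malformed entries and older activities
        seen.add(resource_id)
        if activity.get("type") == "Delete":
            updated.discard(resource_id)
        elif activity.get("type") in ("Create", "Update"):
            updated.add(resource_id)
    return updated


# Hypothetical page of activities; the URLs are placeholders.
SAMPLE_PAGE = {
    "type": "OrderedCollectionPage",
    "orderedItems": [
        {"type": "Create", "object": {"id": "https://example.org/iiif/a", "type": "Manifest"}},
        {"type": "Create", "object": {"id": "https://example.org/iiif/b", "type": "Manifest"}},
        {"type": "Update", "object": {"id": "https://example.org/iiif/a", "type": "Manifest"}},
        {"type": "Delete", "object": {"id": "https://example.org/iiif/b", "type": "Manifest"}},
    ],
}

if __name__ == "__main__":
    known = {"https://example.org/iiif/c"}
    print(sorted(apply_activities(SAMPLE_PAGE, known)))
```

Walking the list newest first means a resource that was created and later deleted is never fetched, which is the main saving a change feed offers over re-crawling every resource.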
Topics: IIIF, Linked Data
We present a novel, effective, and efficient method for term and document embedding. Our experiments show that it outperforms state-of-the-art methods on the STS benchmark and on subject prediction when trained on the same datasets, while being computationally cheaper by orders of magnitude.
Topics: Semantic Embedding