Articles on the VIAFbot Wikipedia project and EAD tag analysis in ArchiveGrid featured in Code4Lib Journal

Articles about two OCLC Research projects are included in Code4Lib Journal (Issue 22):

  • "VIAFbot and the Integration of Library Data on Wikipedia" by OCLC Research Wikipedian in Residence Maximilian Klein and University of Idaho Metadata and Catalog Librarian Alex Kyrios, and
  • "Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems" by Intelligent Systems Lab Amsterdam Researcher Marc Bron, OCLC Research Consulting Software Engineer Bruce Washburn and Senior Program Officer Merrilee Proffitt.

Abstracts of the articles appear below. Read the full articles online.

"VIAFbot and the Integration of Library Data on Wikipedia" Abstract
Maximilian Klein and Alex Kyrios

This article presents a case study of a project, led by Wikipedians in Residence at OCLC and the British Library, to integrate authority data from the Virtual International Authority File (VIAF) with biographical Wikipedia articles. This linking of data represents an opportunity for libraries to present their traditionally siloed data, such as catalog and authority records, in more openly accessible web platforms. The project successfully added authority data to hundreds of thousands of articles on the English Wikipedia, and is poised to do so on the hundreds of other Wikipedias in other languages. Furthermore, the advent of Wikidata has created opportunities for further analysis and comparison of data from libraries and Wikipedia alike. This project, for example, has already led to insights into gender imbalance both on Wikipedia and in library authority work. We explore the possibility of similar efforts to link other library data, such as classification schemes, in Wikipedia.

"Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems" Abstract
M. Bron, M. Proffitt and B. Washburn

The ArchiveGrid discovery system is made up in part of an aggregation of EAD (Encoded Archival Description) encoded finding aids from hundreds of contributing institutions. In creating the ArchiveGrid discovery interface, the OCLC Research project team has long wrestled with what we can reasonably do with the large (120,000+) corpus of EAD documents. This paper presents an analysis of the EAD documents (the largest analysis of EAD documents to date). The analysis is paired with an evaluation of how well the documents support various aspects of online discovery. The paper also establishes a framework for thresholds of completeness and consistency to evaluate the results. We find that, while the EAD standard and encoding practices have not offered support for all aspects of online discovery, especially in a large and heterogeneous aggregation of EAD documents, current trends suggest that the evolution of the EAD standard and the shift from retrospective conversion to new shared tools for improved encoding hold real promise for the future.

For more information:

Merrilee Proffitt
Senior Program Officer
OCLC Research

Melissa Renspie
Senior Communications Officer
OCLC Research


Quick links:

Code4Lib Journal (Issue 22) [link]

"VIAFbot and the Integration of Library Data on Wikipedia" article [link]

"Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems" article [link]

Maximilian Klein [link]

Bruce Washburn [link]

Merrilee Proffitt [link]

Related news: VIAFbot edits 250,000 Wikipedia articles to reciprocate all links from VIAF into Wikipedia [link]

ArchiveGrid overview [link]
