Web Archiving Metadata Working Group

CHARGE: The OCLC Research Library Partnership Web Archiving Metadata Working Group will evaluate existing and emerging approaches to descriptive metadata for archived websites and will recommend best practices to meet user needs and to ensure discoverability and consistency.

The Problem

Archived websites often are not easily discoverable via search engines, library and archives catalogs, or finding aid systems, which inhibits their use.

A 2015 survey of members of the OCLC Research Library Partnership identified the lack of descriptive metadata guidelines as the biggest challenge related to website archiving among this cohort. The second most-cited challenge was learning about the needs of users who seek to use website content in their work.

Review of existing guidelines, as well as sampling of descriptions in WorldCat and ArchiveGrid, reveals widely variable practice. This can be traced, at least in part, to the fact that some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Some record creators follow bibliographic traditions, while others use an archival approach, such as describing multiple sites in one record. Sometimes the two approaches are blended.

Addressing the Problem

The Working Group studied archival and bibliographic description practices for archived websites, considered when each approach might be most appropriately used, and determined how the two might be made compatible. We kept in mind that metadata is sometimes reused in a variety of tools and contexts. We also considered issues related to describing archived websites in relation to live/active sites.

Throughout 2015 and 2016, members of the Working Group:

  1. Finalized the issues to be addressed.
  2. Performed desk research to learn about user needs and behavior relative to websites to inform our approach to defining best practices for descriptive metadata.
  3. Developed recommended practices for metadata, informed by the study of existing guidelines for describing archived websites, such as those developed by the Program for Cooperative Cataloging, the New York Art Resources Consortium, and a variety of individual institutions.
  4. Studied the published literature and online sources to identify metadata issues raised by researchers in the field.
  5. Informally sampled and evaluated existing descriptions of archived websites in WorldCat (MARC records), ArchiveGrid (MARC records and finding aids), Archive-It, and other sources.
  6. Investigated available tools for web archiving and the ways in which they enable production of descriptive metadata.

The full working group met monthly via WebEx. Subgroups undertook specific tasks and reported their findings to the group.

We liaised with other groups that are active and influential in the web archiving sphere of practice. These include the Web Archiving Roundtable of the Society of American Archivists, the International Internet Preservation Consortium (IIPC), and the Internet Archive.

Outputs

Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group
By: Jackie Dooley and Kate Bowers

The Web Archiving Metadata (WAM) Working Group's recommended practices can be used by any institution or person with a need to describe web content. Some potential use cases:

  • Scholars building personal archives of websites for research purposes
  • Libraries and archives using RDA/MARC that seek specific guidance on the elements and content that are most pertinent to description of web content
  • Archives and libraries that need to map their DACS-based MARC records and/or EAD-encoded finding aids to the simpler structure of a digital repository or a web tool such as Archive-It
  • Digital repositories encoding metadata for web content in MODS without reference to any content standard
  • Archive-It users seeking guidance on creating content for Dublin Core elements

Literature Review of User Needs
By: Jessica Venlet, Karen Stoll Farrell, Tammi Kim, Allison Jai O’Dell, and Jackie Dooley

The literature review falls into two clear categories: the needs of end users and the needs of metadata practitioners. The review of end-user literature characterizes types of end users, their research methodologies, barriers to use, discovery interfaces, and the need for support services and outreach. The review of practitioner literature addresses the need for scalable practices, the standards and shared practices currently in use, and the outcomes of a variety of case studies and other approaches to metadata.

Review of Harvesting Tools
By: Mary Samouelian and Jackie Dooley

We reviewed selected web harvesting tools to determine their descriptive metadata functionalities. The question we sought to answer was this: Can web harvesting tools automatically generate descriptive metadata that supports the discoverability of archived web resources? Auto-generation of descriptive metadata for archived web resources could result in significant gains in the efficiency of data entry and thus help enable metadata production at scale. This report offers our objective analysis of 11 tools in pursuit of an answer to that question.

Works in Progress Webinar: Outcomes from the OCLC Research Library Partnership Web Archiving Metadata Working Group

In this webinar, presented 2 May 2018, four members of the working group focus on the recommendations for descriptive metadata that uniquely meet the needs of web content.


Web Archiving Metadata Working Group

Merrilee Proffitt

Trevor Alvord
Brigham Young University

Alexis Antracoli
Princeton University

Penny Baker
Clark Art Institute

Kate Bowers
Harvard University

Lori Dedeyan
University of California, Berkeley

Evan Echols
University of Delaware

Karen Stoll Farrell
Indiana University

Rick Fitzgerald
Library of Congress

Ben Goldman
Pennsylvania State University

Rebecca Guenther
consultant

Claudia Horning
University of California, Los Angeles

Chad Hutchens
University of Wyoming

Deborah Kempe
Frick Art Reference Library

Tammi Kim
University of Nevada, Las Vegas

Jason Kovari
Cornell University

Rosalie Lack
California Digital Library

Eilidh MacGlone
National Library of Scotland

Matthew McKinley
University of California, Irvine

Allison O’Dell
University of Florida

Anchalee (Joy) Panigabutra-Roberts
American University in Cairo

Dallas Pillen
University of Michigan

Lily Pregill
New York Art Resources Consortium (NYARC)

Mary Samouelian
Harvard Business School

Aislinn Sotelo
University of California, San Diego

Jessica Venlet
Massachusetts Institute of Technology

Olga Virakhovskaya
University of Michigan