Descriptive Metadata for Web Archiving

OCLC Research established the Web Archiving Metadata Working Group (WAM) to develop recommendations for descriptive metadata. The group's approach is tailored to the unique characteristics of archived websites, with an eye to helping institutions improve the consistency and efficiency of their metadata practices in this emerging area. The collaboration produced three publications: a set of descriptive metadata recommendations, a literature review of user needs, and a review of web harvesting tools.

Review of Harvesting Tools

By: Mary Samouelian and Jackie Dooley

The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) was formed to recommend descriptive metadata best practices for archived web content. When the group began its work early in 2016, we discovered that metadata practitioners had high hopes that it would be possible to extract descriptive metadata from harvested content.

We reviewed 11 selected web harvesting tools to determine their descriptive metadata functionalities. The question we sought to answer was this: Can web harvesting tools automatically generate descriptive metadata that supports the discoverability of archived web resources? This report offers our objective analysis in pursuit of an answer. Auto-generation of descriptive metadata for archived web resources could yield significant gains in data-entry efficiency and thus help enable metadata production at scale.

Our intent was twofold: 1) to provide the web archiving community with a description of each relevant tool's overall purpose and metadata-related capabilities, and 2) to inform WAM's overarching objective of preparing best practice recommendations for web archiving descriptive metadata based on an understanding of user needs.

Suggested citation:

Samouelian, Mary, and Jackie Dooley. 2018. Descriptive Metadata for Web Archiving: Review of Harvesting Tools. Dublin, OH: OCLC Research.