This experimental research project has concluded. The research prototype application is no longer supported or maintained by OCLC services.

The information on this page is provided for historical purposes only. Please note this content may include details that are out-of-date and broken links.

Thank you for your interest, and please explore the OCLC Research website to learn more about our current research.

Europeana Innovation Pilots

OCLC Research and Europeana are conducting innovation pilots from May 2012 through December 2013. This collaborative initiative aims to pilot the use of existing and newly developed OCLC methods and techniques for cleansing and enriching large aggregations of metadata. Our objective is to identify and create semantic links between heterogeneous objects that are related to one another. Examples include translated copies of the same publication, a painting and a photograph of that painting, different editions of one book, and a collection of letters that belong to the same archive.


Europeana collects a steadily growing amount of metadata from European libraries, archives and museums. Aggregating metadata from these heterogeneous collections leads to quality issues such as duplication, uneven granularity of the object descriptions, and ambiguity between original and derivative versions of the same object.

OCLC Research has extensive experience and expertise in metadata quality improvement techniques and methods, such as duplicate detection and clustering of similar metadata records around FRBR-entity-relationships, reproductions and originals, and different cataloging languages. We are also experimenting with the automated enhancement of records with links to VIAF and other Linked Data elements. Our data quality improvement and enrichment efforts are part of our philosophy to “make the metadata work harder for libraries” and to enhance end-user experience.


Our collaboration with Europeana will be mutually beneficial. The outcomes of the research project will feed into the implementation of the Europeana Data Model (EDM), which is devised to improve the browsing experience of the visitors of the Europeana Portal. In addition, the piloting of clustering and enrichment methods and techniques on Europeana data will inform follow-up activities in more innovative directions and opportunities to develop new data services.

Progress and Outputs


  • Wang, Shenghui, Antoine Isaac, Valentine Charles, Rob Koopman, Anthi Agoropoulou, and Titia van der Werf. 2013. "Hierarchical Structuring of Cultural Heritage Objects within Large Aggregations." Proceedings of the 3rd International Conference on Theory and Practice of Digital Libraries; Research and Advanced Technology for Digital Libraries; Lecture Notes in Computer Science Volume 8092, 247-259. Available online at:

Summary: Huge amounts of cultural content have been digitized and are available through digital libraries and aggregators like Europeana. However, it is not easy for a user to have an overall picture of what is available or to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation of the cluster categories based on records from the UK and a quantitative evaluation of the results from the complete Europeana dataset.

Task Force Service

Based on her work on the Europeana Innovation Pilots, Shenghui Wang has been invited to participate in the Europeana Task Force "Multilingual and Semantic Enrichment Strategy".

Blog post:

Valentine Charles published a post on the Europeana Professional Blog in June 2013 to share interim results with the Europeana community.


  • Wang, Shenghui. 2013. "Hunting for Semantic Clusters: How Can We Find Interesting Stuff in Over 22 Million Europeana Objects?" Presented at EMEARC, 26 February 2013, Strasbourg, France.
    View on Prezi




Titia van der Werf

Team Members

Shenghui Wang
Rob Koopman