Duplicate Detection and Resolution
The Duplicate Detection and Resolution (DDR) software is now in full operation. A run through the full WorldCat database (beginning with OCLC #1) began on February 2, 2010, and was completed on September 30, 2010. A total of 166,422,941 records were processed and 5,126,132 duplicate records were eliminated.
In addition, a separate process that examines selected new records and replaced records from each day's journal files began running January 26, 2010. This processing will continue.
Beginning in 1991, OCLC used its Duplicate Detection and Resolution (DDR) software to match WorldCat bibliographic records in the books format against themselves to find and merge duplicates.
By mid-2005, when WorldCat migrated to its new platform, sixteen runs through WorldCat had been completed, resulting in the elimination of a total of 1.6 million duplicate records.
In 2005, a project was started to re-invent the DDR software to work in the new environment and to expand its capabilities to deal with all types of bibliographic records. This large multi-year project is now bearing fruit. Great improvements to our matching software, which is a key component of the new DDR, have regularly been incorporated into the batchloading process. This brings the DDR and batchloading processes into closer alignment than ever before in dealing with the problem of duplicate records in WorldCat.
In May 2009, the new software was put into production following rigorous planning, development, and testing. In addition to its ability to deal with continuing resources, scores, sound recordings, visual materials, maps, and electronic resources, as well as books, this new DDR is much more sophisticated than its predecessor in its power to distinguish legitimate matches from incorrect ones. It also has the flexibility to allow selection of certain categories of bibliographic records to target for deduplication. Processing began with small subsets of WorldCat run against the live database. A full pass through the WorldCat database began in February 2010 and ended in September 2010.
With the new DDR software in production, a larger number of duplicate bibliographic records are being merged, and libraries will notice fewer duplicate records in WorldCat. This should be particularly visible for printed music, sound recordings, and AV materials, since the previous DDR software did not address duplicates in these formats. Regular removal of duplicates provides a better WorldCat for all its users.
Between May 2009 and November 30, 2012:
- 337,388,541 records have been processed through DDR
- 11,395,179 duplicate records have been removed
Wondering about a merge?
Every effort has been made to prevent inappropriate merges. Because DDR is an automated process, however, an occasional inappropriate merge may occur. If you notice a record that appears to be the result of an inappropriate merge, please report it to firstname.lastname@example.org. OCLC staff will examine the records in question and, if the merge was inappropriate, reverse it where possible.