|
Re-implementing Duplicate Detection and Resolution (DDR)
Beginning in 1991, OCLC used its Duplicate Detection and Resolution (DDR) software to match WorldCat bibliographic records in the books format against themselves to find and merge duplicates.
By mid-2005, when WorldCat migrated to its new platform, sixteen runs through WorldCat had been completed, eliminating a total of 1.6 million duplicate records.
In 2005, a project was started to re-invent the DDR software to work in the new environment and to expand its capabilities to deal with all types of bibliographic records. This large multi-year project is now bearing fruit. Great improvements to our matching software, which is a key component of the new DDR, have regularly been incorporated into the batchloading process. As a result, DDR and batchloading are more closely aligned than ever before in dealing with the problem of duplicate records in WorldCat.
In May 2009, the new software was put into production following rigorous planning, development, and testing. In addition to its ability to deal with continuing resources, scores, sound recordings, visual materials, maps, and electronic resources, as well as books, this new DDR is much more sophisticated than its predecessor in its power to distinguish legitimate matches from incorrect ones. It also has the flexibility to allow selection of certain categories of bibliographic records to target for deduplication. Processing of small subsets of WorldCat against the live database has begun. A full pass through the WorldCat database will begin later in 2009.
With the new DDR software in production, a larger number of duplicate bibliographic records will be merged. Regular removal of duplicates will provide a better WorldCat for all its users.
(2009 06 23)
|