Betatest WorldCat Identities project

Note: This project has been completed.

The problem: There is no one source of information for users to uniquely identify a person or corporate body. National authority files, identifying entities from published works, provide scant information and do not include the scripts that the author or corporate body itself may have used. The LC/NACO authority file, for example, is designed for librarians, and the practice of distinguishing between authors by birth dates is insufficient for most users.

Users generally have no access to any one resource that illustrates the history and works by and about persons and corporate bodies that may be known by a variety of names depending on location. Researchers need a tool that supports discovery of publication "pedigrees" (to establish the authority and relevance of a title) and that can disambiguate publisher names. Data mining of institutional resources, in addition to WorldCat, can also help institutions manage names across resources and institutions.

The prototype solution: WorldCat Identities aims to address end users' need to uniquely identify authors, both persons and corporate bodies. It compiles information from a variety of resources, including information data-mined from WorldCat, to illustrate the history and works by and about persons and corporate bodies that may be known by a variety of names depending on location. It covers twenty million identities, more than any other resource currently available.

Ninety-five staff from twenty-one RLG Programs partners participated in a beta test from February 1 through April 30, 2007 to evaluate the resource and its potential to provide the information needed to uniquely identify a "creative entity" among many similar ones within different contexts and systems. Feedback came primarily from librarians rather than from end users. The Publication Timeline, which graphically illustrates the publication dates of works by the author (including those published posthumously) and works about the author, was particularly praised.

General feedback was positive:

"This is a very creative and impressive database with much potential for a wide variety of users ... All in all this is an exciting product, but, more importantly, a useful one." (Susan Flanagan, Getty Research Institute)

"WorldCat Identities is one of the most exciting things I've seen from OCLC. It takes a giant step toward crossing the gap between what authority files are meant to do and what users really want from them." (Stephen Hearn, U. Minnesota)

"This is a very exciting-looking project that should allow for new perspectives on 'resource discovery.'" (William Kopycki, U. Pennsylvania)

"The Publication Timeline gives an interesting snapshot view of scholarship over time, at least as applied to a single person." (Daniel Mack, Pennsylvania State)

"The WorldCat Identities beta tool may have its roots in the 'authority file' mode. But, adding the data-mining tool, it begins to transform into something larger. I think these are exactly the kinds of things we need to be building." (Martin Schreiner, Harvard University)

"I found [the Publication Timeline] one of the most intriguing features in Identities. It suggests a whole new discipline of 'bio-bibliometrics.' For example, compare the timelines of Aristotle and Plato to visualize the course of medieval and modern philosophy or compare the different course of the careers and reputations of Byron and Shelley. Admiral Nelson's timeline shows a peak around 1905, which interestingly coincides with the naval arms race of the early 20th century and the centenary of the Battle of Trafalgar." (Richard Wakefield, British Library)

Enhancements made during the beta test period, primarily due to feedback from beta testers:

  • Corporate names were added.
  • Subject headings or genres, when present, were added to the list of identities retrieved to differentiate authors with similar names (e.g., John Adams the composer from John Adams the President).
  • Subject headings and genres were linked to retrieve other identities within the same subject and genre. The links retrieve "identity clouds" of the top 100 authors in the subject or genre as well as an alphabetical list of all identities associated with a specific subject or heading.
  • Colors were added to the Publication Timeline to differentiate works by a person during the author's lifetime from works published posthumously and from works written about the author.
  • The ability to retrieve publications for particular years in the Timeline was added.
  • The Timeline was refined by discarding unknown dates, publication dates before the author's birth date, and large date ranges.
  • The HTML titles and links were improved to enhance rankings by Web search engines.
  • Related Names were improved to lead to searches where both the Identity and the Related Name appear.
  • Roles were made to link to the WorldCat records they came from.
  • A "More" option was added to expand the list of citations by and about to 20.
  • Links to Wikipedia were doubled, to 50,000.
  • Links to the German national authority file were added. (The names established in Germany may be one of the "alternate names.")
  • Icons were added to the results list to differentiate personal identities from corporate identities, and the icons for controlled identities (those linked to the LC/NACO authority file) were colored.
  • Name format was reverted to first name, last name on retrieval of an Identity ("Works by Norman Mailer" rather than "Works by Mailer, Norman").
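
The Timeline refinements listed above (discarding unknown dates, dates before the author's birth, and large date ranges, then coloring works by lifetime, posthumous, or about the author) can be sketched roughly as follows. This is only an illustration, not the WorldCat Identities implementation: the Publication record, the function names, and the 10-year cutoff for a "large" date range are all assumptions, since the project summary does not describe them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cutoff: the summary says "large date ranges" were discarded
# but does not state what counted as large.
MAX_RANGE_YEARS = 10


@dataclass
class Publication:
    """Hypothetical record for one work on the timeline."""
    title: str
    start_year: Optional[int]  # None models an unknown date
    end_year: Optional[int]    # same as start_year for a single-year date
    about_author: bool         # True for works about the author


def keep_for_timeline(pub: Publication, birth_year: Optional[int]) -> bool:
    """Apply the three discard rules named in the enhancement list."""
    if pub.start_year is None or pub.end_year is None:
        return False  # unknown date
    if pub.end_year - pub.start_year > MAX_RANGE_YEARS:
        return False  # large date range
    if birth_year is not None and pub.start_year < birth_year:
        return False  # publication date before the author's birth
    return True


def timeline_category(pub: Publication, death_year: Optional[int]) -> str:
    """Pick the color group: 'about', 'posthumous', or 'lifetime'."""
    if pub.about_author:
        return "about"
    if death_year is not None and pub.start_year > death_year:
        return "posthumous"
    return "lifetime"


def build_timeline(pubs: List[Publication],
                   birth_year: Optional[int],
                   death_year: Optional[int]) -> Dict[int, Dict[str, int]]:
    """Count the kept publications per year and per color group."""
    counts: Dict[int, Dict[str, int]] = {}
    for pub in pubs:
        if not keep_for_timeline(pub, birth_year):
            continue
        per_year = counts.setdefault(
            pub.start_year, {"lifetime": 0, "posthumous": 0, "about": 0}
        )
        per_year[timeline_category(pub, death_year)] += 1
    return counts
```

Calling build_timeline with an author's birth and death years returns a mapping from year to per-group counts, which a display layer could then draw as colored bars. Whether the real Timeline applied the birth-date rule to works about the author as well is not stated in the summary; this sketch applies it to all works.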

The beta testers pointed out several areas that need to be improved:

  • WorldCat Identities reflects cataloging variations in WorldCat itself, resulting in duplicate entries for the same author. We are working on better normalization that can reduce the number of duplicates.
  • Subject headings in WorldCat can also vary (e.g., French author vs. French novelist), so a list built from one heading excludes identities who write in the same subject area under another heading. We will be adding multiple subject headings that will provide a more comprehensive view of the subjects the authors write in.
  • Context is lacking. We will add a brief explanation of what WorldCat Identities is on the home page with a link to documentation of the contents and sources. We will add a More About link to explain how Audience Level is derived.

Other enhancements suggested by beta testers that are under consideration:

  • Possibly adding these "useful links": Internet Movie Database (IMDb), home Web pages of institutions cited.
  • Adding formats (in addition to languages) of the works represented (e.g., Brad Pitt's works are likely to be movies, Prince's works are likely to be musical recordings).
  • Refining the presentation of Uniform Titles, with the title (field 245) alongside. For example, Dostoyevski's most widely held works would be listed with their English as well as their Russian titles. Or for Aaron Copland, "Fanfare for the common man; Rodeo; Appalachian spring (suite)" would appear instead of "Orchestra Music. Selections."
  • Making Audience Level selectable to adjust results to only children's literature or to specialists.

WorldCat Identities was incorporated into WorldCat.org in November 2007. Clicking on the About the Author(s) link under the Details tab brings you to the WorldCat Identities page for the author.

RLG Programs partners beta testers for WorldCat Identities

  • British Library
  • Columbia University
  • Getty Research Institute
  • Harvard University
  • Indiana University
  • Library of Congress
  • National Library of Australia
  • New York Public Library
  • New York University
  • Pennsylvania State University
  • Princeton University
  • Rutgers, The State University of New Jersey
  • Swiss National Library
  • University of California, Berkeley
  • University of California, Los Angeles
  • University of Michigan
  • University of Minnesota
  • University of Oxford
  • University of Pennsylvania
  • University of Washington
  • Yale University

RLG Programs project liaisons

Thomas Hickey
Chief Scientist
hickey@oclc.org

Karen Smith-Yoshimura
Program Officer
karen_smith-yoshimura@oclc.org

For more information

Karen Smith-Yoshimura
Program Officer
smithyok@oclc.org
