
WorldCat Identities: Another view of the catalog

By Thomas B. Hickey, Ph.D.
Chief Scientist, OCLC Research

The idea of WorldCat Identities is simple: create a summary page for every name in WorldCat. Since there are some 85 million records in WorldCat and nearly 20 million names mentioned somewhere in them, this is a large-scale data mining effort that would have been difficult even a few years ago. We are working with both personal and corporate names, so you can see a page for the Beatles as well as individual pages for John, Paul, George and Ringo.

Even working only within WorldCat, there is a lot of information that can be associated with people. We show lists of the most common works written by the person and of those written about them.

Mark Twain 1835–1910

Since we know when these works were published we produce a graphical time line showing their publication history. If we can associate roles (e.g., composer or translator) with the person, we display them along with the genre they work in (e.g., Psychological fiction) and subject headings (Novelists, American—19th century).
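As a rough illustration of the time line idea, the sketch below counts publications per decade and prints a simple text chart. The function name and the sample years are hypothetical and are not taken from WorldCat; the actual time line is presumably built from far richer record data.

```python
from collections import Counter

def publications_per_decade(years):
    """Count publications in each decade, keyed by the decade's first year."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))

# Hypothetical publication years for one identity (not real WorldCat data).
sample_years = [1876, 1884, 1885, 1889, 1894, 1901]
timeline = publications_per_decade(sample_years)

# Print one row per decade, with one mark per publication.
for decade, count in timeline.items():
    print(f"{decade}s {'#' * count}")
```

Grouping by decade is only one possible choice; a finer or coarser interval would work the same way.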

We also list all of the languages in which the person has published, as well as related names found in their records. Each language is linked to an Open WorldCat search of the person and that language. Related names often show the role that person played in relation to the Identity being described (e.g., children’s authors will often have illustrators associated with them). These names are then linked to their own Identity pages, so it is easy to see them in their own right.

We made extensive use of earlier efforts by OCLC, including FictionFinder, RedLightGreen (done by RLG before they joined OCLC), Audience Level and of course WorldCat.org. The focus of WorldCat Identities is very similar to that of RedLightGreen, aimed at an audience that could be characterized as the ‘literate undergraduate,’ although we hope it will be useful to people at many levels of expertise. We have also been able to incorporate preliminary name matches from the Virtual International Authority File project, which help match many German authors to their English equivalents.

WorldCat Identities is one of our first attempts at FRBR (Functional Requirements for Bibliographic Records) grouping at the expression level. This means that, for example, if you are looking at John Tenniel’s page, the citation for Alice’s Adventures in Wonderland shows only the 546 editions in WorldCat that he illustrated, not the 2,535 credited to Lewis Carroll. While you are on Lewis Carroll’s page you can see alternative names we have pulled from the Library of Congress authority file, along with a Cyrillic form of his name found in a bibliographic record, and a link to Charles Dodgson’s page for publications attributed to his other ‘persona.’ For many of the more famous people we also have links into Wikipedia.
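The effect of counting only the editions a given person contributed to can be sketched as follows. This is a simplified approximation under stated assumptions, not OCLC's grouping algorithm: the record layout, the `editions_by_work` function and the four sample editions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified records: each edition names its work and contributors.
editions = [
    {"work": "Alice's Adventures in Wonderland",
     "contributors": {"Carroll, Lewis", "Tenniel, John"}},
    {"work": "Alice's Adventures in Wonderland",
     "contributors": {"Carroll, Lewis", "Rackham, Arthur"}},
    {"work": "Alice's Adventures in Wonderland",
     "contributors": {"Carroll, Lewis"}},
    {"work": "Through the Looking-Glass",
     "contributors": {"Carroll, Lewis", "Tenniel, John"}},
]

def editions_by_work(records, name):
    """Count, per work, only the editions in which `name` is a contributor."""
    counts = defaultdict(int)
    for record in records:
        if name in record["contributors"]:
            counts[record["work"]] += 1
    return dict(counts)

tenniel_counts = editions_by_work(editions, "Tenniel, John")
carroll_counts = editions_by_work(editions, "Carroll, Lewis")
print(tenniel_counts)
print(carroll_counts)
```

With this sample data the illustrator's page would report fewer editions of the same work than the author's page, which is the behavior described above.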

The introductory page shows a cloud with the 100 most common names in WorldCat. One of the most striking aspects of this cloud is the large number of musicians (about a third). This highlights the fact that WorldCat is about what libraries hold, not necessarily about books. In fact, it is obvious from the cloud that much of the material held by libraries is music. We are experimenting with some other clouds that display photographers from some well-cataloged photography collections.
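A minimal sketch of how the names for such a cloud might be selected is shown below. It assumes that "most common" means appearing in the most records, which the text does not specify; the `most_common_names` function and the sample records are hypothetical.

```python
from collections import Counter

def most_common_names(records, n=100):
    """Return the n names that appear in the most records, with their counts.

    Each record is a list of names; a name is counted once per record.
    """
    counts = Counter(name for record in records for name in set(record))
    return counts.most_common(n)

# Hypothetical records, each reduced to the names it mentions.
sample_records = [
    ["Beatles", "Lennon, John"],
    ["Beatles"],
    ["Twain, Mark"],
    ["Beatles", "Twain, Mark"],
]
top_two = most_common_names(sample_records, n=2)
print(top_two)
```

A cloud display would then scale each name's type size by its count; weighting by library holdings instead of record counts would be an equally plausible choice.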

We know we still have a lot of work to do. The pages are automatically generated and sometimes this is more obvious than one would wish. There is also quite a bit of library jargon that could probably be made more understandable. To help us understand these and other issues, RLG Programs is conducting a beta test of WorldCat Identities. More than 80 librarians at 22 libraries have volunteered to test it themselves and with their users. This is one of the first joint projects between RLG Programs and OCLC Research. It shows how the strengths of the two groups complement each other to produce better results than either could achieve alone.

