WorldCat Identities: Another view of the catalog
By Thomas B. Hickey, Ph.D.
Chief Scientist, OCLC Research
The idea of WorldCat Identities is simple: create a summary page
for every name in WorldCat. Since
there are some 85 million records
in WorldCat and nearly 20 million
names mentioned somewhere,
this is a large-scale data mining
effort that would have been difficult
even a few years ago. We
are working with both personal and corporate
names, so you can see a page
for the Beatles, as well as the individual
pages for John, Paul, George and Ringo.
Just working within WorldCat there
is a lot of information that can be associated
with people. We show lists of
the most common works written by the
person and those written about them.
Mark Twain 1835–1910
Since we know when these works were
published, we produce a graphical
timeline showing their
publication history. If
we can associate roles (e.g., composer
or translator) with the person, we
display them along with the genre they
work in (e.g., Psychological fiction) and
subject headings (Novelists, American—19th century).
We also list all of the languages in
which the person has published, as
well as related names found in their
records. Each language is linked to an
Open WorldCat search for the person
in that language. Related names often
show the role that person played in
relation to the Identity being described
(e.g., children’s authors will often have
illustrators associated with them). These
names are then linked to their own Identity
pages, so it is easy to see them in their
own right.
We made extensive use of earlier efforts
by OCLC, including FictionFinder,
RedLightGreen (developed by RLG before
it joined OCLC), Audience Level and,
of course, WorldCat.org. The focus of
WorldCat Identities is very similar to that
of RedLightGreen, aimed at an audience
that could be characterized as the ‘literate
undergraduate,’ although we hope it will be
useful to many levels of expertise. We
have also been able to incorporate
preliminary name matches of the Virtual
International Authority File project, which
helps match many German authors to their
English equivalents.
WorldCat Identities is one of our first
attempts at FRBR (Functional
Requirements for Bibliographic Records) grouping at the expression
level. This means
that, for example, if you
are looking at John Tenniel’s
page, the citation for
Alice’s Adventures in Wonderland
shows only the
546 editions in WorldCat
that he illustrated, not the
2,535 credited to Lewis
Carroll. While you are on
Lewis Carroll’s page you
can see alternative names
we have pulled from the
Library of Congress authority
file, along with a Cyrillic
form of his name found in
a bibliographic record, and
a link to Charles Dodgson’s
page for publications attributed
to his other ‘persona.’
For many of the more famous
people we also have
links into Wikipedia.
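The expression-level grouping described above, in which an illustrator's page counts only the editions that person actually worked on, can be sketched roughly as follows. This is a minimal illustration, not the WorldCat Identities implementation: the record layout, the `EDITIONS` sample data and the `edition_counts` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified records (not real WorldCat data): each edition
# names its work and the people credited on that particular edition.
EDITIONS = [
    {"work": "Alice's Adventures in Wonderland",
     "names": {"Carroll, Lewis", "Tenniel, John"}},
    {"work": "Alice's Adventures in Wonderland",
     "names": {"Carroll, Lewis", "Rackham, Arthur"}},
    {"work": "Alice's Adventures in Wonderland",
     "names": {"Carroll, Lewis"}},
    {"work": "Through the Looking-Glass",
     "names": {"Carroll, Lewis", "Tenniel, John"}},
]


def edition_counts(identity, editions):
    """Count, per work, only the editions that credit the given identity."""
    counts = defaultdict(int)
    for edition in editions:
        if identity in edition["names"]:
            counts[edition["work"]] += 1
    return dict(counts)


if __name__ == "__main__":
    # The illustrator's page lists fewer editions than the author's page.
    print(edition_counts("Tenniel, John", EDITIONS))
    print(edition_counts("Carroll, Lewis", EDITIONS))
```

With this toy data, the author is credited on every edition of a work while the illustrator is credited only on a subset, which mirrors the difference between the Tenniel and Carroll edition counts.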
The introductory page
shows a cloud with the
100 most common names
in WorldCat. One of the
most striking aspects of
this cloud is the large number
of musicians (about a
third). This highlights the
fact that WorldCat is about
what libraries hold, not necessarily
about books. In fact,
it is obvious from the cloud
that much of the material
held by libraries is music. We
are experimenting with some
other clouds that display
photographers from some
well-cataloged photography
collections.
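As a rough illustration of how the names for such a cloud might be chosen, the sketch below ranks names by the number of records that mention them. That measure of "most common" is an assumption made here for illustration, and the `RECORDS` sample data and `top_names` function are hypothetical rather than part of WorldCat Identities.

```python
from collections import Counter

# Hypothetical records (not real WorldCat data), each listing the names
# it mentions.
RECORDS = [
    ["Mozart, Wolfgang Amadeus", "Bach, Johann Sebastian"],
    ["Twain, Mark"],
    ["Mozart, Wolfgang Amadeus"],
    ["Bach, Johann Sebastian", "Mozart, Wolfgang Amadeus"],
]


def top_names(records, n):
    """Return the n names mentioned in the most records, with their counts."""
    counts = Counter()
    for names in records:
        counts.update(set(names))  # count each name at most once per record
    return counts.most_common(n)


if __name__ == "__main__":
    for name, count in top_names(RECORDS, 2):
        print(name, count)
```

The resulting counts could then drive the relative size of each name in the cloud.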
We know we still have a
lot of work to do. The pages
are automatically generated
and sometimes this
is more obvious than one
would wish. There is also
quite a bit of library jargon
that could probably be made
more understandable. To
help us understand these
and other issues, RLG Programs is conducting a beta test of WorldCat Identities.
More than 80 librarians at 22
libraries have volunteered to
test it themselves and with
their users. This is one of
the first joint projects between
RLG Programs and
OCLC Research. It shows
how the strengths of each
group complement each
other to produce better results
than either group could
achieve alone.