OCLC Research Quarterly Highlights
Issue 8 : Third Quarter : January-March 2013
OCLC 754109685 : ISSN 2163-8675

A message from Lorcan Dempsey

Names and identities have become a major focus of interest for OCLC Research.

We know very well that names are not always straightforward. Brian O’Nolan and Brian Ó Nualláin are the English and Irish versions, respectively, of the name of the person who is more commonly known to us as the author Flann O’Brien.

But things are more complicated. Flann O’Brien was the ‘identity’ he chose when writing novels in English. As the prolific author of satirical columns in the Irish Times, he was known as Myles na gCopaleen, under which identity he also wrote an Irish language novel (and it should be noted that this turns up under different spellings, Myles na Gopaleen, for example). I take it that Flann O’Brien and Myles na gCopaleen are examples of what ISNI (more of which later) calls ‘public identities’ for Brian O’Nolan. Of course, it does not stop here, as Flann O’Brien was also known by several other names.

More generally, even if most people’s names and identities are less complicated than this, there is not a one-to-one relationship between names and people. This means that the relationship between people and their names and identities has become something that is managed in a variety of places. Of course, different choices can be made in those places. If I do a search on Wikipedia for either Flann O’Brien or Myles na gCopaleen, for example, I get directed to the page for ‘Brian O’Nolan’. Wikipedia directs us to the person Brian O’Nolan, rather than to any of his assumed identities. Libraries have for a long time had an apparatus to manage this plurality: authority control. And national library authority files have different practices in how they roll names up to one or more identities.

Flann O'Brien record in VIAF

Authority control has typically been organized on a country by country basis: national libraries organize national authority files. As the network unifies information spaces globally, purely national files have less utility. Recognising this issue, OCLC Research has been working with national libraries around the world to synthesise authority files into what we call the Virtual International Authority File, VIAF. VIAF now brings together names from those files into 24 million clusters, and assigns each of these a unique ID. Matches are not made simply on name text strings—contextual data from the authority files (e.g., birth date) and associated bibliographic data are also used. Further work is being done to increase the value of this consolidated resource. For example, the VIAFbot initiative creates links between Wikipedia and VIAF, inserting a VIAF link on appropriate Wikipedia pages. And VIAF now treats Wikipedia as a contributory source, ingesting names from Wikipedia alongside names from national authority files. In this way, there is a direct, actionable link between the global, addressable knowledge base that (the English) Wikipedia has become and library files, enhancing the value of each. The Wikipedia page for Brian O’Nolan has a link to the VIAF entry for Flann O’Brien. Importantly, from this we can assert that the "thing" or "person" described by http://en.wikipedia.org/wiki/Brian_O'Nolan is the same as the "thing" or "person" described by http://viaf.org/viaf/22146540/. We are looking at other language Wikipedias also.
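To illustrate why matching on more than the name string matters, here is a minimal sketch in Python. It is not VIAF's actual matching process, which also draws on associated bibliographic data. The record layout, the source labels and the helper names `normalize` and `cluster` are all assumptions made for this example, and the 1950 record is invented to stand in for a different person with the same name.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lowercase a name and drop accents and punctuation, so that
    spelling variants of the same heading compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [c for c in decomposed if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).lower().split())


def cluster(records):
    """Group records on (normalized name, birth year) rather than on the
    name string alone. Returns a dict mapping each key to a list of sources."""
    clusters = defaultdict(list)
    for record in records:
        key = (normalize(record["name"]), record["birth_year"])
        clusters[key].append(record["source"])
    return dict(clusters)


# Hypothetical records from three imaginary authority files.
records = [
    {"source": "file-a", "name": "O'Brien, Flann", "birth_year": 1911},
    {"source": "file-b", "name": "O\u2019Brien, Flann", "birth_year": 1911},
    # Invented record: same name string, different person.
    {"source": "file-c", "name": "O'Brien, Flann", "birth_year": 1950},
]

clusters = cluster(records)
for key, sources in clusters.items():
    print(key, sources)
```

In this sketch the first two records fall into one cluster despite the different apostrophe characters, while the third stays separate because the birth year disagrees. A real service would need far more evidence than this, and a way to handle records with no birth date at all.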

VIAF has quickly become a major source of data about names. It gives a unique identifier to those entities—people, organizations, and others—which are the creators or subjects of works, gathers names which designate them, and contextualizes them with associated metadata. OCLC and the participating national libraries hope to see VIAF become an important backbone reference in the emerging web of data. And for this reason, we have made it openly available as linked data.

Of course, there are also other important initiatives, notably ISNI and ORCID. As the web becomes more central to scholarly and cultural activity, and as more information work is automated, identity and disambiguation are increasingly important. People are resources which need to be discoverable, referenceable, and relatable. Accordingly, names and their relationship to the people they designate have become a key interest in the cultural, educational and creative fields.

Because of our work with VIAF we are closely connected with both ISNI and ORCID. The International Standard Name Identifier (ISNI) is an ISO standard (ISO 27729) that uniquely identifies "the identities used publicly by individuals or organizations involved in creating, producing, managing and distributing content." It is managed by a consortium of national libraries, rights organizations and others. The ability to unambiguously designate a person is important in a rights environment. VIAF provided data to seed the ISNI pool of people data and OCLC provides the infrastructure to manage ISNIs. ISNIs have begun to be added to VIAF. ORCID emerges from the scholarly publishing arena where the consistent identification of researchers has always been an issue. Here again, as more services are built on the programmatic manipulation of data about publications, researchers and institutions, unambiguous designation has become a goal.  OCLC Research participates on the ORCID Board and the relationship between VIAF and ORCID is being discussed.

We expect to see various relationships between ISNI, VIAF and ORCID as they evolve. Alongside this involvement in these important formal initiatives, we have some other important interests in names. These include:

  • The source of information about names in the above initiatives is the creator themselves, or expert metadata creators. However, we also expect to complement this work by programmatically identifying names. We are exploring automatic recognition, extraction, and disambiguation of named entities (e.g., the names of people, places, and organizations) from digital texts. This work will be increasingly important, as manual description methods will not scale.
  • We provide WorldCat Identities pages in the WorldCat.org environment. Here is the page for Flann O’Brien: http://www.worldcat.org/wcidentities/lccn-n50-1905. WorldCat Identities has a summary page for every name in WorldCat (currently over 40 million names), including named persons, organizations and fictitious characters. The pages include information derived from WorldCat and other sources (including VIAF) alongside unique data derived or created through a variety of special processing activities (e.g., WorldCat Identities provides statistical data about how widely held a work is). A typical WorldCat Identities page will include a list of the works by and about the identity that are most widely held by libraries, a list of variant forms of name the identity has been known by, a tag cloud of places, topics, etc. closely related to works by and about the person, links to co-authors, and more. While WorldCat Identities and VIAF are developed from different directions, we are looking at closer links between them.
  • We have been working with a group of Syriac scholars, looking at issues around accepting a feed of names into VIAF from a scholarly community rather than from a national authority file. See http://hangingtogether.org/?m=201303 for some discussion. In general, it is likely that VIAF will synthesise data from other additional sources which go beyond national authority files.
  • Given growing library interest in the names and identities of their institution’s researchers, the partial nature of national authority files (they typically only include creators of works which have been catalogued), and the emerging variety of identifier approaches, we have instituted a working group to explore issues around names, researchers, libraries, and national authority files (see http://oclc.org/research/activities/registering-researchers.html). The initial aim is to produce a report which looks at the role of authority files as library and researcher needs change.

We are pleased to be doing this important work as part of an emerging infrastructure around names and identities. We welcome enquiries or comments—to me or directly to colleagues working on these initiatives. Further details can be found on these pages:

As always, if you are interested in discussion or collaboration around the topics discussed in this update, please get in touch.

Regards, Lorcan Dempsey

Lorcan Dempsey is Vice President, OCLC Research, and Chief Strategist

Prototypes and Services

MARC Usage in WorldCat—This project studies the use of MARC tags and subfields in WorldCat and produces outputs to inform decisions about where we go from here.



WorldCat Live!—The Innovation Lab's WorldCat Live! API provides a real-time stream of newly added records of library collections and published materials to WorldCat, the world's largest online database for discovery of library resources. The Visualization Interface provides interactive visualizations of WorldCat Live! API data for geographic, language, and formats.



LibraryFinder—Library Finder uses the WorldCat Registry API to locate libraries near a given location and displays the contact information, website, and library type when available. It uses the HTML5 Geolocation API in supported browsers to determine where a user is, and Responsive Web Design to make the website fully mobile friendly. (Supports FF6+, Safari, Chrome, or IE9+)



Kindred Works is a demonstration interface built upon an experimental content-based recommender service. Various characteristics associated with a sample resource, such as classification numbers, subject headings, and genre terms, are matched to WorldCat to provide a list of recommendations.



WorldCat Identities Network gives users the opportunity to visually explore the interconnectivity and relationships between WorldCat Identities.
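The content-based matching described for Kindred Works above can be illustrated with a small Python sketch. This is not the Kindred Works service itself: the Jaccard overlap score, the feature labels, the function names and the three-item catalog are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Share of features two works have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(sample_features, catalog, top_n=3):
    """Rank catalog works by feature overlap with the sample resource,
    dropping works that share nothing with it."""
    scored = [
        (jaccard(sample_features, features), title)
        for title, features in catalog.items()
    ]
    scored = [(score, title) for score, title in scored if score > 0]
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical sample resource and catalog, described by classification
# numbers, subject headings and genre terms.
sample = {"class:823.912", "subject:Satire", "genre:Fiction"}
catalog = {
    "Work A": {"class:823.912", "genre:Fiction"},
    "Work B": {"subject:Satire", "genre:Biography"},
    "Work C": {"class:510", "genre:Textbook"},
}

recommendations = recommend(sample, catalog)
print(recommendations)
```

Here "Work A" ranks first because it shares two of the sample's three features, "Work B" follows with one shared feature, and "Work C" is left out because it shares none.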


Lynn Silipigni Connaway, David White, Donna Lanclos, & Alison Le Cornu
Visitors and Residents: What Motivates Engagement with the Digital Information Environment?

This paper reports on the three-year Digital Visitors and Residents project. Initial results highlight the importance of convenience as a crucial factor in information-seeking behavior. There are also indications that as users progress through the educational stages, the digital literacies they employ do not necessarily become more sophisticated. Initial findings indicate that students in the emerging educational stage (late-stage secondary school to first-year undergraduate) use smartphones and laptop computers to access Wikipedia, Google, teachers or professors, friends and peers to get information for their academic studies. Information Services & Use, 32, 3-4.


Lorcan Dempsey
Libraries and the Informational Future: Some Notes

Presented at the Information Professionals 2050 Conference, this paper discusses environmental trends for libraries and some consequences for library education. Its overarching theme is that we need to prepare for systemic changes by better understanding how organizations are being reshaped by networks. Information Research, 18, 1 (paper 556). In addition to this special journal issue, conference papers are also published as a separate monograph, Information Professionals 2050: Educational Possibilities and Pathways, edited by Gary Marchionini and Barbara B. Moran and published by the University of North Carolina at Chapel Hill.


Jackie M. Dooley, Rachel Beckett (University of Manchester), Alison Cullingford (Bradford University), Katie Sambrook (King’s College London), Chris Sheppard (University of Leeds), Sue Worrall (University of Birmingham)
Survey of Special Collections and Archives in the United Kingdom and Ireland

Special collections and archives play a key role in the future of research libraries. However, significant challenges face institutions that wish to capitalize on that value. In 2010, OCLC Research published an evidence-based appraisal of the state of special collections in the US and Canada. This report, produced as a collaboration of OCLC Research and Research Libraries UK (RLUK), builds on the foundation established by that report, and provides both evidence and a basis for action as part of RLUK's Unique and Distinctive Collections workstrand and OCLC Research's Mobilizing Unique Materials theme. Key findings and recommendations are highlighted in the report's executive summary (.pdf: 69K/12 pp.), which is separately available.


"Aggregations as Information Supermarkets" Video Illustrates Importance of Supplying Quality Data on Web
This video presents a short skit in which OCLC Research staff introduce the "Metadata Out of Control: Network-level Metadata Aggregations" presentation at the OCLC EMEA Regional Council Annual Meeting on 26 February. More...
Chronicle of Higher Education Blog Highlights Key Points from MOOCs and Libraries Event
The event featured thoughtful and provocative presentations about ways libraries are getting involved with massive open online courses (MOOCs), including the challenges and strategic opportunities they are facing. This blog post summarizes key points, including the various roles academic librarians can play in the MOOCs phenomenon and how they can best prepare for them. More...
SRU Approved as OASIS Standard
SRU (Search and Retrieve via URL) is the web-based successor to Z39.50. More...
OCLC Research to Study MARC Tag Usage in WorldCat to Determine Best Use of Data Encoded Using MARC Standard
The goal of this new MARC Usage in WorldCat activity is to provide an evidence base for testing assertions about the value of capturing various attributes. More...
OCLC Research and ALISE Name 2013 Research Grant Recipients
OCLC Research and ALISE have awarded research grants to Lynne Bowker of the University of Ottawa, Kyung-Sun Kim of the University of Wisconsin–Madison, and Sei-Ching (Joanna) Sin of Nanyang Technological University and Sanghee Oh of Florida State. More...
Lorcan Dempsey's "Thirteen Ways" Ranks Third on EDUCAUSE Review Online's List of Ten Most Widely Read Articles from 2012
Published on 10 December 2012, "Thirteen Ways of Looking at Libraries, Discovery, and the Catalog: Scale, Workflow, Attention" uses the position of the catalog to illustrate more general discovery and workflow directions. More...
OCLC Post-Doctoral Researcher Ixchel Faniel, Ph.D. and Colleagues Win Best Conference Paper at IDCC13 in Amsterdam
Faniel and U. Michigan colleague Elizabeth Yakel, Ph.D., presented “Trust in Digital Repositories,” written with DIPIR-project colleagues Adam Kriesberg (U. Michigan) and Ayoung Yoon (U. North Carolina SILS). More...
A complete list of OCLC Research news items is available online at: http://www.oclc.org/research/news.html.

Events, Webcasts and Presentations

Print Management at "Mega-scale" Webinar
In this webinar, Constance Malpas and Brian Lavoie present findings from their report,  Print Management at "Mega-scale": a Regional Perspective on Print Book Collections in North America. The recording, slides, chat transcript and archived tweets from this OCLC Research webinar are available online. More...
"MOOCs and Libraries" Webcast Recording
Open to all, this free event provides thoughtful and provocative presentations about how libraries are already getting involved with MOOCs. View webcast recordings on the OCLC Research YouTube channel, view the Next Steps document and archived event Tweets, and more. More...
OCLC Research Roundup Webinar Recording and Slides
In this webinar, OCLC Research Consulting Project Manager Eric Childress provides an overview of OCLC Research as well as findings from recent reports. More...
Roy Tennant's "Cataloging Unchained" Video Illustrates the Power of Data Mining Our Cataloging Heritage
Roy presented this video during his keynote at the OCLC EMEA Regional Council (EMEARC) Annual Meeting in Strasbourg, France. More...
Join OCLC Research Staff at the EMEA Regional Council Meeting 25-26 February in Strasbourg, France
OCLC Research staff made presentations at several sessions during OCLC's Europe, Middle East and Africa Regional Council (EMEARC) Member Meeting in the Palais des Congrès in Strasbourg, France. More...
Putting "Special" in the "Collective Collection" Forum Blog Post and Slides
These outputs provide an overview of the OCLC Research Library Partnership meeting at the New-York Historical Society to discuss Putting the "Special" in the "Collective Collection." More...
Are We Reconfigured Yet? US Research Libraries–Priorities, Trends, Directions Webinar Outputs
The recording, slides, chat transcript and archived tweets from this OCLC Research Library Partnership webinar are publicly available online. More...
"The Inside Out Library: Scale, Learning, Engagement" Slides Explain How Today's Libraries Can More Effectively Respond to Change
In this keynote at the 21st annual BOBCATSSS Conference in Ankara, Turkey, Lorcan Dempsey explains how the "inside-out" model may better serve today's libraries in becoming learning organizations which can respond effectively to change. More...
Linking Library Data to Wikipedia Part III Video Provides VIAFbot Statistics and Recaps Its Accomplishments
In this video, OCLC Research Wikipedian in Residence Max Klein and Senior Program Officer Merrilee Proffitt discuss how VIAFbot added 250,000 reciprocal links from VIAF to Wikipedia. More...
Jim Michalko Shares US Research Libraries' Priorities, Trends and Directions in Webinar for OCLC Research Library Partners
This presentation, “Are We Reconfigured Yet? US Research Libraries—Priorities, Trends, Directions,” was originally given exclusively for OCLC Research Library Partners. More...
OCLC Research Staff at ALA Midwinter in Seattle
Seven OCLC Research staff gave presentations, served as panelists and chaired meetings at this conference. More...
Senior Research Scientist Lynn Silipigni Connaway at ALISE '13 in Seattle
Senior Research Scientist Lynn Silipigni Connaway, Ph.D., gave two presentations, moderated two sessions, chaired a meeting and accepted the Bohdan S. Wynar Research Paper Award at this event. More...
A complete list of OCLC Research events is available online at: http://www.oclc.org/research/events.html.

Looking Beyond the Quarter...

OCLC Research Library Partnership Briefing at UC Irvine
Originally conducted exclusively for OCLC Research Library Partners, this briefing provides information about our current and upcoming activities, an overview of our major thematic areas, an overview of SHARES and ways to engage effectively in the Partnership. More...
OCLC Research Library Partnership Briefing at UCLA
Originally conducted exclusively for OCLC Research Library Partners, this briefing provides information about our current and upcoming activities, an overview of our major thematic areas, an overview of SHARES and ways to engage effectively in the Partnership. More...
OCLC Research at ARLIS/NA
Program Officer Dennis Massie hosts this annual OCLC Research Library Partnership Roundtable at the 41st annual conference of the Art Libraries Society of North America (ARLIS/NA) in Pasadena, CA. More...
Transformation of the Academic Library Presentation by Kurt De Belder
In this OCLC Research Distinguished Seminar Series presentation, Kurt De Belder discusses the fundamental transformation of the academic (research/university) library and explores the necessary changes that need to be made for the library to remain an effective and relevant partner in research and teaching. Tweet: #ordss More...
Managing Research Data—from Goals to Reality Webinar
Originally conducted exclusively for OCLC Research Library Partners, this webinar provides examples of how some OCLC Research Library Partner colleagues are managing research data—the raw output of research investigations, not the resulting reports—including the context in which they are handling their goals, current activities and plans, as well as demonstrations of the systems they are developing. Tweet: #orlp More...
ArchiveGrid and Related Work Webinar
This webinar provides an overview of ArchiveGrid, a collection of nearly two million archival material descriptions that is now freely available from OCLC Research, as well as related work. Tweet: #archivegrid More...
Past Forward! Meeting Stakeholder Needs in 21st Century Special Collections
Originally conducted exclusively for OCLC Research Library Partners, this meeting covers topics related to managing special collections in the 21st century. Tweet: #pastfor More...
Shifts in Scholarly Attention Among World Regions Webinar (Session 1 of OCLC Research Briefing at UNC Chapel Hill)
In this webinar, Dr. Charles Kurzman, Professor of Sociology, University of North Carolina, Chapel Hill presents his research on changing academic attention to world regions over the past 50 years. Tweet: #oclcr #insightseries More...
Why Google?: "…[Google] saved time, it saved gas, I got what I needed, and it wasn't a big deal." (Session 2 of OCLC Research Briefing at UNC Chapel Hill)
In this webinar, Dr. Lynn Silipigni Connaway discusses results of multiple user behavior studies and recommendations for promoting user engagement with library services, sources, and systems. Tweet: #oclcr More...
"How to Read Millions of Books" Presentation by Jean-Baptiste Michel
In this OCLC Research Distinguished Seminar Series presentation, Jean-Baptiste Michel discusses how his work in Culturomics led to the creation of the Google Ngram Viewer. Tweet: #ordss More...
MOOCs and Libraries: The Good, the Bad and the Ugly
This event focuses on the challenges that MOOCs pose to the traditional delivery of library services, and the opportunities they offer for libraries to rethink and revitalize their proposition. Tweet: #mooclib More...
A complete list of OCLC Research events is available online at: http://www.oclc.org/research/events.html.

OCLC Researcher Spotlight—Tip House: Disruptive Innovation and WorldCat Live!

I wrote my first program as a kid in the 1960s for a kit computer that had mechanical gates and 4 bits of main memory. I positioned various shunts before rolling marbles through the gates, to get it to add two two-bit numbers together. I programmed the first real computer I was able to get time on to play games: Life (from Martin Gardner's "Mathematical Games" column), three-dimensional tic-tac-toe, checkers, etc. My goal was to make programs that seemed intelligent, in that they could out-play me.

After a few years I was able to land a series of jobs at the Ohio State University writing software for the low-temperature physics and human perception labs. While these seriously reduced the time I had for my own programming, they opened doors onto other interesting areas, and ultimately allowed me to join OCLC for the first time in 1980, where my first assignment was conversion of the Union Catalog (now known as WorldCat) to new cataloging rules (AACR2). 

I have remained fascinated with attacking hard problems like machine intelligence, particularly in ways that have no chance of completely succeeding, yet provide useful and cost-effective partial solutions along the way. This turns out to be an extremely productive paradigm for 'ordinary' problems as well. Perhaps the best example is OCLC's Find Text Search Engine, which I wrote in 2004 to index a few thousand ISO 9001 documents, and which now provides searching for over two billion metadata and full text records, underlying all OCLC products and services.

Recently I have been applying this approach to create disruptive innovation as part of the OCLC Innovation Lab: creating extremely low-cost versions of traditional systems (e.g., a fully functional ILS for very small public libraries: OCLC WSSL), "instant" systems developed in very short time periods (e.g., OCLC Article Exchange), systems that automate tasks generally thought to require human cognition (e.g., the ask4stuff synthetic Twitter personality), etc.

The WorldCat Live! API falls mostly into the "instant" systems category, where the goal is to provide services more or less instantly, to encourage the flow of ideas by providing a kind of "magic wand" to idea generators. The idea for this service originated in the library development community, and the service was up and running for them within a week. Not instant, of course, but most of the elapsed time was in routing the request internally: the actual initial development was completed in under 24 hours. This very rapid turn-around did foster follow-on work, both in the developer community and within OCLC, leading to the robust and useful visualization user interface now available.

Tip House is Chief Architect in the OCLC Innovation Lab. Follow him on Twitter at @tiphouse.