OCLC Research Quarterly Highlights
Issue 12: Third Quarter: January-March 2014
OCLC 754109685: ISSN 2163-8675

A glimpse of where we're going with works and other entities

A message from Lorcan Dempsey


OCLC recently released a preview of WorldCat Works. This is a linked data view of the works in WorldCat. It is the result of a great amount of detailed collaborative work across research, product and engineering groups in OCLC. It is especially interesting for several reasons. One is that it represents the coming together of a lot of work over the years in OCLC Research to develop a robust view of works and how to generate them algorithmically. Thom Hickey has a nice overview of some of this work here. Another is that it was the outcome of ongoing development of a data model for bibliographic entities in Schema.org, extending it as required for our purposes. And yet another is that as one instantiation of 'FRBRized' WorldCat, it is an example of how programmatic or algorithmic approaches are becoming much more common.

This last point is an especially important one. As we do more work in Hadoop and other big data infrastructure, it has become more feasible to do large amounts of processing quickly. And as Stalin is reported to have said, quantity has a quality all of its own. In practice, this means it is economically possible to manipulate large amounts of data more readily. And scale matters: the more data there is, the more that can be done with it. Thom provides some examples:

As the worksets become larger and more reliable we are finding many uses for them, not the least in improving the work-level clustering itself. We find the clustering helps find variations in names, which in turn helps find title variations. We are also learning how to connect our manifestation and expression level clustering with our work-level algorithms, improving both. The Multilingual WorldCat work reported here is also an exciting development growing out of this.

Finally, our works work is one part of a broader interest in the 'entification' of bibliographic data. We currently have a record-based view of data—we carry packages of data about titles around the place. We are interested in gradually moving towards more of an entity-based view of the data, where we manage data about works, people, subjects, and so on, independently and create links between them and between them and other entities. Our ongoing work with VIAF is an example of the people entity. FAST contributes to a subject entity. To see an early example in the data, look at data for the work The Heather Blazing. This shows links to subject entities (FAST and LCSH) and to names entities (VIAF). In this way we are contributing to a knowledge graph (or bibliographic graph) tying together rich network information about works and people.
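
To make the contrast between a record-based and an entity-based view concrete, here is a minimal Python sketch. It is an illustration only: the class names, the relation names ("creator", "about"), the "example:" identifiers and the placeholder author and subject labels are all hypothetical, and none of this is actual WorldCat, VIAF or FAST data or the model OCLC uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    uri: str    # hypothetical identifier, not a real URI
    kind: str   # "work", "person" or "subject"
    label: str


@dataclass
class Graph:
    entities: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source_uri, relation, target_uri)

    def add(self, entity):
        """Register an entity so it can be managed independently of any record."""
        self.entities[entity.uri] = entity
        return entity

    def link(self, source, relation, target):
        """Record a typed link between two entities."""
        self.links.append((source.uri, relation, target.uri))

    def related(self, entity, relation):
        """Return the entities reachable from `entity` via `relation`."""
        return [
            self.entities[target]
            for source, rel, target in self.links
            if source == entity.uri and rel == relation
        ]


def build_example():
    # Placeholder data: the author and subject labels are invented.
    graph = Graph()
    work = graph.add(Entity("example:work/1", "work", "The Heather Blazing"))
    person = graph.add(Entity("example:person/1", "person", "Example Author"))
    subject = graph.add(Entity("example:subject/1", "subject", "Example Subject"))
    graph.link(work, "creator", person)
    graph.link(work, "about", subject)
    return graph, work
```

Calling `build_example()` and then `graph.related(work, "creator")` returns the person entity on its own, which is the point of the entity view: the person and subject exist once and are linked to, instead of being repeated inside each title record.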

WorldCat Works is provided with an experimental viewer. For another way in, check out Classify, which provides a work URI in its results.

It is exciting to see this work come together in this way, providing a glimpse into where we are going.

Lorcan Dempsey is Vice President, OCLC Research, and Chief Strategist


Constance Malpas and Brian Lavoie
Right-scaling Stewardship: A Multi-scale Perspective on Cooperative Print Management
This report explores regional-scale cooperative print strategies in the context of a local collection (OSU) participating in the shared print initiative of a regional consortium (CIC).



Jennifer Schaffner and Ricky Erway
Does Every Research Library Need a Digital Humanities Center?
This essay argues that library directors can engage with digital humanities along a continuum of investment. The most important point is that digital humanists are fiercely independent.




Lynn Silipigni Connaway
Evaluating Digital Services: A Visitors and Residents Approach

This infoKit contains findings, outputs and video interviews of the co-principal investigators from two years of the Visitors and Residents study, a collaborative longitudinal Jisc-funded investigation between Jisc, the University of Oxford and OCLC Research, in partnership with the University of North Carolina, Charlotte. It contains advice on evaluating the services libraries offer to their users. The focus is primarily on digital/online services, set within the broader context of more traditional services and exploring the relationship between the two.


A complete list of OCLC Research publications is available online at: http://www.oclc.org/research/publications.html.


Prototypes and Services

WorldCat Identities Network: This project gives users the opportunity to visually explore the interconnectivity and relationships between WorldCat Identities.


searchFAST: This new interface to the FAST prototype simplifies the process of heading selection in an easy-to-use one-page design.


OCLC Research identifies top ten alien abduction items in libraries in honor of Extraterrestrial Abduction Day
In acknowledgement of Extraterrestrial Abduction Day, observed on 20 March, we identified the top ten most widely held alien abduction items in libraries. This list was generated from the OCLC WorldCat database on 19 March based on the number of OCLC member libraries that hold items with the subject heading "alien abduction." More...
OCLC Research Library Partnership welcomes three new Partners: one from the U.S. and two from Australia
We were pleased to welcome Montana State University, Australian National University, and Monash University to the OCLC Research Library Partnership in March.   More...
Syriac Reference Portal contributes names to VIAF
The second set of personal names from a scholarly resource, the Syriac Reference Portal hosted by Vanderbilt University, was loaded into the Virtual International Authority File (VIAF) in March.  More...
Challenges of integrating researchers in authority files outlined in presentation and draft report
Karen Smith-Yoshimura and Micah Altman's (MIT Libraries) slides from their presentation at the CNI Spring 2014 Membership Meeting are available online.  More...
Brian Lavoie discusses framework for thinking about the evolving scholarly record in new presentation
Slides from Brian Lavoie’s "The Evolving Scholarly Record: Scope, Stakeholders and Stewardship" presentation at the CNI Spring 2014 Membership Meeting are available for downloading or viewing on SlideShare.  More...
Video of Ricky Erway's university-wide data policy planning presentation at CNI Fall Meeting available
In this presentation, Ricky suggests that universities adopt a proactive approach in developing a high-level policy for responsible data planning and management. More...
OCLC Research identifies top 10 love stories in libraries in honor of Valentine's Day
This list of the top 10 most widely held books and movies was generated from WorldCat based on the number of OCLC member libraries that own at least one copy of the given book or movie (holdings).  More...
OCLC Research and ALISE name 2014 research grant recipients
OCLC Research and ALISE have awarded research grants to Denise Agosto of Drexel University and June Abbas of the University of Oklahoma; Leanne Bowler, Daqing He, and Jung Sun Oh of the University of Pittsburgh; and Lynne (E.F.) McKechnie of the University of Western Ontario.  More...
2013 OCLC/ALISE LISRGP grant recipients present projects and findings at ALISE Virtual Conference 2014
OCLC Senior Research Scientist Lynn Silipigni Connaway, Ph.D., moderated this session on 24 January at the Double Tree Philadelphia City Center hotel in Philadelphia, Pennsylvania.  More...
Lynn Silipigni Connaway appointed to Chair of Excellence program at Universidad Carlos III de Madrid
This six-month appointment will expand OCLC Research's capability for innovation, and further facilitate international collaboration between OCLC Research and library and information science professionals and scholars in Europe.  More...
WorldCat analysis identifies most common English title words for books, movies and other media
OCLC Research Senior Program Officer Roy Tennant used WorldCat, the world's largest online database for discovery of library resources, to identify the most common English words in titles of books, movies and other media. The results were published in The Atlantic article, "In Books, Movies, and Media, the Most Popular Title Word Is 'New'", on 8 January 2014.  More...
A complete list of OCLC Research news items is available online at: http://www.oclc.org/research/news.html.

Events, Webinars and Presentations

Regional Print Management: Right-Scaling Solutions Symposium
Library managers, collection development officers, consortium administrators and others interested in shared print were invited to attend this 27 March symposium in person or online. It was co-sponsored by OCLC Research, the CIC and The Ohio State University Library, with support from OhioLINK. More...
Merrilee Proffitt presented keynote at Libraries, MOOCs and Online Learning event in Australia on 19 March
Senior Program Officer Merrilee Proffitt presented "MOOCs and beyond: online education and libraries, what is happening in the field" at the State Library of Queensland Auditorium in South Brisbane, Queensland. More...
"Inside the Digital Public Library of America" Presentation by Dan Cohen
In this OCLC Research Distinguished Seminar Series presentation on 7 March, Dan Cohen went behind the scenes to discuss how the DPLA was created, how it functions as a portal and platform, what the staff is currently working on, and what's to come for the young project and organization.  More...
Constance Malpas' keynote addressed the future of academic print management in Japan
OCLC Research Program Officer Constance Malpas presented the keynote, "Many paths, one moon," at the Future of Print Management in Japanese University Libraries forum at the Keio University Mita Media Center in Tokyo, Japan on 28 February. More...


OCLC Research Library Partnership San Francisco Bay Area Reception
Staff from OCLC Research Library Partner institutions were invited to attend a reception on 26 February for a chance to socialize with colleagues, and an opportunity to chat with OCLC Research staff and catch up on some projects underway. Informal overviews of these current projects were presented:  Registering Researchers in Authority Files, ArchiveGrid, Wikipedia and Libraries, and Gleaning Insights from Mining MARC Data.   More...
The Wikipedia Library Project--what is it, and how can you be involved?
This invitation-only webinar, held exclusively for OCLC Research Library Partners and hosted by OCLC and Wikipedia on 25 February, discussed Partner participation in a pilot project in which Wikipedia seeks libraries to host a Wikipedia editor and give that editor access to their library materials in order to enhance the article citation process on Wikipedia. The cooperative's goal for this project is to make the library's e-collections available online via the WorldCat knowledge base, so that students and others on campus can see links in Wikipedia to full-text articles that the library makes available. More...
Beyond EAD: Tools for Creating and Editing EAC-CPF Records and "Remixing" Archival Metadata
This webinar on 9 January featured demonstrations of xEAC and RAMP, two tools that will help archivists and librarians explore new possibilities for name authority work, moving beyond the boundaries of traditional archival metadata.  More...
Lorcan Dempsey presented keynote at Offline Conference in Montana
OCLC Vice President, Research, and Chief Strategist Lorcan Dempsey presented the keynote "Scale, engagement, innovation: library directions" at the Montana Library Association's Offline Conference on 7 February 2014 in Helena, Montana.  More...
A complete list of OCLC Research events is available online at: http://www.oclc.org/research/events.html.

OCLC Researcher Spotlight—Roy Tennant: Technology and Infrastructure  


I’ve focused on library technology and infrastructure issues since before I became a librarian. As a library assistant at a community college in the Sierra Nevada Mountains of California, I spent weekdays running an audio-visual department and weekends guiding for O.A.R.S. and other commercial whitewater companies.

In the early 1980s, when I decided to go to library school, I had only a year’s worth of credits. Since microcomputers were just becoming popular and I had written my first software program (an interactive library orientation written in BASIC), I considered Computer Science as a major. But at the time nearly all university computer labs were sequestered in basements and the idea of spending hours underground annoyed this outdoorsman. Instead I majored in Geography and minored in Computer Science.

After I was awarded an MLIS from UC Berkeley in 1986, it became clear that the Internet was going to be important – perhaps even transformative – for libraries. By the Fall of 1992 I had co-authored the first book about the Internet for librarians (Crossing the Internet Threshold: An Instructional Handbook).

Since I have spent my professional career predicting and proselytizing major technological and infrastructural changes in libraries, I have tried to retain that focus at OCLC Research. I believe that one of the biggest transitions libraries will weather in the coming years will be changing our bibliographic infrastructure. It is now widely recognized that what we have is no longer sufficient in a world of increasingly interconnected software applications and datastores.

To successfully transition we must understand what we have at hand. After 50 years of encoding bibliographic data, what have we captured and how? What problems may we encounter when moving our data from legacy formats to a completely different technical environment? Where have we been inconsistent? These are just some of the questions I seek to answer by exposing data that only OCLC could provide.

Over the last year this has been accomplished in a series of reports covering the entire WorldCat database, available at the MARC Usage in WorldCat web site. Total counts are provided for all MARC fields and subfields, and for specific subfields the contents are summarized and ordered in reverse numerical order.
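
As a rough illustration of the kind of tallying such reports involve, the following Python sketch counts field and subfield occurrences across a few toy records and then summarizes the contents of one subfield. Everything here is an assumption made for illustration: the record layout, the sample values and the function names are invented, this is not the code behind the MARC Usage in WorldCat reports, and the sketch assumes "reverse numerical order" means most frequent value first.

```python
from collections import Counter

# Toy records (invented): each record is a list of
# (field_tag, [(subfield_code, value), ...]) pairs.
RECORDS = [
    [("245", [("a", "Title one")]), ("650", [("a", "History"), ("x", "Sources")])],
    [("245", [("a", "Title two")]), ("650", [("a", "History")])],
    [
        ("245", [("a", "Title three")]),
        ("650", [("a", "Geography")]),
        ("650", [("a", "History")]),
    ],
]


def count_usage(records):
    """Return total occurrence counts for every field and every subfield."""
    field_counts = Counter()
    subfield_counts = Counter()
    for record in records:
        for tag, subfields in record:
            field_counts[tag] += 1
            for code, _value in subfields:
                subfield_counts[(tag, code)] += 1
    return field_counts, subfield_counts


def summarize_subfield(records, tag, code):
    """Return (value, count) pairs for one subfield, highest count first."""
    values = Counter(
        value
        for record in records
        for field_tag, subfields in record
        if field_tag == tag
        for subfield_code, value in subfields
        if subfield_code == code
    )
    return values.most_common()
```

With the toy data above, `count_usage(RECORDS)` reports the 650 field four times and the 245 field three times, and `summarize_subfield(RECORDS, "650", "a")` lists "History" ahead of "Geography". Sorting by count is a design choice that surfaces the dominant values and, further down the list, the inconsistencies.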

Our most foundational technology is our bibliographic infrastructure. If and when we move into a brave new bibliographic world we should do it with all of the knowledge and skill we can bring to bear upon the task.

Roy Tennant is a Senior Program Officer in OCLC Research.