Issue 5 : Fourth Quarter : April–June 2012

OCLC 754109685 : ISSN 2163-8675


Guest Message from Jim Michalko

OCLC Research invests itself in the three general ways shown in the box to the right. Much of our capacity and credibility to do advanced development, community research and effective engagement comes from the long-standing service and support capabilities OCLC has delivered. However, because of a cluster of special staff skills and leadership, we have made a significant impact around mobilizing unique materials (that is, special collections and archival materials), even without a deep connection to service delivery.

These types of collections have gotten increasing attention from library and institutional leaders. They are often being identified as the collections that will define an academic library’s future distinctiveness. These materials are certainly important institutional assets that need to be described, discovered and used in both teaching and research. Traditionally they have also been expensive to describe, difficult to find, costly to steward, and incompletely mobilized to support the institution’s mission.

The work we’ve done in this arena has attempted to influence the entire ownership arc of unique materials, from assessment and selection through description, discovery, digitization and use, to make it less expensive and more effective. I’m particularly pleased to call this to your attention now because of the significant contributions my colleagues recently made at the Rare Books and Manuscripts Preconference and the attention given to aligning special collections with the institutional mission during the OCLC Research Library Partnership Libraries Rebound conference. The presentations there focused on how special collections are initiating a broad range of activities on campus and beyond to sustain and redirect their future.

While OCLC does not provide special support for archival description, in Research we’ve been challenging some traditional practices and experimenting with processes that more effectively connect descriptive practices with discovery. The ArchiveGrid beta system is both an effective aggregator of information about archival collections and a real-world discovery environment where new approaches can be tested and evidence to guide local effort can be gathered. If your archival collections are not already represented there, get them added.

We’ll be talking about our work and future directions with our colleagues in the archival community at the upcoming annual Society of American Archivists meeting. We’ll let you know how to connect with us there.

Regards, Jim Michalko


Ricky Erway
Lasting Impact: Sustainability of Disciplinary Repositories

This report offers a quick environmental scan of the repository landscape and then focuses on disciplinary repositories--those subject-based, often researcher-initiated loci for research information.

Ixchel Faniel (and others)

Managing fixity and fluidity in data repositories

In iConference'12: Proceedings of the 2012 iConference, edited by Jens-Erik Mai. New York: ACM (2012). This paper won a "Best Paper" award at iConference 2012.

Carol Jean Godby
A Crosswalk from ONIX Version 3.0 for Books to MARC 21

This report describes the crosswalk developed at OCLC for mapping the bibliographic elements defined in Version 3.0 of ONIX for Books to MARC 21 with AACR2 encoding.

Karen Smith-Yoshimura
Social Metadata for Libraries, Archives, and Museums. Part 3: Recommendations and Readings

This third report completes the Social Metadata for Libraries, Archives, and Museums report series.

Karen Smith-Yoshimura
Social Metadata for Libraries, Archives, and Museums: Executive Summary

The executive summary provides a high-level overview of all three reports in the Social Metadata for Libraries, Archives, and Museums report series.

Prototypes and Services

OCLC Research: 3 Roles

OCLC Research performs three major roles: we act as a community resource for shared R&D, provide advanced development and technical support within OCLC itself, and enhance OCLC's engagement with members and mobilize the community around shared concerns.


ArchiveGrid is a discovery service that provides access to detailed archival collection descriptions. It includes over a million descriptions of archival collections held by thousands of libraries, museums, historical societies and archives worldwide. ArchiveGrid enables researchers to learn about the contents of these collections and to contact archives to arrange a visit to examine materials or order copies, all from one simple, intuitive search. Although ArchiveGrid is currently available as a subscription service, it will eventually become a free discovery system. To facilitate this transition, OCLC Research is developing a new ArchiveGrid discovery interface that is now freely available.

OCLC Research's WorldCat Identities prototype

WorldCat Identities has a summary page for every name in WorldCat (currently some 30 million names). We maintain a Research version in addition to the publicly available pages. We create the pages used by both services, and we plan to experiment with allowing splits and merges of Identity pages in our version of the service. We create quarterly updates of Identity pages.


New Video: "Linking Library Data to Wikipedia"
In this video, OCLC Research Wikipedian in Residence Max Klein and Senior Program Officer Merrilee Proffitt discuss a project aimed at enhancing name disambiguation in Wikipedia by establishing reciprocal links with Virtual International Authority File (VIAF) records.
OCLC Research and Wikimania to Host Wikipedia Loves Libraries Event on 11 July in Washington, D.C.
OCLC Research Wikipedian in Residence Max Klein organized this event to continue to build momentum for the Wikipedia Loves Libraries initiative, a continent-wide campaign to bring Wikipedia and libraries together with on-site events.
Brian Lavoie Elected to Dryad Data Repository Board of Directors
The group, which manages data underlying peer-reviewed bioscience articles, is becoming a non-profit organization in the US.
Joint Partnership between The University of Oxford and OCLC Research Wins Continued JISC Funding for Study of Digital Visitors and Residents
JISC, the UK's expert on information and digital technologies for education and research, has agreed to continue funding a third phase of "Visitors and Residents: What Motivates Engagement with the Digital Information Environment?" a UK-US partnership between the University of Oxford and OCLC Online Computer Library Center, Inc., in collaboration with the University of North Carolina, Charlotte.
New Video: "Interview with OCLC Research Wikipedian in Residence Max Klein"
In this eight-minute video, Senior Program Officer Roy Tennant talks with OCLC Research Wikipedian in Residence Max Klein about his plans to help connect researchers with library collections and services using Wikipedia.
Max Klein Named OCLC Research Wikipedian in Residence
OCLC Research is pleased to announce that Max Klein has been appointed to this paid, three-month position in our San Mateo, California office until the end of August 2012.
OCLC Researchers and Colleagues Garner Award for RUSQ Article
"'Are We Getting Warmer?': Query Clarification in Live Chat Virtual Reference" has won RUSA's 2012 Reference Service Press Award.
Newly Published Report Highlights 2011 Work, Engagements of OCLC Research
This report presents our work in a new context and provides an overview of recent accomplishments.
Shenghui Wang Named OCLC Research Scientist
OCLC Research is pleased to announce the appointment of Shenghui Wang, Ph.D., as a Research Scientist effective 1 May 2012. Dr. Wang will work from the OCLC European, Middle East and Africa (EMEA) headquarters office in Leiden, Netherlands.
VIAF (Virtual International Authority File) Transitions from OCLC Research Prototype to OCLC Service
OCLC will continue to make VIAF openly accessible and will also work to incorporate VIAF into various OCLC services.

Events, Webcasts, and Presentations

OCLC Research and Wikimania Hosted Wikipedia Loves Libraries Event, 11 July (Washington, D.C.)
OCLC Research Wikipedian in Residence Max Klein organized this event to continue to build momentum for the Wikipedia Loves Libraries initiative, a continent-wide campaign to bring Wikipedia and libraries together with on-site events.
OCLC Research Staff Presentations at ALA Annual 2012, 21-26 June (Anaheim, CA)
The theme of this year's ALA Annual conference and exhibition was Transforming Our Libraries, Ourselves.
OCLC Research Staff Presentations at the 53rd Annual RBMS Preconference, 19-22 June (San Diego, CA)
The 2012 RBMS Preconference FUTURES! was designed to explore a multiplicity of futures for the rare book, manuscript, and special collections community.
OCLC Research Presentations at Libraries in the Digital Age (LIDA)
Senior Research Scientist Lynn Silipigni Connaway made two presentations at Part I of the conference in Zadar (Croatia).
OCLC Research Senior Program Officer Roy Tennant to Present Keynote at Academic Librarians 2012 Conference
Roy will present "The Once and Future Academic Library" from 8-9 a.m. on Wednesday, 13 June 2012 at Syracuse University in New York.
Behind the Research: Interview with OCLC Research Scientist Jean Godby Inaugurates New Series on OCLC Cooperative Blog
This series takes the opportunity to get to know a bit about the researchers themselves--what motivated them to get into the profession, what drives their curiosity, what inspires them.
Search Engine Optimization (SEO) for Institutional Repositories Webinar Recording Available
This webinar provided SEO techniques for improving the indexing ratios of institutional repositories in Google Scholar.

Looking Beyond the Quarter...

OCLC Research Webinar: "Wikipedia and Libraries: What's the Connection?" (#orwikipedia)
31 July 2012
Online via WebEx
OCLC Research TAI CHI Webinar: Umlaut (#orumlaut)
1 August 2012
Online via WebEx

OCLC Researcher Spotlight—Carol Jean Godby and Language Processing at OCLC Research

When I was a graduate student in linguistics at Ohio State in the early 1980s, I wasn’t too far removed from Noam Chomsky's seminal work published in the 1950s and 60s that introduced important concepts to linguistics as well as the fledgling discipline of computer science. He worked out the mathematics that answer questions about what makes grammars simple or complex and what makes human language different from formal logic and computer languages. Chomsky and his students so revolutionized the study of linguistics that thousands of his students and grand-students (me, among them) could spend the next sixty years studying language as an empirical discipline with a strong conceptual framework that can be analyzed with computational rigor.

Nearly all of my work at OCLC has had a linguistic underpinning. I have led projects on vocabulary extraction in full text, prompted by the linguist’s question: what is a word, and how is that different from a phrase? In other words, how do we algorithmically pluck vocabulary from streams of text that might be useful to record in a dictionary or terminology list? And how do we map it to controlled vocabularies managed by librarians? Name extraction is a special and particularly important case of vocabulary extraction. It's not always easy to automatically recognize and extract useful names of people, places, groups, organizations, etc. from text. OCLC or Google might be easy-to-recognize names, but how on earth is a computer supposed to recognize “Paul Milstein Division of United States History, Local History and Genealogy,” a business unit of the New York Public Library, as the name of a single ‘thing’? When we can successfully carry out these tasks, though, we can provide a whole new set of tools for discovery. Our work on names and authority files here at OCLC has led, among other things, to WorldCat Identities, which provides a way to search for works by and about named entities.

My work on metadata mapping is also a linguistic problem. It is prompted by the linguist’s questions about the nature of translation. Are two expressions ever truly synonymous? And why do multiple 'languages' exist in the first place? The solution encoded in the Crosswalk model, to which our recent work on mapping ONIX to MARC is an input, is to create a "bilingual dictionary" into which library metadata standards can be mapped. Once the concepts have been translated, they are shipped off to other software processes that express them in the native format specified by the standard. In linguistic terms, we solve the problem by separating syntax from semantics – which Noam Chomsky taught both linguists and computer scientists how to do.
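As a rough illustration of that separation, here is a minimal Python sketch. It is not OCLC's Crosswalk implementation: the field names, the two mapping tables, and the functions `to_concepts`, `write_marc_like` and `crosswalk` are hypothetical simplifications chosen for the example, and real ONIX and MARC records are far richer. The first step translates source fields into format-neutral concepts (semantics); the second step writes those concepts out in a target notation (syntax).

```python
# Hypothetical, heavily simplified tables; not the real ONIX or MARC element sets.

# Semantics: a "bilingual dictionary" from source field names to shared concepts.
ONIX_TO_CONCEPT = {
    "TitleText": "title",
    "PersonName": "creator",
    "IDValue": "isbn",
}

# Concept -> (MARC-style tag, subfield code) used only when writing output.
CONCEPT_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
    "isbn": ("020", "a"),
}


def to_concepts(onix_record):
    """Translate source fields into format-neutral concepts, skipping unmapped fields."""
    return {
        ONIX_TO_CONCEPT[field]: value
        for field, value in onix_record.items()
        if field in ONIX_TO_CONCEPT
    }


def write_marc_like(concepts):
    """Syntax: express the translated concepts as MARC-like display lines, sorted by tag."""
    lines = []
    for concept, value in concepts.items():
        if concept in CONCEPT_TO_MARC:
            tag, subfield = CONCEPT_TO_MARC[concept]
            lines.append(f"{tag} ${subfield} {value}")
    return sorted(lines)


def crosswalk(onix_record):
    """Run both steps: translate meaning first, then serialize."""
    return write_marc_like(to_concepts(onix_record))


if __name__ == "__main__":
    record = {
        "TitleText": "Syntactic Structures",
        "PersonName": "Chomsky, Noam",
        "EditionNumber": "2",  # no entry in the dictionary, so it is dropped
    }
    for line in crosswalk(record):
        print(line)
```

Because the translation step knows nothing about the output notation, supporting another target format in this sketch would mean adding a new writer function rather than touching the dictionary.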

My current interest in conceptual modeling is also a linguistic problem. Stepping back from the Crosswalk work, my colleagues and I are now asking: what is a title, an item, a copy, a holding, an institution, a library, or a vendor? How do they relate to each other? How have they evolved over time? And how can we ensure that our software processes reflect the same understanding of these important concepts? This work is important for ensuring consistency and transparency of OCLC's products and services.

I like to think that we start with the same questions Chomsky did. What concepts do our systems rely on? How can we model them in ways that allow for better communication between them? How do we start with complex, analytical models but then deliver a coherent view that's useful to our communities of users?