With big data, answers drive questions

Andreas Schmidt


Usually, when we search for a solution, we start with a question and then seek out answers. According to Viktor Mayer-Schönberger, one of the plenary speakers at the 2017 OCLC EMEA Regional Council Meeting in Berlin, big data flips that equation on its head.

Tying into the event’s theme, “Libraries at the Crossroads: Resolving Identities,” Viktor explained that big data is all about gaining new perspectives on the world. It is revolutionizing what we see and how we process information. With big data, he said, we start with answers (what the data tells us) and then go back to fill in appropriate questions and hypotheses.

Viktor, a professor at Oxford University’s Internet Institute and author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, also explained that every additional data point is an opportunity to improve customer services and find new synergies. He described how the sheer quantity of data translates into a new capability to make sense of patterns.

As I thought about his presentation, I wondered about the impact of big data on libraries. In our own way, we librarians have been big data crunchers for decades. We’ve made great strides in collecting bibliographic data at scale. So how do we move these efforts forward?

Positioning libraries for big data success

Big data technologies have made processing large collections of data inexpensive and fast. They enable forward-looking decision-making based on data from multiple, disparate sources.

Some recent opportunities include:

Curating research data. University researchers and government agencies manage and preserve massive digital assets (images, text and data) that require integrated management and preservation programs. These assets include project proposals, grant proposals, researcher notes, researcher profiles, datasets, experiment results, article drafts and copies of published articles. The library’s role in connecting and curating these institutional assets is both needed and a big opportunity for new services. OCLC Research scientists are exploring topics related to data curation and libraries with an eye toward distinctive services that will support research missions.

Aggregating library data. We are leveraging members’ collected knowledge investment for efficiency and re-use by libraries and other organizations. One example is the Virtual International Authority File (VIAF), which virtually combines multiple name authority files into a single dataset. By linking disparate names for the same person or organization, VIAF provides a convenient means for a wider community of libraries and other agencies to repurpose bibliographic data produced by libraries that serve different language communities. VIAF became an OCLC service in 2012, and today 25 national libraries from 30 countries are represented in the cooperative data file.

Managing collection data. As libraries move from locally owned to jointly managed print collections, good data about collections can help establish priorities and focus. When aggregated and analyzed across many libraries (through programs such as Sustainable Collections Services), collections data can suggest patterns and provide insights that inform management decisions. We anticipate that a large part of existing print collections, spread across many libraries, will move into coordinated or shared management within a few years. While quantitative data must be used carefully, information about overlap and usage can supplement the judgment of librarians.

Getting ready for the future

The “Crossroads” theme of the conference was woven through many of the presentations, discussions and conversations I heard. Big data, too, cuts across many of the topics presented, such as digitization, research information management and institutional identities.

Library services will clearly be increasingly affected by big data—but here’s a thought-provoking question: Will the data be our own, or that which comes from an increasingly connected and monitored world? Will we be able to collect data from thousands of institutions in ways that present answers for which we can formulate library-specific questions? Or will we be stuck trying to adjust our inquiries and plans based on data collected elsewhere?

We are still in the early days of aggregating all sorts of new and exciting library data. Indeed, library big data might play a crucial role in framing questions about education, authority and literacy outside the spheres of commercial interest—if we can successfully navigate these crossroads together.