530 million songs. 90 years of high-definition video. 250,000 Libraries of Congress. That’s how much data we produce every day: 2.5 exabytes, according to Northeastern University. I guess that’s not surprising, given the amount of activity that goes on in social media, websites, email and text messages.
Much of that data, though, is personal and ephemeral: videos, photos, tweets and stories that can be passed along and deleted without any thought or care about accuracy or archiving.
But in the scholarly community, a similar and perhaps more significant explosion of digital data is occurring. Here the stakes may be much higher. Without trusted stewardship, data from research will not be effectively collected and preserved for reuse. And when that stewardship is missing, research innovation and advancement slow significantly.
This is new territory in many ways. Data have been collected and preserved for thousands of years, but never at the volume we see today, nor with some of the deliberate (and in some cases, legally mandated) intentions for reuse.
Given their expertise, library and archives professionals are well suited to provide support, and many have taken a leading role in developing new ways to help their campus communities manage, curate and preserve research data. It’s an exciting opportunity and one for which our community is well equipped.
Managing and curating data with reuse in mind
For almost 10 years, I’ve been studying data reuse in academic communities. That work has evolved into studying academics’ data management and sharing practices and the library’s role in supporting them. My latest research focuses on data sharing and reuse in the social science, archaeological and zoological communities, and what these traditions and practices mean for repository data curation. My goal is to identify the common drivers and unique elements of sharing and reuse in each discipline, and what applications those might have for library and archives professionals in their role as data curators.
What we found throws some light on how librarians and archivists can play a more active role in this new environment:
- Trust in both the data and the repository plays a major role in whether data are reused. Perceptions and opinions about documentation quality, data producer reputation and repository reputation are formed over time as researchers gain experience with the discipline, data and repository. As data curators, librarians and archivists have the power to shape researchers’ perceptions and opinions about these trust markers, particularly repository reputation and documentation quality.
- Important mediating roles exist between data production and reuse. Engaging with researchers who are producing data as well as those who are reusing data provides a full view of upstream and downstream data needs and an opportunity to better align support and services, such as data deposit and curation activities, user interface design, data management instruction modules or other scaffolding that improves researchers’ experiences. In addition, intervening at the point of data production would allow curation goals to be negotiated so that they better satisfy the needs of multiple stakeholders: data producers, curators and reusers.
- Data repositories do not have to house all of the information about the data to be effective. However, they do have to provide provenance information (a chronology of ownership), chain-of-custody information and, if the information or related data are housed elsewhere, their location. We see this within the zoological community, where decisions about data stewardship and data services are made across multiple repositories, resulting in partnerships among several institutions. Seeking partnerships that extend institutional capabilities adds value to the research community.
In each case, what we see is that trained, engaged people who make connections with others play a key role in the success of these projects. And to me, that sounds a lot like the work librarians and archivists do.
The role of the library and the archives in aggregating and servicing these data assets is growing. Our challenge is to provide new services that respond to the abundant and sometimes chaotic flow of digital data and the evolving patterns of data sharing and data reuse.