The catalog is out of the box
By Andy Havens and Tom Storey
For hundreds of years, metadata was kept in a box. Literally. A wooden box, filled with paper cards. Libraries cataloged for one reason: to be able to find resources on a shelf. Today, though, we’re seeing a growing importance placed on metadata management activities. In an increasingly information-driven world, good metadata is the key to more than finding the right item.
Data-about-data is now used to track materials, assess needs, compare collections, inform research, manage workflows, plan budgets and even make friends. Catalogers have been joined by publishers, retail outlets, shipping companies, researchers, faculty, Web programmers, search engine optimizers and end users in the flow of metadata creation and modification. This puts libraries, and catalogers, right in the middle of a revolution in how we think about representing and describing information. And the more partners we can involve in these processes, the more chances libraries have to add value up and down a variety of data supply chains.
“Metadata has become a stand-in for place.”
So says Richard Amelung, Associate Director at the Saint Louis University Law Library. When asked to expand on that idea he explains, “Law is almost entirely jurisdictional. You need to know where a decision occurred or a law was changed to understand if it has any relevance to your subject.
“In the old days, you would walk the stacks in the law library and look at the sections for U.S. law, international law, various state law publications, etc. Online? Without metadata, you may have no idea where something is from. Good cataloging isn’t just a ‘nice-to-have’ for legal reference online. It’s a requirement.”
Richard’s point is one example of a trend that is being felt across all aspects of information services, both on and off the Web: the increasing importance and ubiquity of metadata. In a world where more and more people, systems, places and even objects are digitally connected, the ability to differentiate “signal from noise” is fast becoming a core competency for many businesses and institutions.
Librarians, and catalogers more specifically, are deeply familiar with the role good metadata creation plays in any information system. As part of this revolution, industries are placing increasing value on librarians’ talents and ways of working, extending the ever-growing sphere of interested players.
Whether we are tracing connections on LinkedIn, getting recommendations from Netflix, trying to find the right medical specialist in a particular city or monitoring a shipment online, metadata has become the structure on which we’re building information services. And no one has more experience with those structures than catalogers.
The value of metadata in medicine
Preventing blindness is Dr. John Michon’s passion. As a practicing ophthalmologist and a medical researcher, he has studied and seen firsthand the devastating effects of eye disease.
And he knows that to eradicate vision loss, the clinical record of patient care must be linked online with the huge datasets emerging from gene-mapping projects and other research activities in order to create new associations and new knowledge that doctors can act upon.
That’s where librarians come in, he says.
“The role of library and other information scientists is crucial to the success of this effort,” Dr. Michon says. “Physicians, allied health workers and researchers are generally naïve when it comes to classification and categorization issues. We’re too busy with our primary duties. Creating, implementing and testing knowledge models for the large and diverse number of biomedical domains will be a cooperative process between librarians and domain experts.”
Dr. Michon’s thoughts highlight a trend sweeping across the information community as people and communities are deluged with digital data: the growing importance of metadata and the critical role librarians are playing in making information systems better. Of course, the importance of knowledge organization models and standardized description is nothing new to our profession. Librarians have long been leaders in designing classification systems, dating back to 1876, when Melvil Dewey first published the Dewey Decimal Classification system.
Nowhere is this more important than in the medical field, where new information could be used to treat disease and advance life-saving research.
With a grant from the National Science Foundation, Dr. Michon built a prototype biomedical information infrastructure for visual sciences to help doctors integrate research data with clinical data in order to better predict disease risk and make recommendations for specific therapies. During the project, he found out how critical a librarian’s expertise is to the effort.
In order to make the rapidly growing amount of information accessible and meaningful, medical experts needed to agree on naming conventions and relationships of essential concepts—essentially content value standards—and commit to categories representing the information system’s architecture—the equivalent of data structure standards.
“Information organization, storage and retrieval are facilitated through the use of metadata and the ability to make ‘computable’ statements,” Dr. Michon says. “As librarians become more involved with biomedical information, it is important for them—particularly catalogers—to be part of the teams that organize and improve the utility of our data and develop a high level of interoperable biomedical infrastructure.
“Understanding the principles of knowledge modeling, the tool sets available for this work, and codifying expert knowledge will challenge librarians and information scientists and demand that they learn a fair amount of biomedicine language.
“However, the results will justify the efforts if we can capture more of the value inherent in biomedical information and use it to improve human welfare.”
Using metadata to drive scientific data integration and analysis
Jane Greenberg, Professor and Director, Metadata Research Center, School of Information and Library Science, University of North Carolina at Chapel Hill, says it’s a very exciting time to be involved with cataloging and metadata.
“People are getting wind of the fact that librarians are the experts,” she says. “There are a lot of partnerships being formed and people are looking to librarians for information standards and how to manage data. Never in our time has there been a more universal interest in producing structured, standardized information.”
Jane was approached by researchers from evolutionary biology who were building a digital repository called Dryad to archive data and publish findings in evolutionary biology, ecology and related fields. The repository allows scientists to access and build on each other’s findings.
“They asked me if I knew anything about the MARC format and Dublin Core,” she says. “In fact, they said they needed bibliographic control. These biologists actually used the words bibliographic control. It was pretty amazing!
“Their depth of knowledge and command of detail at the scientific level was extremely impressive and they realized at the top level they needed some kind of information standards. This is happening across the board.”
Dryad is a repository for digital data in evolutionary biology that seeks to ensure long-term preservation and promote resource discovery and reuse of the data. The focus is on published datasets, with links to major evolutionary biology journals and domain-specific community databases.
Jane, her colleagues at the Metadata Research Center and NESCent (the National Evolutionary Synthesis Center) are also working on a system called HIVE that allows users to annotate Dryad content with subject headings from multiple controlled vocabularies.
The system is being designed to generate subject metadata using automatic metadata generation techniques that pull concepts and terminology from a range of subject thesauri. “An interdisciplinary subject such as evolutionary biology cannot be represented by a single vocabulary,” Jane says. “We want to create a usable and functional system that draws descriptors from several controlled vocabularies to aid catalogers and authors who are creating subject metadata.
“We hope to provide efficient, affordable, interoperable and user-friendly access to controlled vocabularies during metadata creation activities.”
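As a rough illustration of the idea Jane describes, and not HIVE's actual implementation, the sketch below suggests descriptors for a piece of text by matching it against several vocabularies at once. The vocabulary names, the terms and the function suggest_descriptors are all made up for this example; real thesauri are far larger and real systems use more sophisticated matching.

```python
import re

# Hypothetical toy vocabularies; names and terms are assumptions for this sketch.
VOCABULARIES = {
    "vocab_a": ["phylogenetics", "natural selection", "gene flow"],
    "vocab_b": ["population genetics", "gene flow", "speciation"],
}

def suggest_descriptors(text, vocabularies=VOCABULARIES):
    """Return {term: [vocabulary names]} for every vocabulary term found in text."""
    # Lowercase and strip punctuation so phrases match on whole words only.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    padded = f" {normalized} "
    suggestions = {}
    for vocab_name, terms in vocabularies.items():
        for term in terms:
            if f" {term} " in padded:
                suggestions.setdefault(term, []).append(vocab_name)
    return suggestions

abstract = "We examine gene flow and speciation in island bird populations."
suggestions = suggest_descriptors(abstract)
```

Each suggested term carries the list of vocabularies it came from, so a cataloger or author could see at a glance which descriptors are shared across vocabularies and which are specific to one.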
Tracking economically important innovations with book metadata
For Michelle Alexopoulos, Associate Professor, Department of Economics, University of Toronto, metadata isn’t just a way to find particular materials; it’s the key to unlocking entire trends in economics.
Economists believe technical change is responsible for economic growth and a major cause of business cycles. As Michelle says, “Without good measures, we can’t test theories or determine what areas of technology are growing rapidly and where we should invest R&D funds.”
Previously, direct measures of technological change in economics included the tracking of patents and R&D expenditures. And while those methods are helpful, there can be ambiguities. The number of patents, for example, can be affected by changes in patent law. Also, filing a patent is no guarantee that commercial innovation—which is closely linked to economic change—will follow. Even if a new patent does produce significant change, it can be years between the patent filing and the economic impact.
“An ideal indicator of economic change,” Michelle says, “would be available at least yearly, would not be subjective, would be related to the introduction of the new good or process, would weigh technologies according to their importance and would cover all new technologies across industries.”
To meet that challenge, Michelle has turned to book-based indicators of technological change.
“Because new books are required with new technologies, producers first release manuals with initial product shipments. Afterward, publishers release additional ‘how-to’ books as well as those that comment on the new technology. Secondary markets then get into the act in order to maximize profit.”
Luckily, classification systems for books already exist and allow for the objective groupings necessary to adequately track innovation. Michelle’s team got catalog metadata from the Library of Congress and WorldCat, and publishing data from publishers’ lists such as Books In Print, and from booksellers such as Amazon.
“What we found,” she explains, “is that book-based indicators much more closely track the date of commercialization of innovations.” For example, insulin was invented in 1889. It wasn’t commercialized, however, until 1922. The date of publication of books on insulin? You guessed it: 1922.
Figure: All new technology titles
The same holds true for more modern innovations, such as the personal computer. While the first commercial transistor computers were released in the late 1950s, the real change in the economy due to distributed computing coincided with the growth in books on the subject, in the 1980s.
Michelle concludes, “These new indicators don’t simply track the diffusion of a product or process. Instead they help explain observed changes in productivity as well as economic activity as measured by GDP. Metadata like this can help us track important waves of innovation, measure the relationship of different technologies to one another, perform cross-country studies and explore quantitative links between science and technology.”
All of which makes the creation and maintenance of metadata even more important. Good cataloging structures don’t just meet today’s information needs, but will be increasingly used as a way of understanding larger trends and developments.
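The book-based indicator Michelle describes comes down to counting new titles per year within a subject grouping. The sketch below shows that counting step under simplifying assumptions: the records are invented for illustration and are not data from her study, and the helper names titles_per_year and first_year_with_titles are hypothetical.

```python
from collections import Counter

# Hypothetical catalog records: (publication_year, subject_heading).
# Invented for this sketch; not data from the study described above.
records = [
    (1921, "Diabetes"),
    (1922, "Insulin"),
    (1922, "Insulin"),
    (1923, "Insulin"),
    (1923, "Radio"),
]

def titles_per_year(records, subject):
    """Count new titles per publication year for one subject heading."""
    return Counter(year for year, heading in records if heading == subject)

def first_year_with_titles(records, subject):
    """Return the earliest year with a title on the subject, or None."""
    counts = titles_per_year(records, subject)
    return min(counts) if counts else None

insulin_counts = titles_per_year(records, "Insulin")
first_insulin_year = first_year_with_titles(records, "Insulin")
```

Because the grouping relies on subject headings already assigned by catalogers, the count inherits the objectivity of the classification system rather than depending on an analyst's judgment.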
A bridge between library metadata and the rest of the world
Jean Godby, OCLC Research Scientist, has been involved in several projects that seek to bridge the gap between library metadata and that of other systems. She manages the Metadata Schema Transformations project that seeks to insert interoperability into the management of digital resources.
“The situation we are attempting to deal with,” Jean says, “is defined as ‘schema-level interoperability,’ because we are trying to identify common ground among formally defined metadata schemas.”
Metadata crosswalk services exist already between MARC and Dublin Core, ONIX and MARC, ONIX and Dublin Core, MODS and MARC, and MODS and Dublin Core. These allow, among other things, for better workflows between libraries, publishers and book jobbers. And while easy correspondences between metadata systems are good enough for much of the day-to-day work in libraries, even the slightest incompatibilities can produce backlogs that translate into unfulfilled queries from users.
“The goal,” Jean says, “is a framework that will make the creation and upkeep of metadata and associated workflows easier for all parties. Library metadata can be leveraged in a variety of ways that benefit our users and systems, and provide value to outside agencies as well. Ideally, what we want are library solutions that can be used in a variety of environments.”
As more and more industries and organizations rely on quality metadata, opportunities for libraries to leverage their catalog data will increase. Being able to ‘crosswalk’ metadata from one system to another is one key to libraries’ success in these endeavors.
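To make the idea of a crosswalk concrete, here is a minimal sketch that maps a handful of MARC fields to Dublin Core elements. It is not the OCLC service Jean describes. The mapping table is a simplified assumption covering only a few common correspondences, and the crosswalk function and sample record are hypothetical.

```python
# Simplified, illustrative mapping from (MARC tag, subfield code) to a
# Dublin Core element. An assumption for this sketch, not a complete or
# authoritative crosswalk.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("020", "a"): "identifier",
}

def crosswalk(marc_fields, mapping=MARC_TO_DC):
    """Convert (tag, subfield, value) triples into a Dublin Core dict.

    Returns the mapped record plus the fields that had no mapping, so
    incompatibilities are surfaced rather than silently dropped.
    """
    dc_record = {}
    unmapped = []
    for tag, subfield, value in marc_fields:
        element = mapping.get((tag, subfield))
        if element is None:
            unmapped.append((tag, subfield, value))
        else:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc_record.setdefault(element, []).append(value)
    return dc_record, unmapped

# Hypothetical record for demonstration only.
sample = [
    ("245", "a", "An example title"),
    ("100", "a", "Author, Example"),
    ("650", "a", "Metadata"),
    ("650", "a", "Cataloging"),
    ("300", "a", "250 p."),
]
dc_record, unmapped = crosswalk(sample)
```

Returning the unmapped fields alongside the converted record reflects the point made above: small incompatibilities between schemas are where backlogs start, so it helps to make them visible.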
A metadata renaissance for libraries
In her paper Time Horizon 2020: Library Renaissance, Susan Gibbons, Vice Provost and Dean, River Campus Libraries, University of Rochester, talks about how the coming decade will mark the renaissance of technical services and a complete transformation of collection development.
Among the changes she sees:
- The emphasis of technical services will change from the acquisition of content to the user’s discovery of content. A library’s success will be defined by whether its users are finding the best materials easily and quickly, rather than by collection metrics. A myriad of services, customized to the library’s local needs, will emerge that will sit on top of a library’s broad print and electronic collections. The success of these services will be dependent upon the availability and quality of metadata.
- The need for all content to have some online manifestation, whether a full-text scan or a metadata record, will force all of a library’s hidden collections into the light, including manuscripts, images and other special collections.
- Dissertations, articles, books, working papers, technical reports and other such content will flood into the campus libraries for curation, description and distribution. Technical service staff will find an increasing percentage of their work shifted away from the procurement of external content to the care and distribution of locally created content.
- The Google Book Project will cause a resurgence in the use of the print collections. As books are rediscovered, there will be a shift of resources toward identifying, preserving and republishing books held uniquely by each library.
“The year 2020 will still find libraries creating, collecting, organizing, delivering and preserving information resources; the fundamental ‘what’ of technical services and library collections will not change,” Susan says. “However, we must be ready for a radical transformation in the ‘how’ and ‘why’ of these activities. I believe the focus will shift from external to internal content, from just-in-case to just-in-time collection development, and from disparate silos of information resources to a mandated expectation that those silos can communicate and interact in ways that meet the expectations of library users.”
Metadata ubiquity
For many years, metadata, in the form of shared, structured standards, was important only to librarians, who sought to make the materials in their library collections findable and discoverable to the public. But today, you can hardly talk about digital libraries, data repositories and Web 2.0 without the mention of metadata.
The acknowledgment that metadata is an essential element in the information infrastructure is rewarding.
“The range of metadata activities over this last decade is both extensive and astonishing,” Jane Greenberg says, “and presents an unprecedented opportunity to share information and knowledge as we move forward.
“It is clear that metadata is ubiquitous,” Jane continues. “Education, the arts, science, industry, government and the many humanistic, scientific and social pursuits that comprise our world have rallied to develop, implement and adhere to some form of metadata practice.
“What is important is that librarians are the experts in developing information standards, and we have the most sophisticated skills and experience in knowledge representation.”
Those skills are being put to good use not only in the library, but in nearly every discipline and societal sector coming into contact with information.