Metadata for Preservation and Access

Robin Dale
Program Officer, Research Libraries Group

Good afternoon. I tend to get this spot in the program. It's generally the unenviable task of speaking either right after a break or a very nice long heavy lunch. So just to keep things interesting for you, I decided to change a few slides. And for those of you who are awake and already at your seats, you'll notice that the first change is the title slide!

In reality, some of the changes are to avoid duplicating what my colleagues have already said. For example, I was fairly sure Steve would be covering the OAIS. Realizing how dense this document is from having read the 180-odd pages myself, I thought I should filter out some of that information, and concentrate a bit more on the preservation and access side of things.

So starting off with this first slide, I want to go back to the symposium objectives that Meg outlined earlier: awareness, definition of current evolving practices, and cooperation. What I am going to do is shuffle the order and begin by talking about cooperation. And indeed, it's one of the reasons why I'm here today.

Cooperation
In March of 2000, RLG and OCLC had a series of discussions which led to the establishment of several joint activities. Among these, we decided to collaborate on two activities to identify and support best practices for the long-term retention of digital objects: definitional work on preservation metadata and definitional work on the attributes of a digital repository for research resources. Our organizations serve largely the same communities - many institutions belong to both OCLC and RLG - and we both work to provide guidance and set best practices, so working together in this area was a natural fit. In distributing the work and responsibility for these efforts, each organization took the lead on one, organizing a working group of international experts to address the respective topic.

For the first one, preservation metadata, you can probably guess that OCLC took the lead. Having a strong background and established bona fides with metadata, OCLC was the obvious home for this work. For the work to define the attributes of a digital repository for research resources, RLG was the natural organization to take the lead. As far back as 1994, RLG was working on digital preservation for libraries and archives, including co-sponsoring the Task Force on Archiving of Digital Information. That task force's seminal work, Preserving Digital Information, helped lay the foundation for what is now the Attributes work. So before going on to the direction of standards and best practices, let me give you just a little bit of an idea of where these working groups are going and who they involve.

The Working Groups
When we talk about digital archives or digital repositories, we're not only talking about different kinds of collections and different sizes of collections, but we're also talking about an international context. As you can see from this slide, the representation on the two different working groups reflects this internationalism. The Preservation Metadata Working Group includes experts from the NEDLIB Project (the representative is from the Koninklijke Bibliotheek, the National Library of the Netherlands), the Bibliothèque nationale de France, the National Library of Australia, the British Library, the Library of Congress, and finally, the New York Public Library. And for the Attributes Working Group - which I will discuss in depth a bit later - the expert members are from the Bayerische Staatsbibliothek, the Bibliothèque nationale de France, Cornell University, the CURL Exemplars in Digital Archiving (Cedars) Project, the Joint Information Systems Committee (JISC, UK), the National Library of Australia, and the University of Michigan. And to describe their work, let's briefly go back to the OAIS. When RLG and OCLC agreed to collaborate on these two issues, it was agreed that the work should include any existing standards-building activities. For digital preservation, these activities were already focused on the OAIS.

Awareness and evolving best practices
With the OAIS in mind, I am going to move ahead to the "Definition awareness and evolving best practices" part of the presentation. In looking at this slide of the OAIS environment and its basic functional entities, you can see the key role the descriptive information or the metadata holds in long-term retention and access to digital materials. In fact, the OAIS even contains an Information Model that broadly describes the metadata requirements associated with retaining a digital object over the long term. But as a model, the OAIS does not specify which metadata elements are critical in the preservation process or how to implement such a metadata set. In order to be able to use this emerging standard in digital repositories for research resources, community recommendations and guidance were needed.

To address these preservation metadata needs, the Preservation Metadata Working Group was formed. It is the job of the group to develop a metadata framework and a set of recommended data elements that will enable the long-term preservation of digital resources. But what exactly do I mean by preservation metadata?

The Many Flavors of Metadata
I think most people are familiar with or have heard of three, or perhaps four types of metadata. The most commonly discussed are administrative metadata, structural metadata, and descriptive metadata; sometimes you might also hear about technical metadata. In the context of digital preservation, metadata becomes crucial - it is the data that will support meaningful access to the archived items in the digital repository. What do these categories or types of metadata really do? At its most basic, administrative metadata is used for managing and preserving objects in the repository; structural metadata is used primarily for storage of objects in a repository and for presentation; and descriptive metadata is used to facilitate discovery of objects. Let me take a moment to describe the functions of each in a little more detail and to provide you with examples of each.

  • Resource Discovery
    I think everybody in this room is probably aware of Dublin Core. Back in 1995, OCLC helped define and create what is now called the Dublin Core Metadata Initiative. From co-sponsoring the first workshop in Dublin, Ohio, OCLC has helped to lead this international effort to create and adopt interoperable metadata standards and develop specialized metadata vocabularies to enable better resource discovery. And how does Resource Discovery relate to digital preservation? Well, if you think about preservation in terms of preservation and access, they go hand in hand. Why preserve something in a digital repository if no one will ever be able to access it? So the descriptive metadata enables resource discovery and is a key component in any long-term preservation and access scheme.
  • Presentation and Navigation
    The second functional type of metadata would be presentation and navigation. This is the structural metadata that people talk about. Structural metadata is data about a resource that describes its internal structure and serves to organize its delivery. For example, an institution digitizes a book. The resulting images are many and are simply a bunch of page images in your Digital Repository. In order to use the images - to create a useful resource for your users - structural metadata has to be employed. What is the "structure" of the resource? Do users need to be able to click from page to successive page as they would do with a book in hand? Chapter to chapter? And what is the "page turning" mechanism that allows this? This is an example of structural metadata aiding in the presentation and navigation of a digital resource. Structural metadata may also be encoded within a document, for example, a finding aid that has been encoded using EAD or Encoded Archival Description. Those are a few brief examples.
  • Rights Management
    As you just heard Steve Chapman explain, rights management information is extremely important for the long-term management of digital resources. Digital objects, from the simple to the complex, have owner and user issues, as well as an array of rights events and transactions associated with them. Rights management controls protect the intellectual property rights of the people or the organizations who own the material. They are directly related to access control, ensuring that the right people have access to the right information at perhaps the right time. And with access control measures, a certain level of protection is also provided for the digital resources. By controlling access, the digital files can also be protected from purposeful or malicious changes.

    And on this slide, you can see two of the leading rights management initiatives: <indecs> and DOI. The first one, <indecs>, is the Interoperability of Data in E-Commerce Systems. Originally a project, it was established to integrate resources from different sectors, such as copyright societies, database producers, creators and producers of resources, and music publishers. The project created a rights metadata framework to allow metadata developed in different contexts to interoperate effectively and permit automated e-commerce in intellectual property in the network environment. While this was perceived as a project solely for e-commerce rights owners, the project's findings benefit the not-for-profit, library and archival communities as well. For the reasons described above, resources within our collections also need rights management and control and the <indecs> project not only created a metadata set, but made sure it would interoperate with other systems.

    A second, important rights management initiative is the Digital Object Identifier or the DOI Foundation. The Digital Object Identifier (DOI) provides a framework for managing intellectual content. It allows you to designate a unique and persistent identification code for digital objects in a repository, not only controlling usage rights, but also managing the hyperlinks to the associated resources. By using this one type of administrative metadata, rights management and persistence are controlled for that resource.
  • Administration and Preservation
    The last functions I want to briefly explain are Administration and Preservation. Metadata serving these needs are often referred to as administrative or technical metadata. Administrative and technical metadata is basically data about a resource that facilitates preservation, collection management, and access management, and it is this key area on which the working group is concentrating. But our organizations are not working alone in this area. An effort which is complementary to the OCLC/RLG initiative is the NISO Committee on Technical Metadata for Digital Still Images, and Steve Chapman was one of the authors of a data dictionary that is the main component of a developing national standard. This type of metadata is the "preservation description information," a term from the OAIS Reference Model. It is descriptive information and system information used to support a digital object or the repository's operation. The work of the NISO committee will dovetail nicely with the metadata framework the OCLC/RLG working group has been developing and will allow the joint working group to concentrate on other preservation metadata issues.

    Preservation Metadata does not exist in isolation. In many ways, it can be thought of as a union of all the different types of metadata that deal with continued use, continued discovery and integrity of the object. So as I move into talking about the Preservation Metadata Working Group, please keep that in mind.
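As a rough illustration of how the metadata types described above might sit together for a digitized book, here is a minimal Python sketch. It is not drawn from any working group document. The class names, field names such as `scanned_by`, and the sample values are all hypothetical; only the descriptive element names (title, creator, date) are borrowed from Dublin Core.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    sequence: int    # position in reading order
    image_file: str  # hypothetical file name of the scanned page image


@dataclass
class DigitizedBook:
    # Descriptive metadata: supports discovery (element names borrowed from Dublin Core).
    descriptive: dict
    # Administrative metadata: supports management and preservation (hypothetical fields).
    administrative: dict
    # Structural metadata: the pages in reading order plus chapter starting points.
    pages: list = field(default_factory=list)
    chapters: dict = field(default_factory=dict)  # chapter title -> first page sequence

    def next_page(self, sequence):
        """Return the page that follows `sequence`, or None at the end of the book."""
        for page in sorted(self.pages, key=lambda p: p.sequence):
            if page.sequence > sequence:
                return page
        return None

    def chapter_start(self, title):
        """Return the first page of the named chapter."""
        first = self.chapters[title]
        return next(p for p in self.pages if p.sequence == first)


book = DigitizedBook(
    descriptive={"title": "Example Book", "creator": "Example Author", "date": "1901"},
    administrative={"rights": "public domain", "scanned_by": "Example Library"},
    pages=[Page(1, "0001.tif"), Page(2, "0002.tif"), Page(3, "0003.tif")],
    chapters={"Chapter 1": 1, "Chapter 2": 3},
)

print(book.next_page(1).image_file)                # page-to-page navigation
print(book.chapter_start("Chapter 2").image_file)  # chapter-to-chapter navigation
```

Without the `pages` ordering and the `chapters` map, the three image files would be just "a bunch of page images"; the structural metadata is what makes page turning and chapter navigation possible.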

Addressing Preservation Metadata
And moving to the strategy issues. A working group of leading experts in - and practitioners of - preservation metadata was convened. These experts are from all different fields, groups or subgroups within the digital preservation community. We wanted people who understood metadata, somebody who understood digital archiving, somebody who understood rights management, and somebody who understood discovery and delivery. We were very fortunate to put together that unique group of individuals. We were also very fortunate in that OCLC put two people in charge of creating a white paper, which would be the basis of our work. Brian Lavoie was the principal architect of this white paper, and Ed O'Neill, an OCLC colleague, contributed to the document.

  • The White Paper
    The white paper reviews several existing preservation metadata element sets, comparing and contrasting the elements listed within. Now if several preservation metadata sets already exist, why did we go ahead with our work? Because in general, the existing sets were designed to meet the needs of a specific institution or project, such as the National Library of Australia, the NEDLIB Project, the Cedars Project, etc. In fact, many institutions have designated institutional metadata sets, but these particular sets were of interest because they specifically used the OAIS Reference Model as context for their repository or had referred to the OAIS principles during the design of their repository. The white paper revealed some promising information -- even though the previously mentioned sets had been created for different institutions, the general purpose for the metadata was the same - to preserve digital objects. Therefore, the review identified a great deal of what we termed convergence, that is, similarity in the metadata elements thought to be important to preserve digital objects. Perhaps the consensus building would not be very difficult? At the same time, a bit of divergence in element selection was apparent. Some divergence is attributable to differences in institutional missions and goals and/or the needs of their users, but not all could be attributed to this. The issues surrounding the convergence and divergence of elements meant that the working group would have a lot of work ahead of them before being able to build community consensus on preservation metadata.
  • It's not a recipe, it's guidance
    The goal of the working group is to provide guidance on preservation metadata in such a way that it is helpful to as many people and institutions as possible. At the same time, we recognize that any such recommendation will not be a "cookie cutter" type document or a "recipe" for digital preservation. The different priorities and environments will require flexibility, so what we can do is try to create a document that can help people and guide people. So it's going to be a big guidance document.
  • Implementation issues
    But similar to the NISO data dictionary effort I mentioned earlier, it's not good enough to just give people a framework and a list of elements and say, "Use these." What we also need to provide are guidelines for implementation. How do you use this metadata set? How do I take this stack of paper that's sitting on my desk and use it with my digital objects? How do I implement the recommendations into my system? Or perhaps, how do I convince my systems people to use this? These questions and their respective answers are also being tackled by the working group. The group plans to address several different implementation options - including testing pilot applications, if possible - and make those sample solutions available as an integral part of the group's work.
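The idea of convergence and divergence among element sets, as described in the white paper discussion above, can be pictured with ordinary set operations. The following Python sketch is only an illustration under stated assumptions: the set names and element names are invented for the example and are not the actual elements of the National Library of Australia, NEDLIB, or Cedars sets.

```python
# Hypothetical element sets; the names below are invented for illustration only.
element_sets = {
    "set_a": {"identifier", "file_format", "checksum", "rights", "provenance"},
    "set_b": {"identifier", "file_format", "checksum", "storage_medium"},
    "set_c": {"identifier", "file_format", "provenance", "rendering_software"},
}

# Convergence: elements that every set includes.
convergence = set.intersection(*element_sets.values())

# Divergence: for each set, the elements that no other set includes.
divergence = {}
for name, elements in element_sets.items():
    others = set.union(
        *(other for other_name, other in element_sets.items() if other_name != name)
    )
    divergence[name] = elements - others

print("Shared by all sets:", sorted(convergence))
for name, unique in divergence.items():
    print(f"Only in {name}:", sorted(unique))
```

A real comparison is harder than this, because two sets may use different names for the same concept or the same name for different concepts, which is part of why consensus building takes work.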

Progress Update
The Metadata Group is currently tackling the assertions and information within the white paper in order to strengthen it. At present, we are working to come to agreement on types of metadata that roughly correspond to the sections of the OAIS information model. We are just completing work on what the OAIS refers to as Content Information (the digital content being preserved - the digital object - plus the information necessary to render/display, understand, and interpret the content) and beginning to work on what is referred to as Preservation Description Information. Preservation Description Information is the information necessary to preserve the Content Information with which it is associated and generally includes provenance information, unique identifiers, as well as authentication and fixity information.
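To make fixity information a little more concrete, here is a minimal Python sketch of recording and later checking a checksum alongside an identifier and a provenance note. It is an assumption-laden illustration, not the working group's element set: the function names, the dictionary keys, the sample identifier, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def make_pdi(content: bytes, identifier: str, source: str) -> dict:
    """Build a hypothetical Preservation Description Information record."""
    return {
        "identifier": identifier,  # unique identifier for the object
        "provenance": source,      # where the content came from
        "fixity": {                # checksum used to detect later changes
            "algorithm": "sha256",
            "digest": hashlib.sha256(content).hexdigest(),
        },
        "recorded": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(content: bytes, pdi: dict) -> bool:
    """Recompute the checksum and compare it with the recorded digest."""
    fixity = pdi["fixity"]
    digest = hashlib.new(fixity["algorithm"], content).hexdigest()
    return digest == fixity["digest"]


original = b"page image bytes"
pdi = make_pdi(original, identifier="example-0001", source="Example Library scan")

print(verify_fixity(original, pdi))             # unchanged content
print(verify_fixity(original + b"\x00", pdi))   # altered content
```

Storing the algorithm name next to the digest lets a repository change algorithms over time without losing the ability to check older records.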

Progress on these work areas can be tracked by viewing the working group's web site: www.oclc.org/digitalpreservation/wgmetadata.htm. I would urge each of you to visit the web site, if only to get the white paper (Preservation Metadata for Digital Objects: A Review of the State of the Art). Even as a working paper, it is already cited quite frequently. In the coming days, more documents from the working group will be made available through that site, including the group's final report and recommendations. Our goal is to complete our recommendations and pilot testing by the end of December and we are on track to meet that goal.

Attributes of a Digital Repository
Let me briefly touch on the second initiative because I believe it is equally important to digital preservation as the metadata work. If we think back to the OAIS Reference Model, you will remember that the OAIS not only provides an information model, but also a functional model for a digital archive. This second working group, the Digital Archive Attributes Working Group, is to address the latter model.

As Steve mentioned earlier, the OAIS is just a generic model. It does not specify an implementation or explain how research libraries and archives could apply the model. And in fact, the OAIS initiative was spearheaded by the space data community, a community with rather homogeneous types of data. The needs of the research resources community are very different. We have all types of files. We have image files. We have text files. We have databases and datasets. We have video. We have audio. So this new working group has been charged with developing a rational set of criteria for an archive that can hold the full range of digital collections and datasets (including both "born digital" and "born-again digital" information). In meeting this charge, the working group will define the characteristics of reliable archiving services for heterogeneous research collections.

Questions the report will address include:

  • What are some of the attributes of a trusted digital repository?
  • If my institution is in the planning stages of building a digital repository, what criteria should be included in the process?
  • Or if my institution will not be building an in-house repository but instead will need to contract out the responsibility for the long-term retention of the digital collections, what do I need to look for in a third-party service provider?
  • How can I tell if a third-party service - even an otherwise trusted institution - will be able to care for my digital resources?

The product of this working group will be guidance in these areas. What should our digital repository systems include? What will make them reliable? What do I look for or what can I expect with contracted services? What are the standards in this area?

But again, the group's final report will not be a recipe for digital preservation. The goal here, too, is to produce guidance regarding what to look for. And from our experience thus far, the group will also make recommendations for further work in certain related areas.

The working group is on track to accomplish its goals, albeit a little behind schedule. As with the Preservation Metadata Working Group, the work of this group began with a white paper. Kelly Russell, the former Project Manager for the Cedars Project, authored the draft and is a working group member. This draft was reviewed and critiqued by the working group members and invited experts, leading to a major revision. I was truly hoping to be able to hold up a copy of the draft today, wave it and say, "Look, this is what the working group has completed for you." Unfortunately, the major revision and some newly written sections affected our schedule and the release date. A new draft has been completed and is currently under review by the working group. The next step is a 6-week public review and comment period which should begin in late August. After that, the document will undergo final revision before being released as a final report.

RLG and OCLC are very proud of what has been created because we think the report will go a long way toward addressing the range of needs for guidance. The report tries to address all needs, from those of small institutions to large institutions, from a very small archive to a large national library that must deal with legal deposit. But the only way to ensure community consensus on these issues is to make sure we receive a great deal of input and comments on the current draft during the public review. This is where the invitation comes.

Please, please, please, please. No matter what your experience is, whether you have a digital repository, your institution is building a repository, you're in the thinking stage, or you're just interested, please take the opportunity to read this document. RLG and OCLC welcome your input and hope you will accept the invitation to read and comment upon the draft Attributes document.

Summary
In the last few years, there has been a growing awareness about the potential and the importance of metadata for digital collections. In general, much of this awareness has been focused on metadata for discovery purposes, but as our digital collections grow, so too will the importance of all other types of metadata. Without metadata, we will not be able to reliably store our digital collections for the long term. Without metadata, we would not be able to access what we are able to store. I hope my brief discussion about the types and functions of metadata - especially preservation metadata - has been helpful.

As well, I would once again like to encourage you to follow the progress - and indeed, participate in the progress - of the joint working groups I described today. Both the progress updates and the contact points for the Working Group on the Attributes of a Digital Repository and the Preservation Metadata Working Group are listed in the symposium handouts.

Thank you.

About the presenter

Robin Dale

Robin Dale has been a Program Officer for Member Initiatives with RLG for the past 4.5 years.

In that position, she leads one of RLG's key initiatives, the Long-term Retention of Digital Research Materials, as well as RLG's PRESERV community, a program which focuses on preserving and improving access to endangered research materials.

Prior to joining RLG, Robin was Head of the Preservation Reformatting Department at Columbia University and worked in the Preservation Replacement Department at the University of California, Berkeley.


Additional resources

Presentations from the Digital Preservation Resources Symposium 2001