What is digital preservation?
Stephen Chapman
Harvard University Library
What is digital preservation? I trust that this audience will not be surprised by a librarian taking the prerogative of classifying the question before giving an answer! Preservation administrators generally view digital preservation as a sub-topic of library preservation, which has a single overarching mandate: to facilitate use. [Weissman] In the ideal, preservation policies and services would ensure that materials in all formats would be accessible for use in perpetuity.
If our objective were to preserve the artifact only, regardless of usability, longevity would be measured according to the lifespan of an object stored in a given environment. Preservation strategies would seek to optimize manufacturing and storage conditions and, quite likely, offer little intervention once materials had been placed "Tut-like fashion" in their appropriate chambers.
Three components for usability
Keeping materials accessible for use, however, requires more than preservation of the artifact itself. Models for digital preservation make explicit the three components for usability that have traditionally influenced the models for preserving analog formats in our collections:
- preservation of the material (or its "information content")
- preservation of the apparatus needed to locate, retrieve and represent the material
- a knowledgeable observer or community
Each of these components, and their relationships, must be accommodated in preservation strategies. The nature of the "apparatus" needed to represent material, for example, depends upon the knowledge base of the designated user community. (We do not, for example, routinely include dictionaries in microfilm versions of books to help the reader. We assume that she knows the language of the text in hand.)
Defining preservation with mediation
Ansel Adams once observed that his images always included at least two people--the photographer and the person(s) observing the photograph. Custodians of library collections similarly need to accommodate the creator, the user community, and whatever is necessary to mediate communication between the two in their definition of "use" and strategy for preservation. Since each of the three components listed above can vary from collection to collection, or format to format, it is probably more accurate to define preservation as "ensuring that materials remain usable to certain users under certain conditions."
Long term usability requirements
Obsolescence, the nemesis of preservation, occurs in either of the following cases: the material and its associated apparatus become incompatible; or the material/apparatus can no longer meet specified use requirements. Restoring these connections to reestablish usability sometimes requires wholesale translations. To avoid such efforts and expenditures, preservation strategies strive to manage both technological compatibility and user expectations.
The slippery slope to obsolescence for digital materials
As is the case with other modern formats, digital files require a more complex apparatus to be usable: they must be usable to machines and to people. This is the reason that digital preservation models require so much metadata: one set accommodates machines, the other is for people. To summarize, preservation of the material is the main challenge to perpetuate usability for print and photographic formats; preservation of the apparatus--with technology that changes rapidly--is the slippery slope to obsolescence for digital materials.
Definitions
One could probably collect as many definitions of "digital preservation" as there are research papers, project reports, committee charges, and draft guidelines on this topic. For the purposes of this introduction to the topic, two repay scrutiny:
"Digital preservation is the ability to keep digital documents and files available for time periods that can transcend technological advances without concern for alteration or loss of readability." (The Association for Information and Image Management)
"Digital preservation refers to the series of managed activities necessary to ensure continued access to and preservation of digital materials." (RLG/OCLC Report)
These definitions state the traditional preservation goals of "permanence" and maintenance of "access." Both underscore the obligation to keep digital materials, presumably indefinitely. Similarly, each mandates that digital materials will be continually available and accessible--with the implication, of course, that data will be accessible to machines and information will be accessible to people.
Potential problems with these definitions
Proposing that files will be kept and available is one thing. Suggesting that the files, when delivered, can be rendered without loss is another. Note that AIIM's definition proposes that files will be preserved "without concern for alteration or loss of readability." The RLG/OCLC statement carefully avoids alluding to integrity, authenticity, appearance or any other attribute of the original file. Determining what elements constitute the meaning of library materials has always been challenging. [CLIR] Important attributes of any material are not self-evident, and they are particularly vexing to identify for formats that are dynamic. Including a phrase such as "without alteration" raises the question of how long an original bitstream dutifully copied to new media would remain human readable. Alternatively, if transformations were needed to maintain machine compatibility, could these be programmed to ensure there would be no alteration?
A "series of managed activities"
Finally, the definitions present different viewpoints on how digital preservation might be achieved. AIIM refers simply to an "ability" to preserve. RLG/OCLC is more specific. Kelly Russell, the Report author, uses a very nice turn of phrase to encapsulate the complex set of perpetual obligations and services that are beginning to be endorsed as necessary elements of digital preservation. [see DLF] Digital preservation, she states, is a "series of managed activities." [Russell] Viewed broadly, digital preservation is like librarianship writ large (and unlike conservation treatment or preservation reformatting). It refers to managed activities that must be administered in perpetuity.
OAIS Reference Model
If digital preservation is a "series of managed activities," then what are they? Is it possible to elaborate upon this part of the definition? The short answer is yes. The longer answer is to cite the name of a draft ISO standard, entitled Reference Model for an Open Archival Information System. [CCSDS] Referred to by its acronym, OAIS, this draft standard is the product of ten years of work by the Consultative Committee for Space Data Systems (CCSDS), an international forum of space agencies that includes NASA. The library community has been very fortunate to build upon this analytical foundation.
Summary of the OAIS Reference Model
Brian Lavoie of OCLC has written a summary of the OAIS Reference Model that provides a highly readable overview of a long and complex technical document filled with acronyms. It is highly recommended as an introduction to the topic. [Lavoie] As summarized by Mr. Lavoie, OAIS has two key meanings. First, it describes the range of service of a digital archives:
An OAIS is understood to mean any organization or system charged with the task of preserving information over the long term and making it accessible to a specified class of users (known as the designated community).
Second, it presents a model of the components needed to create a system to support the range of preservation services.
OAIS provides reasonable description of both the functional requirements of a digital archiving system, and the information requirements needed to support these components.
In order to provide a "reasonable description" of functional and information requirements, OAIS has provided organizations with the necessary vocabulary to define the components of digital preservation systems and services. This vocabulary has been instrumental in fostering collaborative research projects and communicating systems requirements with vendors selling products and services. [see Projects]
OAIS presents seven "functional entities" of a digital archives. [CCSDS, Fig. 4.1] The full list of entities and their associated activities is presented below as the fully deconstructed answer to the question "What is digital preservation?"
Seven functional entities
Digital preservation includes:
1. Ingest
- Receive Submission Information Package (SIP) that conforms to Submission Agreement
- Conduct quality assurance to validate successful transfer of the SIP to the staging area (including completeness of the SIP)
- Generate Archival Information Package (AIP) from SIP(s)
- Preservation Description Information (metadata) extracted from the SIP
2. Archival Storage
- Data receipt: AIPs moved from staging area to permanent storage
- Media backup (redundant storage)
- Error checking
- Management of storage hierarchy
- Disaster recovery
- Media replacement (refreshment)
- Means to provide data to Access
3. Data Management
- Administer database in accordance with policies from Administration
  - database contents = Preservation Descriptive Information and system information used to support archive's operation
- Perform queries & generate reports
  - with requests received from Ingest, Access, or Administration
- Process database updates
4. Administration
- Negotiate Submission Agreement
  - rights, ownership, scope of content, obligations for management and delivery
  - what constitutes the Content Information must be negotiated
- Manage day-to-day operations
- Develop data standards and policies
- Provide customer service
5. Preservation Planning
- Develop preservation strategies and standards
- Monitor technology
- Monitor designated user communities
  - track changes in service requirements and available product technologies
- Develop packaging designs and migration plans
6. Access
- Help consumers identify and obtain descriptions of information in archive
- Generate Dissemination Information Packages (DIPs) from the AIPs in response to user requests
- Deliver information or query result set from archive to consumers
7. Common Services
- Operating system services
- Network services
- Name services
- Security services
- System backup (database)
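To make the relationships among a few of these entities concrete, the following is a minimal sketch in Python of how Ingest, Archival Storage, and Access might hand packages to one another. It is an illustration only, not part of the OAIS Reference Model: the class and function names, the use of SHA-256 checksums as the only Preservation Description Information, and the in-memory dictionaries standing in for storage media are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package as received from a producer (hypothetical shape)."""
    identifier: str
    files: dict[str, bytes]      # filename -> content
    manifest: dict[str, str]     # filename -> SHA-256 digest declared by the producer


@dataclass
class AIP:
    """Archival Information Package generated by Ingest (hypothetical shape)."""
    identifier: str
    files: dict[str, bytes]
    pdi: dict[str, str]          # Preservation Description Information; here, fixity only


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest(sip: SIP) -> AIP:
    """Quality assurance on the SIP, then generation of an AIP."""
    if set(sip.files) != set(sip.manifest):
        raise ValueError("SIP incomplete: files do not match manifest")
    for name, data in sip.files.items():
        if sha256(data) != sip.manifest[name]:
            raise ValueError(f"transfer error in {name}")
    return AIP(sip.identifier, dict(sip.files), dict(sip.manifest))


class ArchivalStorage:
    """Keeps redundant copies and checks each against the recorded fixity."""

    def __init__(self, copies: int = 2):
        # Each dict stands in for one storage medium (an assumption of this sketch).
        self.copies = [dict() for _ in range(copies)]

    def store(self, aip: AIP) -> None:
        for medium in self.copies:
            medium[aip.identifier] = AIP(aip.identifier, dict(aip.files), dict(aip.pdi))

    def check_errors(self, identifier: str) -> list[int]:
        """Return the indexes of copies whose content no longer matches the PDI."""
        bad = []
        for i, medium in enumerate(self.copies):
            aip = medium[identifier]
            if any(sha256(data) != aip.pdi[name] for name, data in aip.files.items()):
                bad.append(i)
        return bad

    def retrieve(self, identifier: str) -> AIP:
        """Provide data to Access from the first copy that passes error checking."""
        bad = self.check_errors(identifier)
        for i, medium in enumerate(self.copies):
            if i not in bad:
                return medium[identifier]
        raise IOError("no intact copy available")


def access(storage: ArchivalStorage, identifier: str, wanted: list[str]) -> dict[str, bytes]:
    """Generate a Dissemination Information Package holding only the requested files."""
    aip = storage.retrieve(identifier)
    return {name: aip.files[name] for name in wanted}
```

In this sketch, a corrupted copy on one medium is detected by `check_errors`, and `access` still delivers the content from the intact copy. A production archive would also need the Data Management, Administration, and Preservation Planning functions, which are omitted here.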
Summary
This cursory listing of the components of the OAIS Reference Model illustrates a wide range of digital preservation services with complex relationships. Unfortunately, commercial "solutions" for storage and digital asset management do not accommodate the entire OAIS framework. When can digital preservation be implemented, we might ask? When organizations can define functional requirements and build production systems, create agreements and develop policies, educate and train archives' constituencies, and, not least, implement business models for sustainability and growth.
Costs
If one accepts that digital preservation is the series of managed activities illustrated by the OAIS Reference Model, then the cost of digital preservation is the cost of establishing and maintaining all of these services. Costs will vary according to many factors--such as the level of interactivity a repository will support (dark, dim or bright archives)--but digital preservation will certainly be more expensive per unit than storing comparable analog counterparts. Reducing costs is one of the most important research issues. One strategy likely to take hold will be the use of standard formats for archiving. Common, well-characterized formats will reduce the frequency of migration (or other intervention) and facilitate automated processes to manage digital materials and their associated metadata.
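As a rough illustration of how standard formats could support automated management, the sketch below tallies an archive's holdings by format and flags objects outside a preferred set as candidates for migration review. The format identifiers, the `PREFERRED_FORMATS` set, and the `migration_report` function are hypothetical choices for this example, not recommendations drawn from any of the sources cited here.

```python
from collections import Counter

# Hypothetical set of well-characterized formats an archive has chosen to support.
PREFERRED_FORMATS = {"image/tiff", "application/pdf", "text/xml"}


def migration_report(holdings):
    """Summarize holdings given as (identifier, format) pairs.

    Returns a count of objects per format and a sorted list of identifiers
    whose format falls outside PREFERRED_FORMATS.
    """
    counts = Counter(fmt for _, fmt in holdings)
    needs_review = sorted(ident for ident, fmt in holdings if fmt not in PREFERRED_FORMATS)
    return counts, needs_review
```

The fewer distinct formats an archive holds, the shorter the review list and the fewer migration paths it must maintain, which is the cost argument made above.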
Selected Research and Development Projects
During the past three years or so, a number of research and development projects have been designed, at least in part, to respond to the OAIS Reference Model. Is it complete? How would different communities implement the model? If implemented differently, would these archives be interoperable--or could data at least be distributed out of one and ingested into another? These and many other questions are setting the research agenda.
Not surprisingly, given their institutional missions, national libraries and large research libraries are initiating projects to test and implement digital archives--to "do" digital preservation. Sponsorship has been forthcoming from public funds and major foundations. Commercial digital repository services have emerged much more recently. They are so new that it is too early to determine whether they are offering digital asset management solutions--effective for homogeneous, low-use materials, but not well tailored to library collections and services--or fully-fledged digital preservation services as modeled by OAIS. (The language of the Submission Agreement between owner and repository, if one is offered, should reveal much about the design of these services.)
Five example projects
The five projects cited below provide examples of architectures to implement the OAIS Reference Model, and data related to many important issues. These include documentation of the overhead in gathering metadata to ingest and manage digital objects, the viability of emulation as a preservation strategy, and the success and failure rates in retrieving data from the archives (repository). Use each as a starting point to gather additional information about what digital preservation is and what is required to build and maintain digital preservation services.
- NEDLIB (Networked European Deposit Library), 1998-2000
  - extensive collaboration: eight national libraries (led by Koninklijke Bibliotheek), one national archive, two IT organizations, and three publishers
  - deliverables: forum for exchange of best practices, demonstrator of a deposit system for electronic publications ("toolbox"), evaluation of feasibility of emulation
- Cedars (CURL Exemplars in Digital Archives), 1998-2001
  - deliverables: development of demonstrator archive distributed across three partner sites
  - six UK test sites (recent ingest test)
  - production of metadata schema with documentation of required specialist knowledge and expertise
  - development of preservation policies
- National Library of Australia Digital Services Project, 1999-
  - deliverables: infrastructure for long-term management of digital material; cost-effective technical solutions for development and delivery of digital services
  - commercial solution identified for metadata repository and search services
  - request for digital collection management system revised to three new procurement and development projects
- San Diego Supercomputer Center, "Persistent Archives and Electronic Records Management" (see also http://www.npaci.edu/Research/DI/Talks/loc.pdf)
  - NARA prototype, archive of 1 million e-mail messages (2.5GB of data)
  - deliverable: testbed system successfully ingested, archived, recreated, queried, and presented digital objects
  - advisory role to California Digital Library
- Mellon Electronic Journal Archiving Program, 2001-2005
  - proposals solicited in year one to plan and develop repositories meeting "minimum criteria" (DLF) based closely on OAIS Reference Model; implementations in years 2-5
  - publisher based archive (Yale, Harvard, Penn)
  - subject based archive (Cornell, NYPL)
  - MIT archiving dynamic e-journals
  - Stanford to develop specific archiving software tools
Portals to Digital Preservation Resources
Digital preservation is a topic that now engages many experts in many disciplines. The following sites are good starting points, for both the general and expert reader, to information resources dealing with digital preservation worldwide. From these gateways, one will find links to announcements, discussions, presentations, projects, research reports, standards, and related information resources of interest to librarians, archivists, and information managers in business and industry (AIIM).
The Association for Information and Image Management (AIIM) Digital Preservation Site
(collaboration among AIIM International, Kodak Document Imaging, and Lockheed Martin)
Digital Library Federation, "Digital Preservation"
Joint Information Systems Committee (JISC), "Digital Preservation"
(includes links to proposals to establish a Digital Preservation Coalition in the UK)
National Library of Australia, "Preserving Access to Digital Information (PADI)"
Moderated Lists
DIGITAL-PRESERVATION (JISC Digital Preservation Announcement and Information List)
(emphasizes activities relevant to preservation and management of digital materials in the UK)
PADIFORUM-L (National Library of Australia)
(for the exchange of news and ideas about "all aspects of preserving access to digital information")
Sources
Consultative Committee for Space Data Systems (CCSDS). CCSDS 650.0-R-1. Reference Model for an Open Archival Information System (OAIS). Red Book (Draft Recommendation that has been adopted as ISO DIS 14721). Issue 1.1. 20 April 2001. (available in MS Word and PDF versions)
Council on Library and Information Resources (CLIR). Authenticity in a Digital Environment, May 2000.
Digital Library Federation (DLF). Minimum criteria for an archival repository of digital scholarly journals. Version 1.2. May 15, 2000.
Lavoie, Brian. "Meeting the challenges of digital preservation: The OAIS reference model" (PDF). OCLC Newsletter, no. 243, January/February 2000, 26-30.
Russell, Kelly. RLG/OCLC Report on the Attributes of a Reliable Digital Archive for Research Repositories (PDF). Draft Report. Research Libraries Group and OCLC. 17 April 2001.
Weissman Preservation Center, Harvard University Library. "Principles for Reformatting Library and Archival Collections," February 2001.