Final Report on Preservation Metadata for Digital Master Files

This is the final report of a completed RLG project.

May 1998

Background

Digital materials are increasingly important in the development of research collections. In particular, the preservation and reformatting community is in the process of incorporating digitization into its repertoire along with microfilming efforts. A significant component of creating and managing digital collections is ensuring that the information essential to their continued use is preserved in an accessible form. The Working Group on Preservation Issues of Metadata was constituted in May 1997 as a first step in the process of addressing this issue. The group was asked to identify the descriptive data elements that should be associated with digital master files that have preservation-based intent.

It is a commonplace that metadata serves many purposes, but to date the main emphasis has been on defining elements essential for discovery and retrieval. Consequently, the starting place for the group was to examine two prominent metadata systems that purport to offer a set of "core" elements necessary for discovery of resources: the Dublin Core elements and the Program for Cooperative Cataloging's USMARC-based core record standard. The group decided to specify the elements, additional to these core element lists, that are important to serve preservation needs for digital masters. The list of data elements below is the result of this process.

Simultaneously, another group, the RLG Working Group On Preservation and Reformatting Information, was examining the mechanism for sharing of preservation information through the medium of the USMARC record. Consequently, the metadata working group also took care to ensure that its recommendations would be compatible with the work of this other group.

Scope

Since the concept of metadata takes in a lot of territory, the Working Group had to begin by defining the constraints that should govern the scope of its activity:

Technological constraints

Given that the relevant technologies are developing rapidly and that digitization efforts are still evolving in many respects, the group limited its task as follows:

—The Working Group concluded that it is premature to make recommendations concerning the way that preservation information should be stored. Such information may be included in a header of a digital file, it may exist in some separate but linked format, or it may be incorporated in a USMARC cataloging record that may or may not be linked to a corresponding digital file.
—The Working Group noted that many categories of information important to preservation needs might be automatically captured at the point of digitization, and it supports efforts to define a preservation standard for the formatting and retention of such information. The Working Group particularly noted the efforts of the Society of Motion Picture and Television Engineers (SMPTE) to define a universal preservation format for videos as an important step in this direction. However, it is too early for this report to attempt to take such work into account in the preparation of its recommendations.

Format constraints

The Working Group also limited itself to a consideration of data elements that describe digital image files. Doing so allowed the group to address the most significant need within a timeframe short enough to be meaningful. Members also agreed that it would be most efficient to constitute other specialist groups to supplement the list of data elements, adding elements for other formats (e.g., audio files, moving images) as the need becomes more pressing.

Functional constraints

Members of the Working Group noted that information that is not specifically related to preservation tasks may be of potential interest to the preservation community; for example, copyright and use restriction information can be crucial and might appropriately be recorded at the time that preservation staff are creating the digital master. Members concluded that since the scope of such information often exceeds preservation needs, it should more appropriately be dealt with by other specialist groups. However, data elements that might serve other purposes as well are included as long as they address a core preservation information need.

Supporting recommendations

As a result of the considerations above, the group endorses the following recommendations:

—Institutions should be encouraged to share their efforts to apply the element set with the rest of the community.

—The current list of data elements should be supplemented with elements deemed necessary for other formats (e.g., audio files, moving images).

—The RLG PRESERV Advisory Council should continue to monitor and liaise with the Society of Motion Picture and Television Engineers (SMPTE) in its efforts to develop a universal preservation format and to define a comprehensive data dictionary (in order to ensure that such a data dictionary represents preservation needs).

—The RLG PRESERV Advisory Council should monitor and liaise as appropriate with other specialist groups concerned with delineating metadata elements to serve specific needs that are also of interest to the preservation community (e.g., copyright information).

Preservation metadata elements

The following list of sixteen elements represents information that the Working Group deems crucial to the continued viability of a digital master file. Institutions may record more than is listed here, but the Working Group recommends that all the enumerated elements that are relevant to a specific file be recorded.

Since it is recognized that these elements may be recorded according to the specifications of any one of a number of metadata systems, no effort has been made to specify syntax. The list below, including examples, is meant to provide a semantic framework only. The format of the examples is intended to be illustrative, not prescriptive. In order to demonstrate how the list might be used, possible implementations are included in the attached appendices.

1. Date

DEFINITION: Date file is created
FORMAT: yyyyddmm

2. Transcriber

DEFINITION:
Required: Name of the agency responsible for transcribing the metadata.
Optional: may include identification of individual transcribing metadata.
EXAMPLE: Stanford University Libraries. Conservation and Preservation Dept. ; BLK.

3. Producer

DEFINITION:
Required: Agency responsible for the physical creation of the file. One agency may have caused the file to be created by a second (possibly commercial) agency. In this case, record the name of the agency responsible for the actual creation of the file, not the delegating agency.
Optional: May additionally identify individual primarily responsible for scanning, etc.
EXAMPLE 1 (Research Library with in-house scanning operation; includes initials of scanner): Stanford University Libraries. Conservation and Preservation Department; KES
EXAMPLE 2 (Commercial firm to which scanning has been outsourced): Luna Imaging, Inc., 1315 Innes Place, Venice, CA 90291-3617, USA

4. Capture device

DEFINITION: Indicate make and model of digital camera or scanner
EXAMPLE: Kronton 3012

5. Capture details

DEFINITION 1 (Capture device is a scanner): Name scanner software, including version information; give scanner settings, gamma correction, and other relevant details pertaining to scanning
EXAMPLE: PixelCraft Proimager 8000

DEFINITION 2 (Capture device is a digital camera): Give lens type, focal length, and light source type, and indicate if image is tiled.
EXAMPLE: Nikon 24mm lens; high frequency fluorescent studio camera lights, Videsence, model Pl330, with Osram 55 watt 3200 degree color temperature

6. Change history

DEFINITION: A record of modifications made to the file, and significant versions generated, identifying the person/institution who made them and the date they were made.
EXAMPLE 1: Original digital master image file migrated from TIFF v.X to TIFF v.X+1 using YYY software by JWC on 20010206.
EXAMPLE 2: Printing file created from original digital master using YYY software by JWC on 19990411. Color bars cropped out, pixel dimensions retained, image sharpened.

7. Validation key

DEFINITION: A mechanism, usually consisting of a number, that allows one to verify that an electronically transmitted file is what it purports to be, i.e., that the file is what is described in the metadata. At the simplest level, such a key might consist of the number of lines in a file (similar to the way that one indicates the number of pages that are transmitted via fax). Especially prevalent is the use of a checksum, which is produced by an algorithm that manipulates the sum of the bits that make up a file to yield a number that serves as a unique identifier for that file.
EXAMPLES: Standard internet checksum; Roland checksum
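As an illustration of how a validation key might be computed and later checked, the following Python sketch derives several candidate keys from a file's bytes. It is a sketch under stated assumptions, not a recommended implementation: the function names (internet_checksum, validation_keys, matches) and the choice to record a byte count, CRC-32 and SHA-256 alongside the 16-bit internet checksum are illustrative and are not specified by the Working Group. Note also that short checksums detect accidental change but cannot strictly guarantee uniqueness.

```python
import hashlib
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def validation_keys(data: bytes) -> dict:
    """Compute several candidate validation keys (hypothetical record layout)."""
    return {
        "byte_count": len(data),
        "internet_checksum": f"{internet_checksum(data):04x}",
        "crc32": f"{zlib.crc32(data):08x}",
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def matches(data: bytes, recorded: dict) -> bool:
    """Return True if the file's current keys equal the recorded ones."""
    return validation_keys(data) == recorded


recorded = validation_keys(b"example image bytes")
print(matches(b"example image bytes", recorded))
print(matches(b"example image bytez", recorded))
```

In practice the keys would be computed over the bytes of the digital master at creation time, stored with the metadata, and recomputed after each transfer.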

8. Encryption

DEFINITION: Technique by which data is scrambled before transmission in order to ensure privacy. Encrypted data must be unscrambled (decrypted) by the receiver. If a file is encrypted, the type of encryption should be indicated.
EXAMPLE: RSA Public Key Cryptosystem

9. Watermark

DEFINITION: Indicate whether or not some bits in the file have been altered in order to create a "digital fingerprint" that can serve to establish ownership of an image and prevent unauthorized use.
EXAMPLES: Watermark by Digimarc Professional, Watermark by Invisible Ink for Images

10. Resolution (e.g. pixel dimensions, dpi, ppi)

DEFINITION: Traditionally determined by the number of pixels used to represent the scanned image, expressed as pixel dimensions, pixels per inch or dots per inch. Current research into the use of Modulation Transfer Function (MTF - a function of the spatial wave number) to measure resolution should allow a more objective numerical value to be assigned as the measurement.
EXAMPLES: 4096 x 6144 pixels; 600 dpi; 320 dpi

11. Compression

DEFINITION: Indicate whether or not the file has been compressed (i.e. reduced in size), and if it has, identify the level and method of compression.
EXAMPLES: LZW; JPEG, compression level 10 (Corel Photopaint)

12. Source

DEFINITION: Describe physical characteristics of the source such as its size, condition, and its place in the chain (e.g., original, copy, or copy of a copy). Include information about modifications made to the source to enable better digitization. For images of photographs and digitized microforms, include image type (i.e., positive or negative image).
EXAMPLE 1: Photocopy; 20 x 25 cm.
EXAMPLE 2: Original; waterstained; 18 x 22 cm.

13. Color

DEFINITION: Indicate pixel depth.
EXAMPLES: 1-bit; 8-bit

14. Color management

DEFINITION: Identify system, if any, that is used to improve consistency of color across capture, display and output of an image.
EXAMPLES: Photo CD; OptiCal (color management system); Profile/80 (color sync profile maker); Softproof (Photoshop Plugin)

15. Color bar/Gray scale bar

DEFINITION: Indicate presence or absence of either and, if present, identify the type.
EXAMPLES: Kodak Q13 or Q14 Color Separation Guide and Gray Scale; Kodak Q60 Color Input Target

16. Control targets

DEFINITION: Include information about targets included in the scanned file for purposes of quality control, calibration, verification, etc.
EXAMPLES: AIIM Scanning Test Chart #2; RIT Alphanumeric Resolution Test Object, RT-1-71; IEEE Std 167A-1995 Standard Facsimile Test Chart
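
The recommendation that every relevant element be recorded lends itself to a simple completeness check. The Python sketch below is illustrative only: the element labels are taken from the list above, but the dictionary layout and the names ELEMENTS and missing_elements are assumptions made for this example and are not part of the Working Group's recommendation.

```python
# Labels of the sixteen elements, in the order the report lists them.
ELEMENTS = [
    "Date", "Transcriber", "Producer", "Capture device", "Capture details",
    "Change history", "Validation key", "Encryption", "Watermark",
    "Resolution", "Compression", "Source", "Color", "Color management",
    "Color bar/Gray scale bar", "Control targets",
]


def missing_elements(record, not_applicable=()):
    """Return labels that are neither recorded nor marked as not applicable."""
    skipped = set(not_applicable)
    return [
        name for name in ELEMENTS
        if name not in skipped and not str(record.get(name) or "").strip()
    ]


# A partially completed record; the values reuse examples from the list above.
record = {
    "Date": "19971010",
    "Capture device": "Kronton 3012",
    "Resolution": "600 dpi",
    "Compression": "LZW",
    "Color": "8-bit",
}
print(missing_elements(record, not_applicable=["Encryption", "Watermark"]))
```

Elements that do not apply to a given file (for example, Encryption for an unencrypted file) are passed in explicitly, so that an omission is always a deliberate decision rather than an oversight.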

Appendix 1: Dublin Core implementation

Presented below is an effort to incorporate the metadata elements enumerated in the body of the report into a Dublin Core record template. Some data elements have been created as extensions to currently agreed Dublin Core metadata elements and are tagged as RLG (for RLG Preservation Metadata) elements rather than DC elements for illustrative purposes.

This example is not intended to be prescriptive, but to suggest directions that might be explored further and experimented with more extensively. There are undoubtedly a number of alternative ways to embed preservation metadata into Dublin Core records, ranging from simple links to associated files to more elaborate container architectures. Shared experiments in this direction and continued discussion among the members of the preservation community might be especially fruitful in developing future guidelines.

Hypothetical Dublin Core record incorporating preservation metadata elements

DC.Title: [Title of digitized item]
DC.Creator.PersonalName: [Author or creator of intellectual content]
DC.Creator.Role: Author
DC.Contributor.CorporateName: [Agency responsible for transcribing metadata]
DC.Contributor.Role: Transcriber (Metadata)
DC.Contributor.CorporateName: [Agency to which digitization was outsourced]
DC.Contributor.Role: [Producer]
DC.Contributor.CorporateName.Address: [Address of outsourcing agency]
DC.Publisher: [Institution responsible for digitization]
DC.Date: [date digital preservation copy created--YYYY-DD-MM]
DC.Form: Image

RLG.Form.Capture: [Make and model of scanner or digital camera and relevant capture details]
RLG.Form.Validation: [Validation Key, Watermark]
RLG.Form.Encryption: [Encryption technique]
RLG.Form.Compression.Method: [e.g., JPEG, LZW]
RLG.Form.Compression.Level: [value including capture device information that makes this information meaningful]
RLG.Form.Color: [The color palette with which the associated image or information is rendered]
RLG.Form.ColorManagement: [Associated color management systems]
RLG.Form.Resolution: [e.g., pixel dimensions, dpi, ppi, mtf]
RLG.Form.Modification: [Change History]

DC.Description: [Color Bar/Gray Scale Bar; Control targets]
DC.Identifier: [URL of document if metadata not carried in header]
DC.Source.Date: [Date of print version that is digitally reproduced]
DC.Source.Publisher: [publisher of print version that is digitally reproduced]
RLG.Source.Condition: [Physical condition of source item, etc.]

Note: Alternatively, instead of Source use the Relation element to identify print version:

DC.Relation
DC.Relation.Type: IsVersionOf
DC.Relation.Identifier: [e.g., catalog record no. for original]
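
A record in the "Label: value" layout used above could be produced from stored values with very little code. The following Python sketch is one possible approach, offered only as an illustration: the function name render_record and the use of a list of (label, value) pairs are assumptions, and the sample values are borrowed from examples elsewhere in this report. A list of pairs is used rather than a dictionary because labels such as DC.Contributor.CorporateName may repeat.

```python
def render_record(fields):
    """Render (label, value) pairs as 'Label: value' lines, skipping empty values."""
    return "\n".join(f"{label}: {value}" for label, value in fields if value)


fields = [
    ("DC.Title", "A very important book"),
    ("DC.Creator.PersonalName", "Author, Major"),
    ("DC.Creator.Role", "Author"),
    ("DC.Form", "Image"),
    ("RLG.Form.Compression.Method", "LZW"),
    ("RLG.Form.Encryption", ""),  # no value recorded, so the line is omitted
    ("RLG.Form.Color", "8-bit"),
    ("RLG.Form.Resolution", "600 dpi"),
]
print(render_record(fields))
```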

Appendix 2: Preservation-related metadata recorded in USMARC records

The templates below offer maps of the 16 Preservation Metadata Elements (described previously) to a USMARC record. Bracketed numbers correspond to the list of the 16 recommended data elements.

Please note the following points:

  • The examples do not explicitly endorse any of the several USMARC multiple version cataloging strategies currently under discussion.
  • Note that the examples lack fields that might imply a particular multiple version implementation, e.g. fixed field values, linking fields, etc.
  • The order and grouping of notes pertaining to the digital version in the 533 field of the first example and in the 538 notes of the second example are not intended to be prescriptive, merely illustrative. They are meant to suggest that elements may be combined in one note or given in distinct notes or groupings as necessary to give a complete but parsimonious description.
  • Although creation of records similar to the examples below would require human cataloging expertise, crude records might be automatically generated using the mappings of specific elements to numbered fields proposed below.

Please also note that the RLG Working Group on Preservation and Reformatting Information, which is explicitly concerned with the USMARC record, has prepared a discussion paper for ALA's Machine Readable Bibliographic Information (MARBI) Committee which would extend the 007 in order to include in coded form much of the information that must otherwise be included in variable data fields. That working group is also preparing examples demonstrating a potential standard configuration of the 533 field that could be used in conjunction with the extended 007. The adoption of these proposals would considerably simplify the addition of information corresponding to the recommended preservation metadata elements.

[For the subsequent outcome of this MARC 007 work, see Establishing MARC 21 Coding for Digital Files.]

Template 1: Description of digital master added to record for hard copy (monograph)

040 NUC$dNUC [2]
100 1 Author, Major.
245 12 A very important book /$cby Major Author; edited by Serious Scholar.
250 4th ed., rev.
260 London :$bProminent Publisher,$c1854.
300 672 p. :$bill. ;$c28 cm.
500 Includes index and bibliographies.
533 Computer file. $bBig City: $cBig University Preservation Dept. $d1997. $f(Scanning Project Series ; 34556)$nChange history [6]. $n795 image files; Capture device [4] and details [5]; Validation key [7]; Encryption [8]; Watermark [9].$nResolution [10]; compression [11]; color [13]; color management details [14].$nPresence/type of targets [16], color bar/gray scale bar [15].
583 $b1997-10-10 [1]; $lScanned $xImage Outsourcing Co., 1234 Industrial Park St., Big City, CA [3]; $xCapture device operator [3]
590 Big City Univ. copy: Pages 2-4 lacking. [12].
650 0 Subject 1
650 0 Subject 2
700 10 Scholar, Serious.
830 0 Scanning project series ;$v34556.
856 41 $uhttp://www.abcd.edu/library/dlib/authorm1.tif

Template 2: Separate computer file record for digital version

040 NUC$dNUC [2]
100 1 Author, Major.
245 12 A very important book $h[computer file] /$cby Major Author; edited by Serious Scholar.
260 University Town, CA :$bBig University Preservation Dept.,$c1997 $e(Big City (1234 Industrial Park St., Big City 94025) [3] :$fImage Outsourcing Co.) [3]
256 Data (795 image files)
440 0 Scanning project series ;$v34556
538 Change history [6]
538 Capture device [4] and details [5]; validation key [7]; encryption [8]; watermark [9].
538 Compression [11]; resolution [10]; color [13]; color management details [14].
500 Presence/type of control target [16], color bar/gray scale bar [15].
534 $pDigital reproduction of: $b4th ed., rev. $cLondon: Prominent Publisher, 1854. $e672 p. : ill. ; 28 cm. $nBig Univ. copy: p. 2-4 lacking. [12]
590 Scanned 1997-10-10. [1]
650 0 Subject 1.
650 0 Subject 2.
700 10 Scholar, Serious.
830 0 Scanning project series ;$v34556.
856 41 $uhttp://www.abcd.edu/library/dlib/authorm1.tif
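
The remark above that crude records might be generated automatically from the element-to-field mappings can be sketched in Python as follows. This is a minimal illustration, not a cataloging tool: the note groupings mirror the 538, 500 and 590 lines of Template 2, while the names NOTE_GROUPS and crude_notes, the keying of values by element number, and the omission of indicators and subfield codes are all assumptions made for the example.

```python
# Note groupings taken from Template 2; each entry is (tag, element numbers).
NOTE_GROUPS = [
    ("538", [6]),
    ("538", [4, 5, 7, 8, 9]),
    ("538", [11, 10, 13, 14]),
    ("500", [16, 15]),
]


def crude_notes(elements):
    """Build one note line per group, joining the recorded values with '; '."""
    lines = []
    for tag, numbers in NOTE_GROUPS:
        values = [elements[n] for n in numbers if elements.get(n)]
        if values:  # skip a group when none of its elements is recorded
            lines.append(f"{tag} {'; '.join(values)}.")
    if elements.get(1):
        lines.append(f"590 Scanned {elements[1]}.")  # element 1: Date
    return lines


# Values keyed by element number, reusing examples from the body of the report.
elements = {
    1: "1997-10-10",
    4: "Kronton 3012",
    5: "PixelCraft Proimager 8000",
    10: "600 dpi",
    11: "LZW",
    13: "8-bit",
}
for line in crude_notes(elements):
    print(line)
```

A record produced this way would still need review by a cataloger, as the points above indicate.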


Appendix 3: XML implementation

The model below shows how the conservation elements designated in the report might be configured in a simple XML record. The model record would, of course, reflect the specifications of a DTD, which is not reproduced in this report. Note that the model does not conform to the RDF specification, which would provide another significant way to present the requisite conservation data in XML format.

Model XML record incorporating preservation metadata elements

‹RLG.SOURCE_TITLE›[Title of item that is digitized]‹/RLG.SOURCE_TITLE›
‹RLG.SOURCE_CREATOR ROLE="Author"›
     ‹RLG.PERSONAL_NAME›[Author/creator of original item]
‹/RLG.PERSONAL_NAME›
‹/RLG.SOURCE_CREATOR›
‹RLG.SOURCE_PUBLISHER›[Publisher of original item]
‹/RLG.SOURCE_PUBLISHER›
‹RLG.SOURCE_DATE›[Publication date of original item]‹/RLG.SOURCE_DATE›
‹RLG.SOURCE_CONDITION›Pages 3-5 missing; waterstained‹/RLG.SOURCE_CONDITION›
‹RLG.DIGITIZED_VERSION URL="[URL for digitized version]"›

‹RLG.TRANSCRIBER›
   ‹RLG.TRANSCRIBER_NAME›[Name of agency that transcribes metadata]
   ‹/RLG.TRANSCRIBER_NAME›
‹/RLG.TRANSCRIBER›
‹RLG.PRODUCER› 
   ‹RLG.PRODUCER_NAME›[agency that created the digitized version, 
   e.g. outsource agency]‹/RLG.PRODUCER_NAME›
   ‹RLG.PRODUCER_ADDRESS›[address of agency that created the 
   digitized version]‹/RLG.PRODUCER_ADDRESS›
‹/RLG.PRODUCER›
‹RLG.CAPTURE_DEVICE›[Make and model of digital camera or scanner]
‹/RLG.CAPTURE_DEVICE›
‹RLG.CAPTURE_DETAILS›[Details about scanner (e.g., software, version information, scanner settings, gamma corrections, etc.) or digital camera (e.g., lens type, focal length, light source type, etc.)]‹/RLG.CAPTURE_DETAILS›
‹RLG.DATE_DIGITIZED›[yyyy-dd-mm]‹/RLG.DATE_DIGITIZED›
‹RLG.IMAGE_DETAILS›
   ‹RLG.VALIDATION›[Validation Key, Watermark, etc.]
   ‹/RLG.VALIDATION›
   ‹RLG.ENCRYPTION›[Encryption Technique]‹/RLG.ENCRYPTION›
   ‹RLG.COMPRESSION LEVEL="[Compression level]" 
   METHOD="[Compression method]"›
   ‹/RLG.COMPRESSION›
   ‹RLG.COLOR›[The color palette with which the associated image or 
   information is rendered]‹/RLG.COLOR›
   ‹RLG.COLOR_MANAGEMENT›[Associated color management systems]
   ‹/RLG.COLOR_MANAGEMENT›
   ‹RLG.RESOLUTION›[e.g., pixel dimensions, dpi, ppi, mtf]
   ‹/RLG.RESOLUTION›
   ‹RLG.MODIFICATION›[History of changes to digital version]
   ‹/RLG.MODIFICATION›
‹/RLG.IMAGE_DETAILS›
‹RLG.DESCRIPTION›[Color Bar/Gray Scale Bar; Control targets]
‹/RLG.DESCRIPTION›

‹/RLG.DIGITIZED_VERSION›
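
A record of this shape could be generated programmatically and checked for well-formedness. The following Python sketch uses the standard library's xml.etree.ElementTree to build part of the digitized-version element shown in the model. It is illustrative only: the function name build_digitized_version and its parameters are assumptions, only a subset of the model's elements is produced, no DTD is consulted, and the output uses ordinary angle brackets.

```python
import xml.etree.ElementTree as ET


def build_digitized_version(url, producer_name, capture_device, image_details):
    """Build an RLG.DIGITIZED_VERSION element shaped like the model record."""
    root = ET.Element("RLG.DIGITIZED_VERSION", {"URL": url})
    producer = ET.SubElement(root, "RLG.PRODUCER")
    ET.SubElement(producer, "RLG.PRODUCER_NAME").text = producer_name
    ET.SubElement(root, "RLG.CAPTURE_DEVICE").text = capture_device
    details = ET.SubElement(root, "RLG.IMAGE_DETAILS")
    for tag, text in image_details.items():
        ET.SubElement(details, tag).text = text  # e.g. RLG.COLOR, RLG.RESOLUTION
    return root


element = build_digitized_version(
    url="http://www.abcd.edu/library/dlib/authorm1.tif",
    producer_name="Image Outsourcing Co.",
    capture_device="Kronton 3012",
    image_details={"RLG.COLOR": "8-bit", "RLG.RESOLUTION": "600 dpi"},
)
xml_text = ET.tostring(element, encoding="unicode")
ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
print(xml_text)
```

Building the record through a library rather than by string concatenation ensures that every opening tag receives a matching closing tag and that reserved characters in values are escaped.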


