Final Report on Preservation Metadata for Digital
Master Files
This is the final report of a completed RLG project.
May 1998
Background
Digital materials are increasingly important in the
development of research collections. In particular, the preservation
and reformatting community is in the process of incorporating
digitization into its repertoire along with microfilming efforts. A
significant component of creating and managing digital collections is
ensuring that the information essential to their continued use is
preserved in an accessible form. The Working Group on Preservation
Issues of Metadata was constituted in May 1997 as a first step in the
process of addressing this issue. The group was asked to identify the
descriptive data elements that should be associated with digital master
files that have preservation-based intent.
It is a commonplace that metadata serves many purposes,
but to date the main emphasis has been on defining elements essential
for discovery and retrieval. Consequently, the starting place for the
group was to examine two prominent metadata systems that purport to
offer a set of "core" elements necessary for discovery of resources:
the Dublin Core elements and the Program for Cooperative Cataloging's
USMARC-based core record standard. The group decided to specify the
elements extra to these core element lists that are important to serve
preservation needs for digital masters. The list of data elements below
is the result of this process.
Simultaneously, another group, the RLG Working Group on
Preservation and Reformatting Information, was examining the
mechanism for sharing of preservation information through the medium of
the USMARC record. Consequently, the metadata working group also took
care to ensure that its recommendations would be compatible with the
work of this other group.
Scope
Since the concept of metadata takes in a lot of
territory, the Working Group had to begin by defining the constraints
that should govern the scope of its activity:
Technological
constraints
Given the fact that the relevant technologies are in a
state of ongoing and rapid development and that digitization efforts
are still evolving in many respects, the group limited its task as
follows:
—The Working Group concluded that it is
premature to make recommendations concerning the way that preservation
information should be stored. Such information may be included in a
header of a digital file, it may exist in some separate but linked
format, or it may be incorporated in a USMARC cataloging record that
may or may not be linked to a corresponding digital file.
—The Working Group noted that many categories of information
important to preservation needs might be automatically captured at the
point of digitization and supports efforts to define a preservation
standard for the formatting and retention of such information. The
Working Group particularly noted the efforts of the Society of Motion
Picture and Television Engineers (SMPTE) to define a universal preservation
format for videos as an important step in this direction. However, it
is too early for this report to attempt to take such work into account
in the preparation of its recommendation.
Format constraints
The Working Group also limited itself to a consideration of data
elements that describe digital image files. Doing so allowed the group
to address the most significant need within a timeframe short enough to
be meaningful. Members also agreed that it would be most efficient to
constitute other specialist groups to supplement the list of data
elements, adding elements for other formats (e.g., audio files, moving
images) as the need becomes more pressing.
Functional
constraints
Members of the Working Group noted that information that is not
specifically related to preservation tasks may be of potential interest
to the preservation community; for example, copyright and use
restriction information can be crucial and might appropriately be
recorded at the time that preservation staff are creating the digital
master. Members concluded that since the scope of such information
often exceeds preservation needs, it should more appropriately be dealt
with by other specialist groups. However, data elements that might
serve other purposes as well are included as long as they address a
core preservation information need.
Supporting
recommendations
As a result of the considerations above, the group
endorses the following recommendations:
—Institutions should be encouraged to share
their efforts to apply the element set with the rest of the community.
—The current list of data elements should be
supplemented with elements deemed necessary for other formats (e.g.,
audio files, moving images).
—The RLG PRESERV Advisory Council should
continue to monitor and liaise with the Society of Motion Picture and
Television Engineers (SMPTE) in its efforts to develop a universal
preservation format and to define a comprehensive data dictionary (in
order to ensure that such a data dictionary represents preservation
needs).
—The RLG PRESERV Advisory Council should
monitor and liaise as appropriate with other specialist groups
concerned with delineating metadata elements to serve specific needs
that are also of interest to the preservation community (e.g.,
copyright information).
Preservation
metadata elements
The following list of sixteen elements represents
information that the working group deems crucial to the continued
viability of a digital master file. Institutions may exceed this list
or not, but the Working Group recommends that all the enumerated
elements that are relevant to a specific file be recorded.
Since it is recognized that these elements may be
recorded according to the specifications of any one of a number of
metadata systems, no effort has been made to specify syntax. The list
below, including examples, is meant to provide a semantic framework
only. The format of the examples is intended to be illustrative, not
prescriptive. In order to demonstrate how the list might be used,
possible implementations are included in the attached appendices.
1. Date
DEFINITION: Date file is created
FORMAT: yyyyddmm
2. Transcriber
DEFINITION:
Required: Name of the agency
responsible for transcribing the metadata.
Optional: may include identification
of individual transcribing metadata.
EXAMPLE: Stanford University Libraries. Conservation and Preservation
Dept. ; BLK.
3. Producer
DEFINITION:
Required: agency responsible for the physical creation
of the file. One agency may have caused the file to be created by a
second (possibly commercial) agency. In this case, record the name of
the agency responsible for the actual creation of the file, not the
delegating agency.
Optional: May additionally identify
individual primarily responsible for scanning, etc.
EXAMPLE 1 (Research Library with in-house scanning operation; includes
initials of scanner): Stanford University Libraries. Conservation and
Preservation Department; KES
EXAMPLE 2 (Commercial firm to which scanning has been outsourced): Luna
Imaging, Inc., 1315 Innes Place, Venice, CA 90291-3617, USA
4. Capture device
DEFINITION: Indicate make and model of digital camera or
scanner
EXAMPLE: Kronton 3012
5. Capture details
DEFINITION 1 (Capture device is a scanner): Name scanner
software, including version information; give scanner settings, gamma
correction, and other relevant details pertaining to scanning
EXAMPLE: PixelCraft Proimager 8000
DEFINITION 2 (Capture device is a digital camera): Give lens type,
focal length, and light source type, and indicate if image is tiled.
EXAMPLE: Nikon 24mm lens; high frequency fluorescent studio camera
lights, Videsence, model Pl330, with Osram 55 watt 3200 degree color
temperature
6. Change history
DEFINITION: A record of modifications made to the file,
and significant versions generated, identifying the person/institution
who made them and the date they were made.
EXAMPLE 1: Original digital master image file migrated from TIFF v.X to
TIFF v.X+1 using YYY software by JWC on 20010206.
EXAMPLE 2: Printing file created from original digital master using YYY
software by JWC on 19990411. Color bars cropped out, pixel dimensions
retained, image sharpened.
7. Validation key
DEFINITION: A mechanism, usually consisting of a number,
that allows one to verify that an electronically transmitted file is
what it purports to be, i.e., that the file is what is described in the
metadata. At the simplest level, such a key might consist of the number
of lines in a file (similar to the way that one indicates the number of
pages that are transmitted via fax). Especially prevalent is the use of
a checksum, in which an algorithm manipulates the sum of the bits that
make up a file to yield a number that serves as an identifier for that
file.
EXAMPLES: Standard internet checksum; Roland checksum
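As an illustration of the validation key idea, the following Python sketch computes two of the keys mentioned in the definition above: a line count and the standard Internet checksum (the 16-bit one's-complement sum described in RFC 1071). The function names are assumptions made for this example; the report does not prescribe an algorithm, and an institution might well prefer a stronger digest.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def line_count(data: bytes) -> int:
    """Count newline characters, the simplest key mentioned in the definition."""
    return data.count(b"\n")


def matches_key(data: bytes, recorded_checksum: int) -> bool:
    """Check a received file against the checksum recorded in its metadata."""
    return internet_checksum(data) == recorded_checksum
```

A receiving institution would recompute the key over the bytes it received and compare the result with the value recorded in the metadata; a mismatch indicates that the file is not the one described.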
8. Encryption
DEFINITION: Technique by which data is scrambled before
transmission in order to ensure privacy. Encrypted data must be
unscrambled (decrypted) by the receiver. If a file is encrypted, the
type of encryption should be indicated.
EXAMPLE: RSA Public Key Cryptosystem
9. Watermark
DEFINITION: Indicate whether or not some bits in the
file have been altered in order to create a "digital fingerprint" that
can serve to establish ownership of an image and prevent unauthorized
use.
EXAMPLES: Watermark by Digimarc Professional, Watermark by Invisible
Ink for Images
10. Resolution (e.g., pixel dimensions, dpi, ppi)
DEFINITION: Traditionally determined by the number of
pixels used to represent the scanned image, expressed as pixel
dimensions, pixels per inch or dots per inch. Current research into the
use of Modulation Transfer Function (MTF - a function of the spatial
wave number) to measure resolution should allow a more objective
numerical value to be assigned as the measurement.
EXAMPLES: 4096 x 6144 pixels; 600 dpi; 320 dpi
11. Compression
DEFINITION: Indicate whether or not the file has been
compressed (i.e. reduced in size), and if it has, identify the level
and method of compression.
EXAMPLES: LZW; JPEG, compression level 10 (Corel Photopaint)
12. Source
DEFINITION: Describe physical characteristics of the
source such as its size, condition, and its place in the chain (e.g.,
original, copy, or copy of a copy). Include information about
modifications made to the source to enable better digitization. For
images of photographs and digitized microforms, include image type
(i.e., positive or negative image).
EXAMPLE 1: Photocopy; 20 x 25 cm.
EXAMPLE 2: Original; waterstained; 18 x 22 cm.
13. Color
DEFINITION: Indicate pixel depth.
EXAMPLES: 1-bit; 8-bit
14. Color management
DEFINITION: Identify system, if any, that is used to
improve consistency of color across capture, display and output of an
image.
EXAMPLES: Photo CD; OptiCal (color management system); Profile/80
(color sync profile maker); Softproof (Photoshop Plugin)
15. Color bar/Gray scale bar
DEFINITION: Indicate presence or absence of either and,
if present, identify the type.
EXAMPLES: Kodak Q13 or Q14 Color Separation Guide and Gray Scale; Kodak
Q60 Color Input Target
16. Control targets
DEFINITION: Include information about targets included
in the scanned file for purposes of quality control, calibration,
verification, etc.
EXAMPLES: AIIM Scanning Test Chart #2; RIT Alphanumeric Resolution Test
Object, RT-1-71; IEEE Std 167A-1995 Standard Facsimile Test Chart
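Because the report deliberately leaves syntax open, the sixteen elements can be held in any convenient structure. The Python sketch below is one hypothetical way to do so: the class name, field names, and helper function are assumptions for illustration, and each element is treated as optional free text so that only the elements relevant to a given file need be recorded.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PreservationMetadata:
    """One optional free-text slot per element, in the order of the list above."""
    date: Optional[str] = None                      # 1. Date
    transcriber: Optional[str] = None               # 2. Transcriber
    producer: Optional[str] = None                  # 3. Producer
    capture_device: Optional[str] = None            # 4. Capture device
    capture_details: Optional[str] = None           # 5. Capture details
    change_history: Optional[str] = None            # 6. Change history
    validation_key: Optional[str] = None            # 7. Validation key
    encryption: Optional[str] = None                # 8. Encryption
    watermark: Optional[str] = None                 # 9. Watermark
    resolution: Optional[str] = None                # 10. Resolution
    compression: Optional[str] = None               # 11. Compression
    source: Optional[str] = None                    # 12. Source
    color: Optional[str] = None                     # 13. Color
    color_management: Optional[str] = None          # 14. Color management
    color_bar_gray_scale_bar: Optional[str] = None  # 15. Color bar/Gray scale bar
    control_targets: Optional[str] = None           # 16. Control targets


def unrecorded_elements(record: PreservationMetadata) -> List[str]:
    """List the elements that have not been recorded for this file."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Example values are taken from the element examples above.
record = PreservationMetadata(
    date="19990411", resolution="600 dpi", compression="LZW", color="8-bit"
)
print(unrecorded_elements(record))
```

A check of this kind only reports what is absent; deciding which absent elements are actually relevant to a particular file remains a matter of institutional judgment.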
Appendix 1: Dublin
Core implementation
Presented below is an effort to incorporate the metadata
elements enumerated in the body of the report into a Dublin Core record
template. Some data elements have been created as extensions to
currently agreed Dublin Core metadata elements and are tagged as RLG
(for RLG Preservation Metadata) elements rather than DC elements for
illustrative purposes.
This example is not intended to be prescriptive, but to
suggest directions that might be explored further and experimented with
more extensively. There are undoubtedly a number of alternative ways to
embed preservation metadata into Dublin Core records, ranging from
simple links to associated files to more elaborate container
architectures. Shared experiments in this direction and continued
discussion among the members of the preservation community might be
especially fruitful in developing future guidelines.
Hypothetical Dublin
Core record incorporating preservation metadata elements
DC.Title: [Title of digitized item]
DC.Creator.PersonalName: [Author or creator of intellectual content]
DC.Creator.Role: Author
DC.Contributor.CorporateName: [Agency responsible for transcribing
metadata]
DC.Contributor.Role: Transcriber (Metadata)
DC.Contributor.CorporateName: [Agency to which digitization was
outsourced]
DC.Contributor.Role: Producer
DC.Contributor.CorporateName.Address: [Address of outsourcing agency]
DC.Publisher: [Institution responsible for digitization]
DC.Date: [date digital preservation copy created--YYYY-DD-MM]
DC.Form: Image
RLG.Form.Capture: [Make and model of scanner or
digital camera and relevant capture details]
RLG.Form.Validation: [Validation Key, Watermark]
RLG.Form.Encryption: [Encryption technique]
RLG.Form.Compression.Method: [e.g., JPEG, LZW]
RLG.Form.Compression.Level: [value including capture device information
that makes this information meaningful]
RLG.Form.Color: [The color palette with which the associated image or
information is rendered]
RLG.Form.ColorManagement: [Associated color management systems]
RLG.Form.Resolution: [e.g., pixel dimensions, dpi, ppi, mtf]
RLG.Form.Modification: [Change History]
DC.Description: [Color Bar/Gray Scale Bar; Control targets]
DC.Identifier: [URL of document if metadata not carried in header]
DC.Source.Date: [Date of print version that is digitally reproduced]
DC.Source.Publisher: [publisher of print version that is digitally
reproduced]
RLG.Source.Condition: [Physical condition of source item, etc.]
Note: Alternatively,
instead of Source use the Relation element to identify print version:
DC.Relation
DC.Relation.Type: IsVersionOf
DC.Relation.Identifier: [e.g., catalog record no. for original]
Appendix 2:
Preservation-related metadata recorded in USMARC records
The templates below offer maps of the 16 Preservation
Metadata Elements (described previously) to a USMARC record. Bracketed
numbers correspond to the list of the 16 recommended data elements.
Please note the following points:
- The examples do not explicitly endorse any of the
several USMARC multiple version cataloging strategies currently under
discussion.
- Note that the examples lack fields that might imply a
particular multiple version implementation, e.g. fixed field values,
linking fields, etc.
- The order, etc. of notes pertaining to the digital
version in the 533 field of the first example and the 538 notes in the
second example are not intended to be prescriptive, merely
illustrative. The order and grouping of elements is intended to suggest
that elements may be combined in one note or given in distinct notes or
groupings as necessary in order to give a complete but parsimonious
description.
- Although creation of records similar to the examples
below would require human cataloging expertise, crude records might be
automatically generated using the mappings of specific elements to
numbered fields proposed below.
Please also note that the RLG Working Group on
Preservation and Reformatting Information, which is explicitly
concerned with the USMARC record, has prepared a discussion paper for
ALA's Machine Readable Bibliographic Information (MARBI) Committee
which would extend the 007 in order to include in coded form much of
the information that must otherwise be included in variable data
fields. That working group is also preparing examples demonstrating a
potential standard configuration of the 533 field that could be used in
conjunction with the extended 007. The adoption of these proposals
would considerably simplify the addition of information corresponding
to the recommended preservation metadata elements.
[For the subsequent outcome of this MARC 007 work, see Establishing
MARC 21 Coding for Digital Files.]
Template 1:
Description of digital master added to record for hard copy (monograph)
| 040 |  | NUC$dNUC [2] |
| 100 | 1 | Author, Major. |
| 245 | 12 | A very important book /$cby Major Author; edited by Serious Scholar. |
| 250 |  | 4th ed., rev. |
| 260 |  | London :$bProminent Publisher,$c1854. |
| 300 |  | 672 p. :$bill. ;$c28 cm. |
| 500 |  | Includes index and bibliographies. |
| 533 |  | Computer file. $bBig City: $cBig University Preservation Dept. $d1997. $f(Scanning Project Series ; 34556)$nChange history [6]. $n795 image files; Capture device [4] and details [5]; Validation key [7]; Encryption [8]; Watermark [9].$nResolution [10]; compression [11]; color [13]; color management details [14].$nPresence/type of targets [16], color bar/gray scale bar [15]. |
| 583 |  | $b1997-10-10 [1]; $lScanned $xImage Outsourcing Co., 1234 Industrial Park St., Big City, CA [3]; $xCapture device operator [3] |
| 590 |  | Big City Univ. copy: Pages 2-4 lacking. [12]. |
| 650 | 0 | Subject 1 |
| 650 | 0 | Subject 2 |
| 700 | 10 | Scholar, Serious. |
| 830 | 0 | Scanning project series ;$v34556. |
| 856 | 41 | $uhttp://www.abcd.edu/library/dlib/authorm1.tif |
Template 2:
Separate computer file record for digital version
| 040 |  | NUC$dNUC [2] |
| 100 | 1 | Author, Major. |
| 245 | 12 | A very important book $h[computer file] /$cby Major Author; edited by Serious Scholar. |
| 260 |  | University Town, CA :$bBig University Preservation Dept.,$c1997 $e(Big City (1234 Industrial Park St., Big City 94025) [3] :$fImage Outsourcing Co.) [3] |
| 256 |  | Data (795 image files) |
| 440 | 0 | Scanning project series ;$v34556 |
| 538 |  | Change history [6] |
| 538 |  | Capture device [4] and details [5]; validation key [7]; encryption [8]; watermark [9]. |
| 538 |  | Compression [11]; resolution [10]; color [13]; color management details [14]. |
| 500 |  | Presence/type of control target [16], color bar/gray scale bar [15]. |
| 534 |  | $pDigital reproduction of: $b4th ed., rev. $cLondon: Prominent Publisher, 1854. $e672 p. : ill. ; 28 cm. $nBig. Univ. copy: p. 2-4 lacking. [12] |
| 590 |  | Scanned 1997-10-10. [1] |
| 650 | 0 | Subject 1. |
| 650 | 0 | Subject 2. |
| 700 | 10 | Scholar, Serious. |
| 830 | 0 | Scanning project series ;$v34556. |
| 856 | 41 | $uhttp://www.abcd.edu/library/dlib/authorm1.tif |
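The suggestion that crude records might be generated automatically from the element-to-field mappings can be sketched in a few lines of Python. The mapping table below is read off the bracketed numbers in Template 2; the function names and the plain "tag plus text" output are assumptions for illustration. The sketch ignores indicators and subfield codes and collapses the separate 538 notes into one, so it produces raw material for a cataloger rather than a valid USMARC record.

```python
from collections import defaultdict
from typing import Dict, List

# Element number -> USMARC tag, as annotated in Template 2.
# Note: element 2 is shown against the coded 040 field, so free text there is crude.
ELEMENT_TO_TAG: Dict[int, str] = {
    1: "590", 2: "040", 3: "260",
    4: "538", 5: "538", 6: "538", 7: "538", 8: "538", 9: "538",
    10: "538", 11: "538", 12: "534", 13: "538", 14: "538",
    15: "500", 16: "500",
}


def crude_fields(elements: Dict[int, str]) -> Dict[str, List[str]]:
    """Group recorded element values under the tag each element maps to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for number in sorted(elements):
        grouped[ELEMENT_TO_TAG[number]].append(elements[number])
    return dict(grouped)


def render(grouped: Dict[str, List[str]]) -> List[str]:
    """Render one line per tag, joining that tag's values with semicolons."""
    return [f"{tag} {'; '.join(values)}" for tag, values in sorted(grouped.items())]


# Example values are taken from the element examples and Template 2.
recorded = {
    1: "Scanned 1997-10-10.",
    10: "600 dpi",
    11: "LZW",
    16: "AIIM Scanning Test Chart #2",
}
for line in render(crude_fields(recorded)):
    print(line)
```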
Appendix 3: XML
implementation
The model below shows how the preservation elements
designated in the report might be configured in a simple XML record.
The model record would, of course, reflect the specifications of
a DTD, which is not reproduced in this report. Note that the model
does not conform to the RDF specification, which would provide another
significant way to present the requisite preservation data in XML
format.
Model XML record
incorporating preservation metadata elements
<RLG.SOURCE_TITLE>[Title of item that is digitized]</RLG.SOURCE_TITLE>
<RLG.SOURCE_CREATOR ROLE="Author">
  <RLG.PERSONAL_NAME>[Author/creator of original item]</RLG.PERSONAL_NAME>
</RLG.SOURCE_CREATOR>
<RLG.SOURCE_PUBLISHER>[Publisher of original item]</RLG.SOURCE_PUBLISHER>
<RLG.SOURCE_DATE>[Publication date of original item]</RLG.SOURCE_DATE>
<RLG.SOURCE_CONDITION>Pages 3-5 missing; waterstained</RLG.SOURCE_CONDITION>
<RLG.DIGITIZED_VERSION URL="[URL for digitized version]">
  <RLG.TRANSCRIBER>
    <RLG.TRANSCRIBER_NAME>[Name of agency that transcribes metadata]</RLG.TRANSCRIBER_NAME>
  </RLG.TRANSCRIBER>
  <RLG.PRODUCER>
    <RLG.PRODUCER_NAME>[agency that created the digitized version, e.g. outsource agency]</RLG.PRODUCER_NAME>
    <RLG.PRODUCER_ADDRESS>[address of agency that created the digitized version]</RLG.PRODUCER_ADDRESS>
  </RLG.PRODUCER>
  <RLG.CAPTURE_DEVICE>[Make and model of digital camera or scanner]</RLG.CAPTURE_DEVICE>
  <RLG.CAPTURE_DETAILS>[Details about scanner (e.g., software, version information, scanner settings, gamma corrections, etc.) or digital camera (e.g., lens type, focal length, light source type, etc.)]</RLG.CAPTURE_DETAILS>
  <RLG.DATE_DIGITIZED>[yyyy-dd-mm]</RLG.DATE_DIGITIZED>
  <RLG.IMAGE_DETAILS>
    <RLG.VALIDATION>[Validation Key, Watermark, etc.]</RLG.VALIDATION>
    <RLG.ENCRYPTION>[Encryption Technique]</RLG.ENCRYPTION>
    <RLG.COMPRESSION LEVEL="[Compression level]" METHOD="[Compression method]">
    </RLG.COMPRESSION>
    <RLG.COLOR>[The color palette with which the associated image or information is rendered]</RLG.COLOR>
    <RLG.COLOR_MANAGEMENT>[Associated color management systems]</RLG.COLOR_MANAGEMENT>
    <RLG.RESOLUTION>[e.g., pixel dimensions, dpi, ppi, mtf]</RLG.RESOLUTION>
    <RLG.MODIFICATION>[History of changes to digital version]</RLG.MODIFICATION>
  </RLG.IMAGE_DETAILS>
  <RLG.DESCRIPTION>[Color Bar/Gray Scale Bar; Control targets]</RLG.DESCRIPTION>
</RLG.DIGITIZED_VERSION>
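As a rough illustration of how such a record might be produced programmatically, the Python sketch below builds a subset of the digitized-version portion of the model record with the standard library's xml.etree.ElementTree module. The element and attribute names follow the model above; the function name, the dictionary keys, and the sample values are assumptions for this example, and no DTD validation is attempted.

```python
import xml.etree.ElementTree as ET


def build_digitized_version(url: str, values: dict) -> ET.Element:
    """Build an RLG.DIGITIZED_VERSION element with a subset of the model's children."""
    version = ET.Element("RLG.DIGITIZED_VERSION", {"URL": url})

    transcriber = ET.SubElement(version, "RLG.TRANSCRIBER")
    ET.SubElement(transcriber, "RLG.TRANSCRIBER_NAME").text = values["transcriber"]

    ET.SubElement(version, "RLG.CAPTURE_DEVICE").text = values["capture_device"]
    ET.SubElement(version, "RLG.DATE_DIGITIZED").text = values["date_digitized"]

    details = ET.SubElement(version, "RLG.IMAGE_DETAILS")
    # Compression is carried in attributes, as in the model record.
    ET.SubElement(
        details,
        "RLG.COMPRESSION",
        {"LEVEL": values["compression_level"], "METHOD": values["compression_method"]},
    )
    ET.SubElement(details, "RLG.RESOLUTION").text = values["resolution"]
    return version


# Sample values are borrowed from examples elsewhere in this report.
example = build_digitized_version(
    "http://www.abcd.edu/library/dlib/authorm1.tif",
    {
        "transcriber": "Stanford University Libraries. Conservation and Preservation Dept.",
        "capture_device": "Kronton 3012",
        "date_digitized": "1997-10-10",
        "compression_level": "10",
        "compression_method": "JPEG",
        "resolution": "600 dpi",
    },
)
xml_text = ET.tostring(example, encoding="unicode")
ET.fromstring(xml_text)  # raises ParseError if the output were not well-formed
print(xml_text)
```

Building the record through a tree API rather than by string concatenation guarantees that every opening tag has a matching close, which avoids the kind of mismatched-tag slips that are easy to make when such records are typed by hand.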