Preparing Digital Surrogates for RLG Cultural Materials

This activity is now closed. The information on this page is provided for historical purposes only.

Recommendations for digitizing
Recommended background
Digital conversion service bureaus
Suggestions for existing surrogates

Recommendations for digitizing

These are general recommendations, not absolute requirements. Since each digitization project is unique, there may be very good reasons for using alternative quality guidelines or for choosing a different approach (although we strongly recommend that you be consistent within a project). For RLG-funded digitization, please discuss intended variations from these recommendations with Ricky.Erway@oclc.org. For other projects, consider this guidance in the context of your organizational requirements and proceed accordingly.

Images

  • Avoid device-specific color space, format, headers, etc.
  • Size and save page images at 1:1 scale to the dimensions of the original pages.
  • To judge sharpness, view images on the monitor at 100 percent (i.e., one screen pixel for each captured pixel of the image) and evaluate an area of the image that depicts details and edges.
  • Be sure the whole image area (with edges) has been scanned and no part of it has been cropped.
  • Scan the image in the correct orientation or correct the image orientation in postprocessing.
  • Avoid skew by placing the originals squarely on the scanner. Rescan a skewed image rather than rotating it after scanning.
  • Check for artifacts such as dropout lines or pixels, banding, lack of uniformity, poor color registration, aliasing, flaring, and contouring.

Textual materials

  • You can create just a digital image, or a digital image and a machine-readable text.
  • Determine in advance if blank pages will be scanned.
  • Create images that meet or exceed these characteristics:
       Printed texts and/or line drawings: 600 dpi, 1-bit
       Grayscale, half-tone, and other black-and-white illustrations: 300 dpi, 8-bit
       Color illustrated texts: 300 dpi, 24-bit
       Rare/early printed texts: 300 dpi, 8- or 24-bit
  • Use Intel TIFF v 5.0 or 6.0 uncompressed or with lossless compression (ITU Group 4 for 1-bit or LZW for 8- or 24-bit).
  • RGB and PhotoYCC are recommended color spaces for digital masters.
  • For machine-readable text, key in or use OCR text in ASCII, UTF-8, or Unicode, preferably corrected to at least 99.995% accuracy, and encoded (e.g., as specified in TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices Version 1.0 (Digital Library Federation, July 1999)).
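
As a rough planning aid, the pixel dimensions and uncompressed size implied by the characteristics above can be estimated from page size, resolution, and bit depth. The sketch below is only arithmetic: the function name and the 8.5 x 11 inch example page are illustrative assumptions, not part of these recommendations. Real TIFF files add a small header and are usually smaller after lossless compression.

```python
import math

def uncompressed_image_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, size_bytes) for a scan saved at 1:1 scale.

    Each row is padded to a whole number of bytes, as in uncompressed TIFF data.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return width_px, height_px, bytes_per_row * height_px

# Hypothetical 8.5 x 11 inch page captured at the recommended minimums
for label, dpi, bits in [
    ("printed text, 600 dpi, 1-bit", 600, 1),
    ("grayscale illustration, 300 dpi, 8-bit", 300, 8),
    ("color illustrated text, 300 dpi, 24-bit", 300, 24),
]:
    w, h, size = uncompressed_image_size(8.5, 11, dpi, bits)
    print(f"{label}: {w} x {h} px, about {size / 1e6:.1f} MB uncompressed")
```

Estimates like these help when sizing storage or transfer media before a project starts.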

Pictorial materials

Use these resolutions whether scanning from originals or intermediates:

Black-and-white photos: 400 dpi, 8-bit
Color photos: 400 dpi, 24-bit
Slides or small negatives: effective resolution of 400 dpi, 8- or 24-bit


  • Use Intel TIFF v 5.0 or 6.0 uncompressed or with lossless compression (LZW).
  • RGB and PhotoYCC are recommended color spaces for digital masters.

Audio

Where these recommendations offer a choice, make your decision based on the nature of the original. For example, spoken-word conversion requirements are sometimes lower than those for other recorded sound such as music. However, an old music recording may not merit such high-quality capture as an excellent spoken-word recording.

Master file:
96 or 48 kHz; 24 bits

Bitstream: Uncompressed PCM

Configuration: Monophonic or stereo depending upon characteristics of source item

Sampling frequency: 96 or 48 kHz depending upon characteristics of source item, 24-bit word length (in some cases, 44.1 kHz/16 bit suffices)

File format: WAVE

Enhancement: none or as determined by contributor
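
The master-file parameters above can be spot-checked from the WAVE header. The following is a minimal sketch using Python's standard `wave` module; the function name is hypothetical, and the accepted combinations simply restate the figures listed above. The `wave` module reads only PCM data, so a file it cannot parse is reported as a problem rather than inspected further.

```python
import wave

# (sampling frequency in Hz, word length in bits) combinations named above
ACCEPTED = {(96000, 24), (48000, 24), (44100, 16)}

def check_master_wave(path):
    """Return a list of problems found in a WAVE master file (empty if none)."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            bits = wav.getsampwidth() * 8
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"not readable as PCM WAVE: {exc}"]
    problems = []
    if (rate, bits) not in ACCEPTED:
        problems.append(f"unexpected sampling: {rate} Hz, {bits}-bit")
    if channels not in (1, 2):
        problems.append(f"unexpected channel count: {channels}")
    return problems
```

A header check of this kind confirms only the declared format; it says nothing about the quality of the transfer itself.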

Service file:
MP3 (aka MPEG-1/2 layer 3 audio)

Bitstream: MP3

Quality: Data rate of 192 or 128 kilobits/second, as determined by contributor



Motion

Master file: Component digital video bitstream (4:2:2 sampling rate) uncompressed. Note: the data rate for 4:2:2 is 270 Mbits/sec.
Service file: Compressed MPEG-2 files at pixel dimensions and data rates determined by contributor, possibly from a low of 1.2 Mbits/sec to a high of 15 Mbits/sec.
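
To see what these data rates mean for storage, the figures can be converted to gigabytes per hour. This is a back-of-the-envelope sketch: the function name is illustrative, and it assumes decimal units (1 GB = 1,000,000,000 bytes) and ignores container overhead and audio tracks.

```python
def gigabytes_per_hour(megabits_per_second):
    """Convert a data rate in Mbit/s to storage in decimal gigabytes per hour."""
    bits_per_hour = megabits_per_second * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000

for label, rate in [
    ("master, 4:2:2 uncompressed", 270),
    ("service, MPEG-2 low", 1.2),
    ("service, MPEG-2 high", 15),
]:
    print(f"{label}: {rate} Mbit/s is about {gigabytes_per_hour(rate):.2f} GB per hour")
```

At 270 Mbits/sec an hour of master video comes to roughly 121.5 GB, against about 0.5 to 6.75 GB for the service files.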

Complex digital objects

  • When digitizing component parts of an object, take care to maintain their relationships. For example, when capturing an album, consider, What is the relationship of the parts to the whole? Should each page be captured separately or should two pages be captured at once? Do the album pages have intrinsic significance, or is it sufficient to capture the images from each page? Is there a relationship between the spreads that should be maintained, or is an indication of sequence enough to recreate the experience of looking through the album?
  • Provide structural metadata for complex digital objects to allow for navigation within the object. Preferably, use the Metadata Encoding and Transmission Standard (METS). If you do not use METS, include a link in the record to the text file (if there is one), and a start image and end image.
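
As an illustration of structural metadata, the sketch below builds a very small METS document that lists page images in reading order. It is an assumption-laden example, not a profile endorsed by these recommendations: the function name, the `FILE0001`-style identifiers, and the "master", "physical", "book", and "page" labels are all hypothetical choices, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def build_mets(object_id, page_files):
    """Build a minimal METS tree: one file entry and one page div per image."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    book = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "book"})
    for order, name in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"  # hypothetical identifier scheme
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name})
        # The page div records sequence and points back to the image file
        page = ET.SubElement(book, f"{{{METS_NS}}}div",
                             {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets

tree = build_mets("mas014", ["mas014/001000c.tif", "mas014/002000.tif"])
print(ET.tostring(tree, encoding="unicode"))
```

The ordered page divisions carry the "indication of sequence" discussed above; richer relationships, such as spreads, would need additional nested divisions.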

File naming

  • Use file-naming schemes that are compatible across platforms and systems. Minimize the length of the name. Use only lower case characters a-z, numerical digits, and the following special characters: . _ - (period, underscore, and hyphen). Do not use spaces or any other special characters.
  • Prefer a numbering scheme that reflects numbers already used in an existing cataloging system; if scanning precedes cataloging, use serial file names that will be incorporated into the catalog record.
  • When developing a file-naming scheme, have a good understanding of the whole project. How many images will be scanned? Will they be stored in different directories?  Are the files part of larger complex objects?
  • Use standard file extensions (e.g., .tif, .wav, .mp3, .mpg, .rm, .txt, .sgm, .xml) in lower case only.
  • Make sure the file references in your descriptive records match the file names (the extension may be omitted only if it is the same for every image). The case of file references in the descriptive records must match the case of the actual file names.
  • Replicate the directory structure as referenced in the descriptive records.
  • Don't overload directories with too many files.
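
The character rule above is easy to check automatically. A minimal sketch follows; the function name is hypothetical, and it tests only the permitted characters, since the guidance gives no numeric limit on name length.

```python
import re

# Lower-case letters a-z, digits, period, underscore, and hyphen only
ALLOWED_NAME = re.compile(r"[a-z0-9._-]+")

def is_portable_name(name):
    """Return True if a file or directory name uses only the permitted characters."""
    return ALLOWED_NAME.fullmatch(name) is not None

for candidate in ["001000c.tif", "Page 1.TIF", "stw_corresp-81.sgm"]:
    print(candidate, is_portable_name(candidate))
```

Apply the check to each path component separately, because the directory separator is not one of the permitted characters.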
Naming a collection of thousands of simple objects (e.g., a photographic collection)
  • Subdivide them in a meaningful way (by series or group) or by chunk (same prefix or in groups of a thousand).
  • Use the reproduction, accession, or a serial number as the stem of the file name.
  • Add a code for special features:
       b for back, if scanning information on the back of a print
       d for a detail of a larger image
Naming a complex object such as a book
  • Create a directory for each object, using an identifying string for the object as the directory name.
  • If there is a text file for the whole object (e.g., an SGML file), use the same string in its file name.
  • For page image file names, use a sequential image number followed by the printed page number (when present), both with leading zeros, to fit the pattern "cccpppf", where:
    • "ccc" is the image control number. These first three digits are used to assign a set of sequential numbers to all of the images for the book. The first image from the book is assigned control number 001; it reproduces the book cover. Control number 002 might be the illustrated end paper, 003 might be a title page, etc. depending on the book. If a document-start target is provided, scan it and give it the file name stem, 000000. If missing pages are encountered, scan a "missing page" target and assign the relevant control number.
    • "ppp" is the printed page number. These next three digits carry the actual printed page number with leading zeros. If the number is Roman, provide the Arabic translation. If there is no printed page number, use 000.
  • Assign a code for special features (the trailing "f" in the pattern) where one applies:
       g — Title Page (if the work has more than one, indicate the main title page)
       n — Table of Contents (if more than one page, indicate all pages)
       l — List of Illustrations (if more than one page, indicate all pages)
       f — Illustration (not a page image including an illustration, but an additional image cropped to include only the illustration)
       x — Index (if more than one page, indicate all pages)
       y — Missing page or other irregularity target

    Example: a book with the ID "mas 014" would be in a directory named mas014; it might contain these files:

    mas014/mas014.sgm
    mas014/000000.tif (target)
    mas014/001000c.tif (cover)
    mas014/002000.tif
    mas014/003000.tif
    mas014/003000f.tif (illus)
    mas014/004000g.tif (title page)
    mas014/005000.tif
    mas014/006000n.tif (contents)
    mas014/007000n.tif (contents cont.)
    mas014/008003.tif (first numbered page)
    etc.
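
The "cccpppf" pattern can be generated and parsed mechanically. The sketch below is one possible helper, with hypothetical function names; it assumes .tif page images and a single optional lower-case feature code, as in the example above.

```python
import re

# ccc = image control number, ppp = printed page number, f = optional code
PAGE_NAME = re.compile(r"(?P<control>\d{3})(?P<page>\d{3})(?P<code>[a-z]?)\.tif")

def page_file_name(control, printed_page=0, code=""):
    """Build a page image name such as 008003.tif or 004000g.tif."""
    if not 0 <= control <= 999 or not 0 <= printed_page <= 999:
        raise ValueError("control and page numbers must fit in three digits")
    return f"{control:03d}{printed_page:03d}{code}.tif"

def parse_page_file_name(name):
    """Split a page image name into (control, printed_page, code), or return None."""
    match = PAGE_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match["control"]), int(match["page"]), match["code"]

print(page_file_name(4, code="g"))          # title page with no printed number
print(page_file_name(8, 3))                 # first numbered page
print(parse_page_file_name("003000f.tif"))  # illustration detail
```

Because the control number leads the name, an ordinary alphabetical sort of the directory reproduces the scanning sequence.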

Naming a manuscript collection
  • Create a directory for the collection, and subdirectories for each series, box, and/or folder.
  • If there is a text file for the whole collection (or for each series, box, or folder) use the same string in its file name and place it in that directory.
  • The page image file names will consist of a sequential image number with leading zeros. Since folders generally contain fewer than a thousand pages, you can use a three-digit number (including leading zeroes) for page-image naming. If a document-start target is provided, scan it and give it the file name stem, 000.
  • Assign a code for special features:
       b for back side of a page
       s for start of a new document—since documents and pages are not equivalent, indicate when a new document (report, letter, etc.) begins by adding an s at the end of the file name for each image that represents the start of a new document 

    Example: a manuscript collection with the collection identifier stw would be in the directory stw, with the following subdirectories and files:

    stw/corresp/81/23/23.sgm
    stw/corresp/81/23/001s.tif (first page)
    stw/corresp/81/23/002.tif
    stw/corresp/81/23/003.tif
    stw/corresp/81/23/004s.tif (start of new document)
    stw/corresp/81/23/005.tif
    etc.
    stw/corresp/81/24/001s.tif
    etc.
    stw/reports/01/01/001s.tif
    etc.
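
Paths in this layout can be assembled from their parts. The following is a small sketch with a hypothetical function name; it assumes a four-level hierarchy of collection, series, box, and folder, as in the example above, and .tif page images.

```python
from pathlib import PurePosixPath

def manuscript_page_path(collection, series, box, folder, image_number, code=""):
    """Build a path such as stw/corresp/81/23/004s.tif for one page image.

    code is "" for an ordinary page, "b" for a back side, or "s" for the
    start of a new document.
    """
    if not 0 <= image_number <= 999:
        raise ValueError("image number must fit in three digits")
    if code not in ("", "b", "s"):
        raise ValueError("unknown feature code")
    file_name = f"{image_number:03d}{code}.tif"
    return str(PurePosixPath(collection, series, box, folder, file_name))

print(manuscript_page_path("stw", "corresp", "81", "23", 1, "s"))
print(manuscript_page_path("stw", "corresp", "81", "23", 2))
```

Forward slashes are used regardless of platform so that the paths match the references in the descriptive records.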

Submission

Choose one of these methods:

  • On media: ISO 9660 CDs or TAR on DLT.
  • For RLG to pick up via FTP: provide access to the directory structure as referenced in the records.
  • By FTP to RLG: copy the file directory structure referenced in the records.
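
Whichever method is used, it helps to confirm beforehand that every file referenced in the records is present in the directory structure with exactly the same spelling and case. A minimal sketch, assuming the references have already been extracted from the records into a list of relative paths (the function name is hypothetical):

```python
import os

def missing_references(root, referenced_paths):
    """Return referenced paths with no exact, case-sensitive match under root."""
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            # Normalize to forward slashes so paths compare across platforms
            on_disk.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(set(referenced_paths) - on_disk)
```

Comparing against the names reported by the directory listing, rather than testing whether each path can be opened, keeps the check case-sensitive even on file systems that ignore case.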

Sources

Recommended background for digitizing decisions

Selection

Dan Hazen, Jeffrey Horrell, and Jan Merrill-Oldham, Selecting Research Collections for Digitization Council on Library and Information Resources, August 1998. (decision matrix)

Selecting Library and Archive Collections for Digital Reformatting. Proceedings from an RLG Symposium Held November 5-6, 1995 in Washington, DC.

Outsourcing

RLG Guidelines for Creating a Request for Proposal for Digital Imaging Services (pdf) RLG, 1997 (May 1998).

RLG Model Request for Information for Digital Imaging Services (pdf) RLG, 1997.

RLG Model Request for Proposal for Digital Imaging Services (pdf) RLG, 1997.

Cost estimating

RLG Worksheet for Estimating Digital Reformatting Costs (pdf) RLG, 1997 (May 1998).

Imaging

Anne R. Kenney and Oya Y. Rieger, Moving Theory Into Practice: Digital Imaging for Libraries and Archives RLG, 2000 (see RLG Programs Books and Reports).

Guides to Quality in Visual Resource Imaging Digital Library Federation (DLF) and RLG, 2000.

Steven Puglia, "The Costs of Digital Imaging Projects", RLG DigiNews vol. 3, no. 5 (October 15, 1999).

Imaging halftones: Anne R. Kenney and Louis H. Sharpe II, "Illustrated Book Study: Digital Conversion Requirements of Printed Illustrations", The Library of Congress Preservation (July, 1999).

Imaging from microfilm: Louis H. Sharpe II, et al., Library of Congress Manuscript Digitization Demonstration Project Final Report October 1998.

Selection, preparation, capture, metadata, archiving: Joint RLG and NPO Preservation Conference: Guidelines for Digital Imaging, September 1998.

RLG Working Group on Preservation Issues of Metadata, Final Report RLG, May 1998.

Franziska Frey, "Digital Imaging for Photographic Collections: Foundations for Technical Standards", RLG DigiNews, vol. 1 no. 3 (December 15, 1997).

Howard Besser and Jennifer Trant, An Introduction to Imaging, Getty Information Institute, 1995.

Text

TEI: The TEI Guidelines TEI, 2001.

TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices Version 1.0 Digital Library Federation, July 1999.

Alan Morrison, Michael Popham, and Karen Wilkander, Creating and Documenting Electronic Texts: A Guide to Good Practice AHDS Guides to Good Practice, 1998.

Audio

Bruce Fries with Marty Fries, The MP3 and Internet Audio Handbook TeamCom Books, 2000: Chapter 11, "A Digital Audio Primer" and Chapter 12, "Digital Audio Formats"

Motion

Dave Anderson, The PC Technology Guide: Digital Video (2002).

Digital conversion service bureaus

RLG did not endorse these service providers, but received positive reports from those who had used them.

Apex CoVantage ePublishing Solutions
120 Presidents Plaza
198 Van Buren Street
Herndon, VA 20170
Phone: 703-709-3000
Fax: 703.709.0333
E-mail: info@apexcovantage.com
Contacts: Margaret Boryczka or Tom O'Brien
text conversion, SGML markup, EAD

Backstage Library Works
1180 South 800 East
Orem, Utah 84097
Phone: 800-316-2759
Fax: 801.356.8220
E-mail: jmoore@bslw.com 
Contact: Jodi Moore, Marketing Manager
on-site/off-site scanning; text/prints/transparencies/realia; oversize; bound; data conversion; metadata processing; OCR

Bar-Hama Blumenthal Digital Photography
450 Park Avenue
Suite 2702
New York, NY 10022
Phone: 212-400-3281
Fax: 212.400.3293
E-mail: ardon@barhama.com
Contacts: Ardon Bar-Hama or George Blumenthal
on-site, high resolution digital photography of rare books & manuscripts

Boston Photo Imaging
20 Newbury Street
Boston, MA 02116
Phone: 617-267-4086
Fax: 617.267.8711
Contact: David Sempberger
photo scanning

DCL
Data Conversion Laboratory, Inc.
61-18 190th St., 2nd Floor
Fresh Meadows, NY 11365
Phone: 718-357-8700
Fax: 718.357.8776
Contact: Shavy Schwimmer, convert@dclab.com
scanning, OCR and text entry, SGML

Direct Data Capture Ltd (UK and NY)
73 B Ormskirk Business Park
New Court Way
Ormskirk, Lancashire
L39 2YT, UK
Phone: 01695 570707
E-mail: brett@ddcltd.co.uk
bound volume/microfilm scanning, text conversion

Higher Education Digitisation Service
University of Hertfordshire
College Lane
Hatfield, Hertfordshire
AL10 9AB UK
Phone: +44 1707 286078
E-mail: heds@herts.ac.uk
digitization of all manner of originals

Innodata
Innodata Content Services
Three University Plaza
Hackensack, New Jersey 07601
Phone: 201-488-1200
Fax: 201.488.9099
Contact: Joan Meyer, joan_meyer@inod.com, or Steven Keyes, steven_keyes@inod.com, or Jan Palmen
data aggregation and conversion, XML transformation, OCR, and image scanning

Input Solutions, Inc (ISI)
Gaithersburg, MD
Phone: 301-948-6620
Contact: John Solomon
scanning and conversion, microfilm, oversize, text, SGML

JJT, Inc.
Corporate Headquarters, R&D & Production Center
26 Howland St.
Plymouth, MA 02360
Phone: 508-747-9889
Fax: 508-747-9289
Email: info@jjt.com

JJT, Inc.
New York Production Center
231 W. 29th Street
Suite 701
New York, NY 10001
Phone: 212-594-5106
Email: atroncale@jjt.com
Contact: Anthony Troncale
high-quality digital reproductions of pictorial works, including line and photographic images and manuscripts; specializing in conversion of large collections

Kirtas Technologies, Inc.
7620 Omnitech Place
Victor, New York 14564-9782
Phone: (585) 924-2420, ext. 3008
E-mail: mmaxwell@kirtas.com
Contact: Michael Maxwell, Director of Worldwide Sales
Non-destructive, high quality, inexpensive, bound document scanning (on and off-site) of books, journals, magazines, lab notebooks, etc. with OCR and metadata capture capabilities

Luna Imaging, Inc.
3542 Hayden Ave., Bldg. One
Culver City, CA 90232-2413
Phone: 310-452-8370
Fax: 310.452.8389
E-mail: sales@luna-img.com
film and print scanning, direct digital photography, image editing and post-production, on-site services, image studio/workflow consulting

Northern Micrographics
2004 Kramer Street
LaCrosse, Wisconsin 54602
Contact: Tom Ringdahl, tringdahl@normicro.com
scanning from paper or film

Preservation Resources
9 Commerce Way
Bethlehem, PA 18017
Phone: 800-773-7222
or 610-758-8700
Fax: 610.758.9700
Contact: presres@oclc.org
microfilm scanning

Saztec International
6700 Corporate Dr.
Kansas City, MO 64120
Phone: 816-483-6900
Fax: 816.241.4966
text conversion, SGML

Systems Integration Group, Inc.
9701 Philadelphia Court
Building 17, Suite A
Lanham, Maryland 20706
Phone: 301-731-3900
Fax: 301.731.3907
on-site/off-site document scanning, text conversion, SGML

Two Cat Digital, Inc.
14717 Catalina Street
San Leandro, CA 94577
Phone: 510-940-2670
Fax: 510.940.2632
Contact: Howard Brainen, howard@twocatdigital.com
film and print scanning, direct digital photography, image editing, bulk image processing services, automated systems, image databases, on-site services, digital imaging consulting

Suggestions if you've already created your digital surrogates

The following were suggestions for the quality and format of the files already digitized. These were not requirements.

2D images

Formats and compression: In general, you'll probably want to keep a TIFF (Tagged Image File Format, version 5 or 6 with Intel headers) version of the image with lossless compression (ITU Group 4 for black and white or LZW for grayscale or color) or no compression, but a JPEG compressed image will suffice for contribution to RLG Cultural Materials. Alternatively, PhotoCD images may meet your local needs, and JPEGs can be created from those images for contribution to RLG Cultural Materials.

Source: Resolution
Black-and-white text and line art: 300-600 dpi bitonal
Halftone illustrations: 300-400 dpi, 8 bpp or 24 bpp
Oversized (e.g., maps or posters): 300 dpi bitonal, 8 bpp or 24 bpp
Manuscript page images: 300-400 dpi, 8 bpp (24 bpp for color, tinted, or discolored originals)
35mm photographic negatives or slides (reverse polarity if negative): 3000 pixels in long dimension, 8 bpp or 24 bpp
Photographic prints and transparencies (4x5, 6x8, 8x10): 4000-6000 pixels in long dimension, 8 bpp or 24 bpp
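
Pixel targets for film and prints translate into a scanner resolution once the size of the original is known. The sketch below shows the arithmetic; the function name is illustrative, and the long-side dimensions are nominal sizes assumed for the example (36 mm for a 35mm frame).

```python
MM_PER_INCH = 25.4

def scan_dpi_for_pixels(long_side_inches, target_long_side_pixels):
    """Return the scanning resolution (dpi) that yields the target pixel count."""
    return target_long_side_pixels / long_side_inches

# Nominal long-side dimensions, assumed for illustration only
examples = [
    ("35mm frame, 3000 px", 36 / MM_PER_INCH, 3000),
    ("4x5 inch transparency, 4000 px", 5, 4000),
    ("8x10 inch print, 4000 px", 10, 4000),
]
for label, inches, pixels in examples:
    print(f"{label}: about {scan_dpi_for_pixels(inches, pixels):.0f} dpi")
```

A 35mm frame therefore needs a scanner setting above 2000 dpi to reach 3000 pixels, while an 8x10 print reaches 4000 pixels at 400 dpi.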


Text

Source: Printed page (OCR or rekey)
       Quality: 99.95% accuracy as compared to original
       Format: ASCII 7- or 8-bit
       Encoding (optional): HTML, XML, SGML, RTF
Source: Compound document, in Portable Document Format (PDF)
       Quality: Text and images as indicated above
       Format: PDF
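
Accuracy percentages are easier to interpret as expected errors per page. The sketch below compares the 99.95% figure given here with the 99.995% figure recommended for new digitization; the function name and the 2,000-character page length are assumptions made only for illustration.

```python
def expected_errors(character_count, accuracy_percent):
    """Return the expected number of wrong characters at a given accuracy."""
    return character_count * (100 - accuracy_percent) / 100

# A page of 2,000 characters is an assumed, illustrative length
for accuracy in (99.95, 99.995):
    per_page = expected_errors(2000, accuracy)
    print(f"{accuracy}% accurate: about {per_page:.1f} errors per 2,000-character page")
```

On that assumption, 99.95% accuracy allows about one error per page, and 99.995% about one error every ten pages.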

Audio and motion

Formats and compression: Any of these are acceptable: Microsoft Wave (.wav), MPEG (.mp3, .mpg, .mpeg), "Audio Video Interleave" for Windows (.avi), QuickTime (.qt, .mov), RealMedia (.rm, .ra, .ram). 
 

Source: Quality
Spoken word: 11-22 kHz sampling, 16 bit, mono
Music: 44.1 kHz sampling, 16 bit, stereo
Video: 320x240, 30 fps, 1.2 Mbps

