Digitizing: Scanning processes

Our scanning studio in Bethlehem, Pennsylvania, features a dedicated network server, two SunRise microfilm scanners, and three direct scanners (Zeutschel, BetterLight 8K, and Fujitsu). The studio also includes high-speed Pentium workstations optimized for quality assurance and image processing. Our skilled staff applies the highest standards and quality controls to scanning and related digitization processes.

Below is an overview of our scanning services and processes:

  • Bitonal scanning
    The scanning of most retrospective research materials begins with the choice between bitonal (black-and-white) and grayscale scanning, and in some cases color scanning. Bitonal images are composed of one bit of information per pixel, so each pixel is either black or white. Printed text and simple line art are examples of material best suited to bitonal scanning.

    Bitonal scanning produces smaller files that load and print quickly and can be compressed without any loss of information. However, compromised or damaged materials cannot be fully captured, and tonality is lost in illustrations and handwritten documents.

  • Grayscale scanning
    Grayscale scanning retains the tonal values present in the original, including continuous-tone and halftone photographs and illustrations. Grayscale images are composed of eight bits of information per pixel, providing 256 shades of gray. Grayscale scanning is optimal for manuscripts, stained material, or documents with heavy bleed-through, and is often the only way to capture illustrations and faded text.

    Grayscale scanning captures a wider variety of tonal values, which translates to more information from the original document. However, since each pixel carries eight bits rather than one, grayscale files are substantially larger than bitonal images when both are uncompressed. The scanning and processing of grayscale files is also more time-consuming than that of bitonal files.
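
The size difference described above can be estimated with simple arithmetic. The following is a minimal sketch, not a description of our production workflow: the page size (8.5 x 11 inches) and the 300 dpi resolution are assumed purely for illustration, and the function name is hypothetical. It counts raw pixel data only, with each row padded to a whole byte, and ignores file headers and compression.

```python
def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of the raw pixel data, with each row padded to a whole byte."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


# Assumed example page: 8.5 x 11 inches scanned at 300 dpi.
width_px, height_px = 2550, 3300

bitonal = uncompressed_bytes(width_px, height_px, bits_per_pixel=1)
grayscale = uncompressed_bytes(width_px, height_px, bits_per_pixel=8)

print(f"shades of gray at 8 bits: {2 ** 8}")
print(f"bitonal:   {bitonal:,} bytes")
print(f"grayscale: {grayscale:,} bytes")
print(f"ratio:     {grayscale / bitonal:.1f}x")
```

Under these assumptions the grayscale image holds roughly eight times as much raw data as the bitonal image of the same page, before any compression is applied.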

  • Scan resolution
    Microfilm can be scanned at varying resolutions depending on film and content specifications and on client needs. Typical scanning resolutions are 200, 300, 400 and 600 dpi (dots per inch). Resolutions are relative to the original document size and represent true dpi. We recommend achieving the highest possible scanned resolution given the size of the original.
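
Because resolution is stated relative to the original document, the pixel dimensions of a scan follow from the original's size in inches multiplied by the dpi. The sketch below illustrates this for the typical resolutions listed above. The page size and the 14x reduction ratio are assumptions chosen for the example, and the function names are hypothetical. The second function shows why a film frame must be sampled more densely than the stated dpi: the image on film is smaller than the original by the reduction ratio.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height for an original of the given size at true dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def samples_per_inch_on_film(dpi: int, reduction_ratio: float) -> float:
    """Sampling density needed on the film to reach `dpi` on the original."""
    return dpi * reduction_ratio


# Assumed original: 8.5 x 11 inches, filmed at an assumed 14x reduction.
for dpi in (200, 300, 400, 600):
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    on_film = samples_per_inch_on_film(dpi, reduction_ratio=14)
    print(f"{dpi} dpi -> {width_px} x {height_px} px "
          f"({on_film:.0f} samples per inch on film)")
```
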
  • Image processing
    Material which is microfilmed two pages per frame, or 2-up, can be split into separate image files during scanning or as a post-scan process. We also offer image processing options including deskewing and despeckling. Custom cropping or image enhancements, such as contrast adjustments, brightness adjustments, and sharpening, are also available.
  • File-naming and directories
    OCLC Preservation Service Centers organize files into directories and name files according to client needs, often reflecting the bibliographic structure of the material, such as issue numbers for serials or OCLC numbers for monographs.

    Each file is named according to a scheme based on either simple sequential numbering or file content, such as page number or special feature codes denoting illustrations, indexes, etc. Filenames and directories can be limited to eight characters to conform to multiple computer platforms.
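
As an illustration of an eight-character naming scheme, the sketch below builds a filename from a sequential page number and an optional feature code. The specific layout (zero-padded sequence number followed by a short code) and the code letter "I" for an illustration are hypothetical; actual schemes are set according to client needs.

```python
def page_filename(sequence: int, feature_code: str = "", ext: str = "tif") -> str:
    """Build a filename whose stem is exactly eight characters long.

    The stem is a zero-padded sequence number followed by an optional
    feature code (a hypothetical convention used only for this example).
    """
    stem_length = 8
    digits = stem_length - len(feature_code)
    if digits < 1:
        raise ValueError("feature code leaves no room for a sequence number")
    if sequence < 0 or sequence >= 10 ** digits:
        raise ValueError("sequence number does not fit in the filename")
    return f"{sequence:0{digits}d}{feature_code}.{ext}"


print(page_filename(12))        # plain sequential page
print(page_filename(12, "I"))   # same page flagged with a feature code
```

Rejecting out-of-range sequence numbers, rather than silently truncating them, keeps two different pages from ever receiving the same name.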

  • Tagging
    We can add descriptive information to TIFF (Tagged Image File Format) files. Standard tags in a TIFF file denote pixel width, pixel length, resolution and compression. Non-standard tags, which OCLC Preservation Service Centers routinely add, identify the image name, source, and creation date. Custom tags can also be added per client specifications.
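
To show where the standard tags live in a file, the following minimal sketch reads the first image file directory (IFD) of a TIFF using only the Python standard library. It is an illustration, not the tool we use: the function name is hypothetical, only single SHORT and LONG values are decoded, and the sample bytes at the end are a hand-built header with two tags rather than a real scan. The tag numbers shown (256 ImageWidth, 257 ImageLength, 259 Compression, 282 XResolution, 283 YResolution) come from the TIFF 6.0 specification.

```python
import struct

TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    259: "Compression",
    282: "XResolution",
    283: "YResolution",
}


def read_first_ifd(data: bytes) -> dict:
    """Return the tags in the first IFD; undecoded values are None."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # byte-order mark
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n = struct.unpack(order + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if n == 1 and field_type == 3:    # one SHORT, stored left-justified
            value = struct.unpack(order + "H", value_field[:2])[0]
        elif n == 1 and field_type == 4:  # one LONG
            value = struct.unpack(order + "I", value_field)[0]
        else:                             # other types are not decoded here
            value = None
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags


# Hand-built little-endian header with two tags (illustration only).
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 2)
    + struct.pack("<HHIHH", 256, 3, 1, 2550, 0)
    + struct.pack("<HHIHH", 257, 3, 1, 3300, 0)
    + struct.pack("<I", 0)
)
print(read_first_ifd(sample))
```
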
  • Derivative Access files
    We supply two derivative formats produced by OCR processing of the archival TIFF file: searchable PDF (Portable Document Format) and plain text (.txt) files.
  • Optical Character Recognition (OCR)
    The OCLC Preservation Service Center can deliver OCR output to your specifications. We offer various OCR services based on the ABBYY® FineReader software. We also use the docWORKS Newspaper Edition software, developed by CCS (Content Conversion Specialists), a global company based in Germany.

    docWORKS is an intelligent software application that can process page images or text files (books, serials, dissertations, technical reports, etc.) and output OCR, METS/ALTO XML, PDF and rich metadata. We can also provide dictionary-supported OCR correction of specific elements, such as headlines, article titles, and authors. TEI-formatted XML is another available option.

    Using docWORKS, we can process newspapers to the NDNP (National Digital Newspaper Program) requirements, as well as produce more simplified output. We provide single or bound PDF files, page-level and/or article-level segmentation, and corrected headlines.

    OCLC Preservation Services can import docWORKS XML output into CONTENTdm for collection building, or run the CONTENTdm OCR extension as we build your text-based CONTENTdm collections.