OCLC Digitization services frequently asked questions
OCLC consulting services
- Does OCLC offer consulting services?
Yes. If you are undertaking a digitization project, we can work closely with you and guide you through key decisions in order to create a solution tailored to the specific needs of your project.
- What role does OCLC play in the grant application process?
We have provided Digitization Services to many institutions over the years whose projects have been funded (in part or in whole) by grants, as well as many institutions whose funding does not derive from grant awards. We do not play any part in the award of grants, but can assist you by generating a budgetary cost estimate as part of the grant application process.
- How is long distance consultation and service handled?
We’d be happy to provide initial phone consulting services to assist you if you’re considering a digitization project. If these discussions lead to larger consulting requirements (onsite collection survey, etc.), we can provide price estimates for fee-based consultation.
Upon contract award, we assign a Project Manager to your project. Materials are shipped between OCLC’s facility and your site via FedEx, UPS or another designated shipping agency. Traditionally, you pay to ship materials to us and we pay to ship them back to you, but other arrangements can be made. Project Update calls are set up to monitor progress throughout the project. We handle your materials with great care at our facility, and we take great care in re-packaging the source materials for return shipment. We deliver digital samples electronically for review and approval. Final digital delivery can be done through a variety of means, including DVD, FTP, tape output, external portable hard drives, etc.
Filming and scanning
- Can digitization replace microfilm, or do we need both?
Preservation standards groups today continue to support microfilm, not digital, as the standard of preservation. In that respect, digitization cannot be considered a replacement for microfilm. However, digital objects can replace microfilm as your primary delivery mechanism to your end users. Digital objects are much easier to use, allow for multiple users at once, and can be accessed online. In addition, when OCR (Optical Character Recognition) is performed and the text becomes searchable, the digital environment gives more direct access to the content, and end users can more easily manipulate the final product (e.g., cut and paste, save to file, etc.). If part of your project goal is to preserve the original source material (as well as making it digitally available), then microfilm creation should be included as one of the milestone deliverables in your project.
Funding and pricing
- What is the cost of OCLC’s Digitization Services? And what is the general price range of a normal project?
It is hard to provide a general price range, since collections and requirements vary so much. Digitization projects, services and costs can be as unique as the collections selected for digitization. While many projects have fundamental similarities (e.g., DPI selection, derivative file creation, source material format, etc.), there are also many characteristics that can make apparently similar projects completely different.
For instance, source material size and condition impact handling and preparation costs; DPI selection and image resolution impact image creation costs; metadata requirements, derivative file choices and image enhancement choices impact post-processing costs. Project pricing can be generated after we understand the nature, size, and condition of your collection; the type of image output you require; the options you require for image description and alteration; and finally, the type of delivery method you intend to use for your digital collections.
- What information is needed to give a time and budget estimate?
There are about a dozen basic questions that will help define your collection and describe your requirements. Based on your responses to that initial question set, several more detailed questions may then also be required. Very detailed project questionnaires are available at http://www.oclc.org/preservation/about/rfps/default.htm. However, if you would prefer to set a time for a call to discuss your project description and requirements, it may be easier to explain some of the decision points necessary and the impact those decisions have on process selection and cost.
Newspaper digitization
- What are the copyright issues related to newspaper digitization?
Newspapers dating from the early decades of the 20th century and earlier are usually exempt from copyright compliance. We provide Digitization Services for materials delivered by our customers; however, maintaining proper copyright compliance is the responsibility of the collection owner and, ultimately, the collection users.
If you are considering a newspaper digitization project, we recommend that you do the copyright compliance research prior to embarking on the project. OCLC is not responsible for monitoring copyright compliance for customers.
Additionally, there are special sections in newspapers for which the publisher doesn't own the copyright. In these cases, a publisher can list the general sections of their publication for which they do not hold copyright. Based on this list, collection owners can make several decisions: 1) research whether these sections are still under copyright; 2) contact individual creators for approval; 3) redact those sections from the source so that they do not appear in the digital copy; and/or 4) redact those sections from the digital copy so they are not viewable without staff mediation. Sections typically outside a publisher’s copyright may include wire services (Reuters, UPI, AP, etc.), cartoons and article series created by 3rd party agents.
- Should newspapers be microfilmed first?
Preservation standards groups today continue to support microfilm, not digital, as the standard of preservation. If part of your project goal is to preserve the original source material (as well as making it digitally available), then microfilm creation should be included as one of the milestone deliverables in your project. The decision to film first and digitize second, or digitize first and film second, is made by the institution. Quality loss through degradation can occur with every reproduction step away from the original. However, the amount of degradation suffered when digitizing from OCLC-created microfilm is minimal.
- How does newspaper article and page segmentation work?
The main distinction in newspaper processing is between "page-level access" and "article-level segmentation."
Page-level access is intended to be the equivalent of reading a newspaper hardcopy: The reader sees the order of the content and the integrity of each article by looking at the page in its original layout. This is a relatively standard process for newspaper digitization projects. It helps link OCR-ed text files to the proper image, links the images from each page in an issue to each other in proper order, and also provides machine-captured metadata to the METS and ALTO files.
Article segmentation is a special process that "segments" the page into its individual components (article, photo, weather box, etc.) so that in certain delivery systems each may be displayed as a separate digital entity while still maintaining a relationship back to the parent page. An example of article segmentation can be found at http://www.dimemanews.com/cdm4/browse.php. Select the first paper to view; when you "mouse-over" the different articles on the page, you will see a yellow highlight. Click on one to see article segmentation in action.
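The relationship between a page and its segments can be pictured with a small data model. The following is only an illustrative sketch, not a description of the METS/ALTO files or of any delivery system: the class names, field names, and pixel-based bounding boxes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One component of a page (hypothetical model, for illustration only)."""
    segment_id: str
    page_id: str                     # link back to the parent page
    kind: str                        # e.g. "article", "photo", "weather box"
    bbox: Tuple[int, int, int, int]  # (left, top, width, height) in pixels

    def contains(self, x: int, y: int) -> bool:
        """Return True if the point (x, y) falls inside this segment's box."""
        left, top, width, height = self.bbox
        return left <= x < left + width and top <= y < top + height


@dataclass
class Page:
    """A scanned page plus the segments identified on it."""
    page_id: str
    image_file: str
    segments: List[Segment] = field(default_factory=list)

    def add_segment(self, segment_id: str, kind: str,
                    bbox: Tuple[int, int, int, int]) -> Segment:
        """Create a segment that points back to this page and store it."""
        segment = Segment(segment_id, self.page_id, kind, bbox)
        self.segments.append(segment)
        return segment

    def segment_at(self, x: int, y: int) -> Optional[Segment]:
        """Find the segment under a point, as a mouse-over highlight would."""
        for segment in self.segments:
            if segment.contains(x, y):
                return segment
        return None
```

In this sketch, each segment can be shown on its own, and its `page_id` still identifies the page it was cut from.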
- Has OCLC worked with newspaper clipping files before?
Yes, we have. Clippings collections often require special preparation procedures since they may have been folded in envelopes or vertical files for years or they may have paperclips/staples/glue connecting the different pieces that form the full article. Also, they may have date stamps or other markings over the text that can impact OCR success.
Additionally, article clippings subscription services may have copyright complications that relate to several ownership organizations (as opposed to a single publisher). We have scanned original clippings as well as microfilmed clippings collections, enriching access with metadata and OCR.
Processing: microfilm
- Does OCLC offer microfilming services?
Yes, we provide full microfilm services including target creation and placement. We use only polyester film (500-year shelf life), and we also offer a specially developed Polysulfide Treatment process that further protects the film from silver oxidation. We also offer a Print Master storage service whereby we will store your Print Master reels in our climate-controlled vault. This keeps the masters safe and secure, and also allows you to quickly order a replacement Service Copy with minimal shipping.
- How does OCLC handle old microfilm produced 30-40 years ago?
The film used to create microfilm images is either acetate or polyester. Acetate was developed first and is still available today. Polyester is the film base approved as a preservation standard. Acetate has proved to have several disadvantages: it tends to develop vinegar syndrome (decomposition of the acetate base, indicated by a vinegary smell), it scratches more easily, it can become brittle along the edges, and it can crack and split very easily with extended use or abuse. Acetate is and was less expensive than polyester. Because of its quality deficits, we recommend that digitization be done from polyester film. If your film is acetate, we can make you polyester copies for a modest fee. This gives us a more dependable roll to work with, and helps you remove the acetate film from your collection. However, regardless of the film used, the quality of a digital image is directly related to the quality of the original source material when it was filmed, and the quality of the filming process deployed at the time.
One quick way to determine if your film is acetate or polyester: acetate film does not allow light to pass through it (when it is completely rolled up), polyester does. Take a roll of film out of the box and hold it (as a circle) between your thumb and forefinger. Look at the middle of the circle (directly at the sprocket hole) and then move your hand overhead so that you can look through it at an overhead light. If you see a dense black circle, it's acetate (and should probably be copied over to polyester). If you see light projecting through, you have polyester.
- What if our microfilm is in fair to poor condition?
If your microfilm is in fair to poor condition, it may relate to the type of film that was used during initial film creation, or it may relate to typical wear and tear of film over time by your end users. If your microfilm contract included the creation of an Archive Master or a Print Master, then the first approach is to use the Print Master for the digitization project (or create a new one from the Archive Master).
OCLC has several film repair services that include creating missing reduction ratio charts, splice and basic repair, duplication from acetate to polyester, and addition of leader/trailer film where missing. In some cases, if the film is beyond repair and no masters remain, the only alternative may be to go back to the source material and start from the beginning. If that is not an option, it may be that the only remaining option is to digitize immediately to prevent further image loss from film deterioration.
Processing: OCR (Optical Character Recognition)
- What file formats are used in OCR (Optical Character Recognition)?
Optical Character Recognition (OCR) is a software process that makes a pass over a text-based image file and attempts to re-create a corresponding text file made up of ASCII characters. The resulting ASCII text file can be indexed by different delivery systems to provide full-text searching. For textual materials, the standard digital output file is a TIFF image. TIFF is a non-proprietary file format. JPEG2000 (also a standard file format) is a derivative format generated from the original TIFF. In either case, each file represents one page of the original. JPEG2000 includes a compression component that makes the image file smaller and thus more Web-friendly than a TIFF. It is particularly used for oversized materials (including maps and newspapers). PDF is also a derivative output, in which the original TIFF images are compressed for Web access and/or bundled into a single portable document that contains multiple images.
- How do you handle OCR with older and unusual fonts?
We have the ability to OCR "old fashioned typeface" fonts including Fraktur, Old English fonts, fonts with long "s", etc. The success rate for OCR is strong when the type quality is good. However, newsprint often lacks clarity, so the OCR accuracy can be compromised (as with more common fonts). In addition, ornamental fonts or all caps fonts will be recognized with less accuracy.
- How are non-Roman characters handled?
OCLC uses ABBYY FineReader for OCR. This software package supports over 150 languages; the non-Roman character sets it supports are Russian and other Cyrillic alphabet languages. We are currently gauging demand for OCR services in Chinese, Japanese, Korean, and Arabic. Transcription services are often considered in these projects, in order to create Unicode text files for full-text searching.
- We have scanned images; do you offer OCR services for those?
We can provide OCR services as well as derivative file creation (e.g., JPEG2000 or bundled PDFs) for customers who already have TIFFs created. We have ongoing and completed projects for customers who use one facility for TIFF creation and another (OCLC) for post-processing services. Pricing is predicated on the number of TIFF images you have and the processing options you require. If you are creating TIFF images for future processing, it is essential that you develop a file-naming scheme that distinguishes which images come from which source document. A corresponding OCR-ed text file would use the same file-name with an extension of .txt. In this manner, the file-names help link and organize the sibling files for additional processing or delivery system import.
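As a rough illustration of how such a naming scheme links sibling files, the sketch below groups TIFF pages by source document and looks for a matching .txt file next to each one. The scheme itself (`<document-id>_<page-number>.tif`, for example `gazette-1905-03-14_0001.tif`) and the function name are assumptions made for the example, not an OCLC requirement.

```python
from collections import defaultdict
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def group_pages_by_document(folder):
    """Group TIFF pages by source document and pair each with its OCR text.

    Assumes the hypothetical naming scheme <document-id>_<page-number>.tif.
    Returns {document_id: [(tiff_name, text_name_or_None), ...]}.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in TIFF_SUFFIXES:
            continue
        # Everything before the last underscore identifies the source document.
        doc_id, separator, _page = path.stem.rpartition("_")
        if not separator:
            doc_id = path.stem  # no underscore: treat the whole name as the id
        # The OCR text file shares the image's name, with a .txt extension.
        text_file = path.with_suffix(".txt")
        groups[doc_id].append(
            (path.name, text_file.name if text_file.exists() else None)
        )
    return dict(groups)
```

A page whose text entry comes back as `None` has no OCR file yet, which makes gaps easy to spot before further processing or import.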
Quality assurance
- What procedures are used in your Quality Assurance?
OCLC’s Preservation Service Center is committed to adhering to high quality management procedures in our production environment. Our standard level of quality checks reflects a balance between efficient, cost-effective production and the individual attention to each image that accommodates variations in tonal values and document size. After the digitization and image enhancement processes, the Digital Quality & Assurance team performs a 100% quality assurance check of all original page images. Each image is viewed to ensure complete capture, alignment, evenness of illumination and consistent rendering of detail throughout the image. Images are examined in a viewing and editing software program that allows for viewing 1:1, zooming beyond 100%, and reduced full-page view. Please note that many quality problems cannot be detected in thumbnail view. Our procedure also includes reconciling irregularities in the original pagination whenever possible.
CONTENTdm
Master files
- Can OCLC help preserve digital master files?
OCLC provides a service for the storage and maintenance of digital master files. This service is called the Digital Archive, and can be used to store your master TIFF images, while you use derivative image files (like PDF or JPEG2000) to deliver your collections to your end users. More information on this service can be found by contacting your OCLC representative. As for maintaining your master image files locally, it is recommended that you use a duplicate storage process on separate organizational servers. These files should be reviewed (e.g., by opening sample images chosen at random) on a consistent basis to ensure the storage medium has not adversely affected the files. Semi-annual review would be one approach. As for actual storage needs, the primary requirement is the hard disk space or network storage device necessary to match the size of your TIFF image set.
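For locally stored masters, the two tasks described above (sizing the TIFF set and picking random images to review) could be scripted along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the masters are assumed to sit under a single folder, and the script only selects files for review. Opening and inspecting each image is still done by a person.

```python
import random
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def list_master_tiffs(folder):
    """Return every TIFF file under `folder` (including subfolders), sorted."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    )


def total_size_bytes(folder):
    """Add up the size of all master TIFFs, to estimate storage needs."""
    return sum(p.stat().st_size for p in list_master_tiffs(folder))


def pick_review_sample(folder, sample_size, seed=None):
    """Choose up to `sample_size` TIFFs at random for a periodic review."""
    tiffs = list_master_tiffs(folder)
    rng = random.Random(seed)  # pass a seed to make the selection repeatable
    return rng.sample(tiffs, min(sample_size, len(tiffs)))
```

Running `pick_review_sample` at each review interval yields a fresh list of images to open and check, and `total_size_bytes` gives the disk space each duplicate copy of the set will need.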