Using OCR

CONTENTdm provides an extension that enables the Project Client to generate file transcripts using Optical Character Recognition (OCR). The transcripts make the text in an image file searchable.

Additionally, when an end user searches for a term that appears in the OCR-generated text, either with a general search or within a compound object, the search term is highlighted in the image. (Search term highlighting is not supported for Hebrew, Chinese, Japanese, or Korean.)

For compound objects, the OCR extension also provides an option to create a PDF of the entire compound object for ease of printing.

For information about how to use OCR processing on items already in your collection, see Adding OCR to Items in a Collection.

The accuracy of OCR depends on:

  • The quality of the scan

  • The quality of the original document being scanned

  • Whether the characters being recognized are typewritten, computer-generated, hand-printed, or cursive

  • The font face of the typewritten or computer-generated text

  • Whether you use the OCR fast mode option (see OCR Settings)

OCR can be performed on JPEG2000, JPEG, GIF, PNG, and TIFF files.
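
If you want to check a folder of files against the formats listed above before importing, a small script along the following lines may help. This is only a sketch and is not part of CONTENTdm: the function names are hypothetical, and the mapping from file extensions to formats is an assumption, since this documentation names the formats but not their extensions.

```python
from pathlib import Path

# Assumed mapping from common file extensions to the formats named above.
# The extensions are an assumption; the documentation lists only format names.
OCR_EXTENSIONS = {
    ".jp2": "JPEG2000",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".gif": "GIF",
    ".png": "PNG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}


def ocr_format(filename):
    """Return the format name if the extension is in the assumed mapping, else None."""
    return OCR_EXTENSIONS.get(Path(filename).suffix.lower())


def split_by_ocr_support(filenames):
    """Partition filenames into (supported, unsupported) lists, preserving order."""
    supported, unsupported = [], []
    for name in filenames:
        (supported if ocr_format(name) else unsupported).append(name)
    return supported, unsupported
```

The check looks only at the file extension, not the file contents, so a misnamed file would be classified incorrectly.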

For information about language support, see Supported Languages.

Learn about:

  1. Activating OCR
  2. OCR Settings
  3. Generating Transcripts Using OCR
  4. Supported Languages
  5. Processing Page Limits