OCR Extension
Integrate OCR with collection building
The CONTENTdm OCR Extension enables you to integrate OCR (Optical Character Recognition) with your digital collection building. The OCR process converts a text-based image file (either a TIFF or JPEG file) to a corresponding ASCII text file, which is then full-text searchable.
Use the OCR Extension to generate full-text transcripts from text-based image files. The OCR Extension can be added to any new or existing CONTENTdm license and is included with the purchase of some license levels.
The extension supports 184 languages, including Chinese, Japanese, Korean, Greek, Russian and Hebrew.
Make your text-based images full-text searchable
The OCR Extension uses ABBYY’s award-winning FineReader OCR software to capture text and add it to searchable metadata fields within CONTENTdm collections. When end users view an image, their search terms are highlighted in it.
![screen capture](/content/dam/oclc/contentdm/images/screens/recognition.jpg)
Highlighted search terms display in an image when metadata is prepared with the CONTENTdm OCR Extension.
Create printable PDFs
To give end users an easily printable version, you can also use the OCR Extension to generate a PDF of an entire compound object. Whether applied to selected items in a collection or to extensive document archives, the integrated OCR capability makes collection building more efficient.