Integrated OCR: facilitating full-text searching

The new CONTENTdm OCR Extension lets users integrate optical character recognition (OCR) into collection building. The feature uses ABBYY’s award-winning FineReader OCR software to capture text and add it to searchable metadata fields within CONTENTdm collections. When viewed, items prepared with this feature display search terms highlighted within the image. The OCR Extension can also create a PDF file of an entire compound object for easy printing. Whether applied to selected items in a collection or to extensive document archives, the integrated OCR capability makes collection building more efficient.

screen capture of the CONTENTdm interface

Highlighted search terms display in an image when metadata is prepared with the CONTENTdm OCR Extension.

The OCR Extension can be added to any new or existing CONTENTdm 4 license and is included with the purchase of some CONTENTdm license levels.

OCR Extension system requirements

OCR Acquisition Station

The OCR (optical character recognition) Acquisition Station requires the following:

  • Microsoft Windows 2000 Professional or Windows XP.
  • 32-bit x86 processor (Intel® Pentium® 4 class compatible processor or higher).
  • Microsoft Internet Explorer 6.0.
  • Minimum of 256 MB of free memory.
  • 100 MB of available hard disk space for installation.
  • Minimum display resolution of 1024 × 768.
  • 128 kbit/s or faster connection.
  • Acrobat Reader 7.0 or later.