Compound Objects: Are PDF-based or image-based compound objects preferred?
Last review: April 8, 2011
Description: The decision to convert PDF files to compound objects automatically or to construct compound objects from individual images is influenced by a variety of factors.
Choosing whether to produce compound objects by automatically converting multiple-page PDF files or by building them from individual images depends on functionality, best practices and end-user experience. The PDF conversion functionality in CONTENTdm is designed specifically for born-digital documents. If an institution has born-digital documents, such as electronic theses and dissertations, electronic meeting minutes or eJournals, the multiple-page PDF conversion functionality should be used.
When starting with physical materials that need to be scanned, we recommend using TIFF images and creating JPEG2000 derivatives to construct image-based compound objects. If you use the OCR features in the Project Client, this method ensures that full-text searching is available and search-term highlighting is applied to the pages when they are viewed by the end user.
You can achieve these same features by scanning the materials and converting them to PDF files, but only if the PDF creation tool has OCR functionality. Another consideration is that PDF files tend to be very large when the item has a large number of pages. When an item has hundreds of pages, an image-based compound object gives end users better performance than a PDF-based compound object. This trade-off may be acceptable for born-digital PDF files because loading them into CONTENTdm directly takes less effort than back-converting them to images and processing them in the Project Client as image-based compound objects.
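
The guidance above can be summarized as a small decision rule. The following Python sketch is illustrative only and is not part of CONTENTdm: the function name, the `Recommendation` type and the `LARGE_ITEM_PAGES` threshold are hypothetical, and the value 200 is merely an assumed stand-in for "hundreds of pages".

```python
from dataclasses import dataclass

# Assumed threshold standing in for "hundreds of pages"; not a CONTENTdm setting.
LARGE_ITEM_PAGES = 200


@dataclass
class Recommendation:
    method: str  # "pdf" or "image"
    note: str


def recommend_compound_object(born_digital: bool, page_count: int) -> Recommendation:
    """Suggest how to build a compound object, following the guidance above."""
    if born_digital:
        # Born-digital documents: use the multiple-page PDF conversion.
        note = "Use the multiple-page PDF conversion."
        if page_count >= LARGE_ITEM_PAGES:
            # Large PDFs may perform worse for end users, but avoiding
            # back-conversion to images can make the trade-off acceptable.
            note += " Expect slower end-user performance for this large item."
        return Recommendation("pdf", note)
    # Physical materials: scan to TIFF, derive JPEG2000, run OCR in the Project Client.
    return Recommendation(
        "image",
        "Scan to TIFF, create JPEG2000 derivatives and use OCR in the Project Client.",
    )
```

For example, `recommend_compound_object(born_digital=False, page_count=350).method` evaluates to `"image"`, while a born-digital dissertation of any length yields `"pdf"`.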