The Recognition component can create text-searchable PDFs that use recognition data to compress the page image further. Each region of the image is compressed differently depending on whether it is detected as text or as a picture, which produces smaller files. High compression works best on scanned documents.
Saving a highly compressed PDF generally involves the following steps:
- Open or scan an image.
- Import the image into the recognition component.
- Recognize the image.
- Export the recognition data to a PDF document.
- Save the PDF document.
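The order of these steps matters. For example, a page cannot be exported before it has been recognized. The sketch below is not ImageGear code. It models the workflow for a single page as a small state machine that rejects any step taken out of order. `PdfStep`, `PdfWorkflow`, and `workflow_advance` are hypothetical names, not library APIs. For a multi-page document, the import, recognize, and export steps would repeat for each page before the final save, and this sketch leaves that out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step identifiers (not ImageGear names). The enum order
 * matches the required order of the steps listed above. */
typedef enum {
    STEP_NONE = 0,   /* nothing done yet */
    STEP_LOAD,       /* open or scan an image */
    STEP_IMPORT,     /* import the image into the Recognition component */
    STEP_RECOGNIZE,  /* recognize the image */
    STEP_EXPORT,     /* export recognition data to a PDF page */
    STEP_SAVE        /* save the PDF document */
} PdfStep;

/* Tracks the last completed step for a single-page document. */
typedef struct {
    PdfStep last;
} PdfWorkflow;

/* Performs `next` only if it immediately follows the last completed step.
 * Returns false and leaves the state unchanged for an out-of-order step,
 * such as exporting before recognizing. */
static bool workflow_advance(PdfWorkflow *wf, PdfStep next)
{
    assert(wf != NULL);
    if ((int)next != (int)wf->last + 1)
        return false;
    wf->last = next;
    return true;
}
```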
Open or Scan
The Loading Images section describes how to load an image. Alternatively, an image can be scanned using TWAIN or ISIS.
Recognize
Before an image can be used with the highly compressed PDF functionality, it must be recognized with the Recognition component. Use IG_REC_image_import to import the image into the Recognition component, then IG_REC_image_recognize to perform the recognition. See the Optical Character Recognition section for details.
Zone data generated during the recognition process is used when compressing the PDF. Changing the zone information, whether through manual zones or other zone options, changes how the PDF is compressed, and the result may be better or worse. If you zone an image manually, be sure to mark any picture data with the IG_REC_WT_GRAPHIC zone type.
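When zoning manually, you can avoid leaving picture data in text zones by checking every zone before handing the zones to the Recognition component. The following is a minimal sketch in plain C. It assumes a hypothetical `ManualZone` record in which the application has already flagged which zones hold pictures. `ZONE_GRAPHIC` stands in for IG_REC_WT_GRAPHIC, and none of these types are ImageGear definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone types. ZONE_GRAPHIC stands in for IG_REC_WT_GRAPHIC;
 * these are not the ImageGear definitions. */
typedef enum { ZONE_TEXT, ZONE_GRAPHIC } ZoneType;

/* Hypothetical manual zone: a rectangle in pixel coordinates, its zone
 * type, and a flag recording whether it contains picture data. */
typedef struct {
    int left, top, right, bottom;
    ZoneType type;
    bool has_picture;
} ManualZone;

/* Sets every zone that contains picture data to ZONE_GRAPHIC so that it
 * is not treated as text. Returns the number of zones that were changed. */
static size_t mark_picture_zones(ManualZone *zones, size_t count)
{
    size_t changed = 0;
    assert(zones != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (zones[i].has_picture && zones[i].type != ZONE_GRAPHIC) {
            zones[i].type = ZONE_GRAPHIC;
            ++changed;
        }
    }
    return changed;
}
```

Running the check a second time returns 0, so it is safe to call it whenever the zone list is edited.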
Export
Create a new PDF document with IG_mpi_create and IG_PDF_doc_create. Then call the Recognition function IG_REC_PDF_page_create once for each page you want in the document. Each call creates a new PDF page at the end of the document. The key to enabling the highly compressed PDF functionality is the lpOptions parameter: in the AT_REC_PDF_PAGE_OPTIONS structure, set the SegmentImage field to TRUE. Also set VisibleImage to TRUE and VisibleText to FALSE.
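The option settings described above can be sketched as follows. `RecPdfPageOptions` is a hypothetical stand-in for AT_REC_PDF_PAGE_OPTIONS. It models only the three fields named here, using C `bool`. The real structure and its field types may differ, so check the ImageGear reference before relying on this layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for AT_REC_PDF_PAGE_OPTIONS. Only the three fields
 * named in the text are modeled; the real structure may contain more
 * fields and uses ImageGear's own BOOL type. */
typedef struct {
    bool SegmentImage;  /* enables the highly compressed (segmented) output */
    bool VisibleImage;  /* page image is visible in the PDF */
    bool VisibleText;   /* recognized text is drawn visibly */
} RecPdfPageOptions;

/* Fills `opts` with the settings described for highly compressed output:
 * segmentation on, image visible, recognized text not visible. */
static void set_highly_compressed_options(RecPdfPageOptions *opts)
{
    assert(opts != NULL);
    memset(opts, 0, sizeof *opts);  /* start from a known state */
    opts->SegmentImage = true;
    opts->VisibleImage = true;
    opts->VisibleText = false;
}
```

In real code, the filled structure would be passed through the lpOptions parameter of IG_REC_PDF_page_create for each page.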
Save
After the PDF document and its pages have been created, use IG_mpi_file_save to save the PDF document to disk. Pass IG_FORMAT_PDF as the nFormat parameter.