ImageGear for C and C++ on Windows v19.10 - Updated
Export to PDF

The ImageGear Recognition API allows saving recognized data to PDF documents. Unlike export to other output formats, which uses the Recognition API, export of recognition results to PDF is accomplished with ImageGear Multi-page Document functions. You only need to export the recognized page to a PDF page and append that page to an ImageGear multi-page document.

This PDF export feature requires a license that enables the PDF format.

The IG_REC_PDF_page_create function creates a new PDF page and adds the recognized text, and optionally the original input image, to it. The original image can be segmented and automatically compressed differently based on the recognized zones. The new page is then appended to the new or existing PDF document specified in the second parameter.

Alternatively, if a PDF page already exists, the IG_REC_PDF_page_populate function can populate the page with the recognized data and/or original input image. Any existing content of the PDF page is preserved.

Once the recognized data has been added to the PDF document by either of the above functions, the IG_mpi_file_save function can be called to write the PDF document to a file.

Both the IG_REC_PDF_page_create and IG_REC_PDF_page_populate functions take an AT_REC_PDF_PAGE_OPTIONS parameter that adjusts the settings of the output PDF page. These settings allow the caller to show or hide the added text and/or image, specify the PDF fonts to use for various types of text, and select whether text is added to the PDF using Windows ANSI or Unicode encoding.
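The zero-then-set pattern for the options structure can be sketched in isolation. The snippet below is a minimal, self-contained illustration, not the SDK definition: PdfPageOptionsSketch is a stand-in that models only the three fields shown in this topic (the real AT_REC_PDF_PAGE_OPTIONS has further members, such as the font and encoding settings), and both helper functions are hypothetical names.

```c
#include <string.h>

/* Stand-in for AT_REC_PDF_PAGE_OPTIONS (assumption: only the three
   fields used in this topic are modelled here). */
typedef struct {
    int VisibleImage;   /* show the original page image */
    int VisibleText;    /* draw the recognized text visibly */
    int SegmentImage;   /* segment the image by recognized zone */
} PdfPageOptionsSketch;

/* Hypothetical preset: visible image with hidden, searchable text. */
void options_searchable_image(PdfPageOptionsSketch *opt)
{
    memset(opt, 0, sizeof(*opt));   /* clear every field first */
    opt->VisibleImage = 1;
    opt->VisibleText  = 0;
    opt->SegmentImage = 1;
}

/* Hypothetical preset: visible text only, no page image. */
void options_text_only(PdfPageOptionsSketch *opt)
{
    memset(opt, 0, sizeof(*opt));
    opt->VisibleText = 1;
}
```

With the SDK headers, the same assignments would be applied to an AT_REC_PDF_PAGE_OPTIONS variable before its address is passed to IG_REC_PDF_page_create or IG_REC_PDF_page_populate.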

PDF formatted output is not supported for Asian language recognition.

This topic provides information about how to:

  • Save the Recognized Data Directly in PDF Format
  • Save a Highly Compressed PDF

Save the Recognized Data Directly in PDF Format

C
AT_ERRCOUNT nErrCount;
HIG_REC_IMAGE hImg;
HIGEAR hIGear;
HMIGEAR hMPDoc;
AT_REC_PDF_PAGE_OPTIONS opt;
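// Load the source image and import it into the Recognition component
nErrCount = IG_load_file("Image.tif", &hIGear);
nErrCount = IG_REC_image_import(hIGear, &hImg);
// The loaded image is deleted once the recognition image exists
nErrCount = IG_image_delete(hIGear);
// Preprocess and recognize the page
nErrCount = IG_REC_image_preprocess(hImg);
nErrCount = IG_REC_image_recognize(hImg);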
// Create an empty PDF document
nErrCount = IG_mpi_create(&hMPDoc, 0);
nErrCount = IG_PDF_doc_create(hMPDoc);
// Set options for IG_REC_PDF_page_create to add an
// image to the page, and segment the image based on
// the recognized zones.
memset(&opt, 0, sizeof(opt));
opt.VisibleImage = TRUE;
opt.VisibleText = FALSE;
opt.SegmentImage = TRUE;
// Create PDF page with recognized text
nErrCount = IG_REC_PDF_page_create(hImg, hMPDoc, &opt);
// Save PDF document
nErrCount = IG_mpi_file_save("ONEPAGE.pdf", hMPDoc, 0, 0, 1, IG_FORMAT_PDF, IG_MPI_SAVE_OVERWRITE);
// Delete the document
nErrCount = IG_mpi_delete(hMPDoc);
// Delete the recognition image
nErrCount = IG_REC_image_delete(hImg);
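
The example above stores each return value in nErrCount without inspecting it. One way to stop at the first failing call is sketched below. It assumes that a nonzero AT_ERRCOUNT signals failure. ErrCountSketch, Step, run_steps, and the two demo steps are hypothetical names introduced so that the sketch compiles without the SDK.

```c
#include <stdio.h>

typedef int ErrCountSketch;   /* stand-in for AT_ERRCOUNT; 0 assumed to mean success */

/* One named step of the pipeline (load, recognize, export, ...). */
typedef ErrCountSketch (*StepFn)(void *context);

typedef struct {
    const char *name;
    StepFn      run;
} Step;

/* Runs the steps in order. Returns -1 if all succeed, otherwise the
   index of the first step that reported errors. Later steps are skipped. */
int run_steps(const Step *steps, int count, void *context)
{
    for (int i = 0; i < count; ++i) {
        ErrCountSketch n = steps[i].run(context);
        if (n != 0) {
            fprintf(stderr, "%s reported %d error(s)\n", steps[i].name, n);
            return i;
        }
    }
    return -1;
}

/* Demo steps for illustration only: each counts its call in *context. */
ErrCountSketch step_ok(void *context)   { ++*(int *)context; return 0; }
ErrCountSketch step_fail(void *context) { ++*(int *)context; return 1; }
```

In real code each step would wrap one ImageGear call, and the cleanup calls (IG_mpi_delete, IG_REC_image_delete) would still run after a failure so that handles are not leaked.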

Save a Highly Compressed PDF

Use the Recognition component to create searchable PDFs composed of recognized text and images. Optimal compression is chosen for each image based on OCR zone information recovered during recognition, producing smaller PDF files. Highly compressed PDFs are best suited for scanned documents.

Follow these steps to save a highly compressed PDF:

Load the Image

The Loading Images section provides help on how to load an image. Alternatively, an image could be scanned into memory using the TWAIN and ISIS components.

Recognize the Image

Before an image can be used with the highly compressed PDF functionality, it must be recognized with the Recognition component. First use the IG_REC_image_import function to prepare an image for recognition. Then use IG_REC_image_recognize to generate its recognition data. More details are in the Optical Character Recognition section.

Zone data generated during recognition is used to choose optimal image compression.

Any changes to the recognized zone data, whether through manual zones or other zone options, may adversely affect the final PDF file size.

When manually zoning an image, take care to mark any picture data with the IG_REC_WT_GRAPHIC zone type.

Export Recognition Data to PDF

  1. Create a new PDF document with IG_mpi_create and IG_PDF_doc_create.
  2. Prepare an AT_REC_PDF_PAGE_OPTIONS structure to re-compress the source image and add invisible text into the PDF page:
    • Set SegmentImage to TRUE.
    • Set VisibleImage to TRUE.
    • Set VisibleText to FALSE.
  3. Call the Recognition function IG_REC_PDF_page_create for each recognized page to append a new highly compressed PDF page to the document.
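
For a multi-page source, step 3 becomes a loop. The sketch below shows only the loop shape and is not SDK code: append_pages, AppendPageFn, and demo_append are hypothetical names, and the callback stands for the per-page sequence of importing, recognizing, calling IG_REC_PDF_page_create, and deleting the recognition image. It assumes that a nonzero return means failure.

```c
#include <stddef.h>

typedef int ErrCountSketch;   /* stand-in for AT_ERRCOUNT; 0 assumed to mean success */

/* Hypothetical callback: recognizes one source image and appends it
   to the document as a new PDF page. */
typedef ErrCountSketch (*AppendPageFn)(const char *imagePath, void *document);

/* Appends pages in order and returns how many were appended.
   Stops at the first failure so the page order has no gaps. */
size_t append_pages(const char *const *imagePaths, size_t count,
                    AppendPageFn appendPage, void *document)
{
    size_t appended = 0;
    for (size_t i = 0; i < count; ++i) {
        if (appendPage(imagePaths[i], document) != 0)
            break;
        ++appended;
    }
    return appended;
}

/* Demo callback for illustration only: fails on an empty path and
   otherwise increments the page counter that plays the document role. */
ErrCountSketch demo_append(const char *imagePath, void *document)
{
    int *pageCount = (int *)document;
    if (imagePath[0] == '\0')
        return 1;
    ++*pageCount;
    return 0;
}
```

The returned count tells the caller how many pages the document now holds, which is useful when deciding how many pages to write when the document is saved.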

Save the PDF

After the PDF document has been created and its pages have been added, use the IG_mpi_file_save function to save the PDF document to disk. Make sure to use IG_FORMAT_PDF for the nFormat parameter.