Use the ImGearRecOutputManager class to produce a searchable PDF file from recognition output. Set OutputManager.Format to "Converters.Text.PDF" prior to calling OutputManager.WriteDocument(). This converter relies heavily on the locations of the recognized characters.

C#
igRecognition.OutputManager.Format = "Converters.Text.PDF";
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDF.pdf");

VB.NET
igRecognition.OutputManager.Format = "Converters.Text.PDF"
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDF.pdf")

The ImageGear Recognition API also provides specialized formats for saving PDF documents:

Export to Image Over Text PDF
Export to PDF with Image Substitutions
Export to Edited PDF

Export to Image Over Text PDF

This specialization is suitable for indexing or archiving purposes. The original image is placed in the foreground, with the recognized text hidden behind it. To use it, set OutputManager.Format to "Converters.Text.PDFImageOnText" prior to calling OutputManager.WriteDocument().

C#
igRecognition.OutputManager.Format = "Converters.Text.PDFImageOnText";
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFImageOnText.pdf");

VB.NET
igRecognition.OutputManager.Format = "Converters.Text.PDFImageOnText"
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFImageOnText.pdf")

Export to PDF with Image Substitutions

This specialization is suitable for low-confidence recognition output. Low-confidence text is hidden behind cut-outs from the source image. To use it, set OutputManager.Format to "Converters.Text.PDFImageSubst" prior to calling OutputManager.WriteDocument().

C#
igRecognition.OutputManager.Format = "Converters.Text.PDFImageSubst";
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFImageSubst.pdf");

VB.NET
igRecognition.OutputManager.Format = "Converters.Text.PDFImageSubst"
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFImageSubst.pdf")

Export to Edited PDF

This specialization is suitable for introducing large sections of new text into the output PDF document. To use it, set OutputManager.Format to "Converters.Text.PDFEdited" prior to calling OutputManager.WriteDocument(). This format does not rely on the locations of recognized characters, so it is the appropriate choice when the recognition results have been edited.

C#
igRecognition.OutputManager.Format = "Converters.Text.PDFEdited";
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFEdited.pdf");

VB.NET
igRecognition.OutputManager.Format = "Converters.Text.PDFEdited"
igRecognition.OutputManager.WriteDocument(igRecDocument, "OCR-Converters.Text.PDFEdited.pdf")
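
The four PDF converters described so far differ only in the string assigned to OutputManager.Format. As a minimal sketch, an application that picks the converter at run time could keep those strings in one place. The PdfExportGoal enum and PdfConverterNames class below are hypothetical helpers, not part of ImageGear; only the converter name strings come from this topic.

```csharp
using System;

// Hypothetical helper types (not part of the ImageGear API).
public enum PdfExportGoal
{
    SearchableText,      // general searchable PDF; relies on character locations
    ImageOverText,       // indexing/archiving: image in front, text hidden behind
    ImageSubstitutions,  // low-confidence text hidden behind source-image cut-outs
    Edited               // edited results; does not rely on character locations
}

public static class PdfConverterNames
{
    // Returns the converter name string quoted in this topic for the given goal.
    public static string For(PdfExportGoal goal)
    {
        switch (goal)
        {
            case PdfExportGoal.SearchableText:
                return "Converters.Text.PDF";
            case PdfExportGoal.ImageOverText:
                return "Converters.Text.PDFImageOnText";
            case PdfExportGoal.ImageSubstitutions:
                return "Converters.Text.PDFImageSubst";
            case PdfExportGoal.Edited:
                return "Converters.Text.PDFEdited";
            default:
                throw new ArgumentOutOfRangeException(nameof(goal));
        }
    }
}
```

The returned string would then be assigned to igRecognition.OutputManager.Format before calling WriteDocument(), in the same way as in the examples above.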

An Alternative Technique

Unlike exporting to other output formats that use the ImGearRecOutputManager class, exporting recognition results to a PDF can also be accomplished by using methods of the ImGearRecPage object directly.

This PDF export feature requires a license that enables the PDF format.

The ImGearRecPage.CreatePDFPage method will create a new PDF page and add the recognized text, and optionally the original input image, to it. The new page is then appended to the new or existing PDF document specified in the first parameter.

Or, if a PDF page already exists, the ImGearRecPage.PopulatePDFPage method can populate the page with the recognized data and/or original input image. Any existing content of the PDF page is preserved.

Once the recognized data has been added to the PDF document by either of the above methods, the ImGearPDFDocument.Save method can be called to write the PDF document to a file.

Both the CreatePDFPage and PopulatePDFPage methods provide an ImGearRecPDFOutputOptions class object parameter to adjust settings of the output PDF document. These settings allow the caller to show or hide the added text and/or image, indicate a compression to use for images (subject to PDF requirements), specify the PDF fonts to use for various types of text, and select whether to add text to the PDF using Windows ANSI or Unicode encoding.

If JPEG compression is used for compressing images, the compression quality can be adjusted using the global ImageGear filter parameters. The Quality and DecimationType values will be used for JPEG compression of images in the recognition PDF output. All other JPEG filter parameters are ignored. See the JPEG format topic for more information about these parameters. See the Loading topic for information on how to set global filter parameters.

The ImageGear Recognition API can export images for graphics zones to the final output document. Images obtained from graphics zones are stored internally and inserted into the final output document if the output format and level allow it.

C#
using (FileStream content = new FileStream("test1.tif", FileMode.Open))
{
    ImGearPage igPage = ImGearFileFormats.LoadPage(content, 0);
    // Import image into Recognition engine
    ImGearRecPage recPage = igRecognition.ImportPage((ImGearRasterPage)igPage);
    recPage.Image.Preprocess();
    recPage.Recognize();
    // Create and set PDF Output options; Text-only PDF
    ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions();
    options.VisibleImage = false;
    options.VisibleText = true;
    // Initialize PDF Assembly for export
    ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
    ImGearPDF.Initialize();
    // Export page to PDF
    using (ImGearPDFDocument pdfDocument = new ImGearPDFDocument())
    {
        recPage.CreatePDFPage(pdfDocument, options);
        pdfDocument.Save("test1.pdf", ImGearSavingFormats.PDF_UNCOMP,
            0, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearSavingModes.OVERWRITE);
    }
    recPage.Dispose();
}

VB.NET
Using content As New FileStream("test1.tif", FileMode.Open, FileAccess.Read)
    Dim igPage As ImGearPage = ImGearFileFormats.LoadPage(content, 0)
    ' Import the page into the recognition engine
    Dim recPage As ImGearRecPage = igRecognition.ImportPage(DirectCast(igPage, ImGearRasterPage))
    recPage.Image.Preprocess()
    recPage.Recognize()
    ' Create and set PDF Output options; Text-only PDF
    Dim options As New ImGearRecPDFOutputOptions()
    options.VisibleImage = False
    options.VisibleText = True
    ' Initialize PDF Assembly for export
    ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat())
    ImGearPDF.Initialize()
    ' Export page to PDF
    Using pdfDocument As ImGearPDFDocument = New ImGearPDFDocument()
        recPage.CreatePDFPage(pdfDocument, options)
        pdfDocument.Save("test1.pdf", ImGearSavingFormats.PDF_UNCOMP,
            0, 0, CInt(ImGearPDFPageRange.ALL_PAGES), ImGearSavingModes.OVERWRITE)
    End Using
    recPage.Dispose()
End Using
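
Because CreatePDFPage appends each new page to the document passed as its first parameter, the same calls can be repeated to build a multi-page PDF. The following is a sketch only, under several assumptions: the namespace names in the using directives, the ImGearRecognition type of the igRecognition parameter, and the SearchablePdfSketch class name are not taken from this topic; the PDF component is assumed to be initialized already, as in the example above; and showing the image while hiding the text is assumed to give a searchable image PDF.

```csharp
using System.IO;
using ImageGear.Core;          // assumed namespace for ImGearPage, ImGearRasterPage
using ImageGear.Formats;       // assumed namespace for ImGearFileFormats
using ImageGear.Formats.PDF;   // assumed namespace for ImGearPDFDocument
using ImageGear.Recognition;   // assumed namespace for the Recognition classes

// Hypothetical wrapper class, for illustration only.
public static class SearchablePdfSketch
{
    // igRecognition: an already-initialized recognition engine (type name assumed).
    public static void Export(ImGearRecognition igRecognition,
                              string[] imagePaths, string pdfPath)
    {
        // Show the original image and hide the recognized text.
        ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions();
        options.VisibleImage = true;
        options.VisibleText = false;

        using (ImGearPDFDocument pdfDocument = new ImGearPDFDocument())
        {
            foreach (string imagePath in imagePaths)
            {
                using (FileStream content =
                    new FileStream(imagePath, FileMode.Open, FileAccess.Read))
                {
                    ImGearPage igPage = ImGearFileFormats.LoadPage(content, 0);
                    ImGearRecPage recPage =
                        igRecognition.ImportPage((ImGearRasterPage)igPage);
                    try
                    {
                        recPage.Image.Preprocess();
                        recPage.Recognize();
                        // Appends one new page to pdfDocument per input image.
                        recPage.CreatePDFPage(pdfDocument, options);
                    }
                    finally
                    {
                        recPage.Dispose();
                    }
                }
            }

            // Write all appended pages to a single file.
            pdfDocument.Save(pdfPath, ImGearSavingFormats.PDF_UNCOMP,
                0, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearSavingModes.OVERWRITE);
        }
    }
}
```

Creating the options object once and reusing it for every page keeps the output settings consistent across the document.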

PDF/A Support

If your PDF documents must conform to the PDF/A-1a standard, the ImGearRecPDFOutputOptions class provides a setting that produces PDF output which the PDF Converter can later convert to PDF/A automatically. When the ImGearRecPDFOutputOptions.OptimizeForPdfa property is set to True, Recognition automatically writes the information needed to satisfy many of the PDF/A-1a requirements into the output PDF. The following is an overview of what is done:

Logical Structure and Tagging: This is one of the most difficult requirements to satisfy automatically for achieving PDF/A-1a compliance. The logical structure, combined with tagging, describes the content (e.g. text and images) of the document and how it is logically arranged. More information about this topic can be found in the PDF Reference, Section 10.6 Logical Structure.

Because of the page decomposition that occurred during the Recognition process, much information about the logical structure of a page is already understood. This information allows Recognition to automatically create the logical structure and tags for the PDF content that is produced by it.
The following hierarchy is followed by Recognition to create the logical structure. Recognition will also safely handle the case where PDF content is appended to an existing PDF document that already contains a logical structure.

\Document
    \OCR_Page (Part) – References the PDF page where OCR was performed.
        \OCR_Zone (Div) – References all recognized text in a single OCR zone.
        \OCR_Image (Figure) – References the original image used to perform OCR.

Fonts: The PDF/A-1 standards require that all fonts used by visible text in a PDF document be fully embedded. Furthermore, any text in the document, hidden or visible, must be able to be mapped to Unicode values. Recognition will satisfy both of these requirements by embedding fonts that are used by visible text and creating a ToUnicode table for all fonts that are used in the document.

When creating a searchable-text PDF that consists of hidden text aligned under the original image, the fonts are not embedded, because the text is not visible.

It is important to note that the PDF exported from ImageGear Recognition will not be fully PDF/A compliant. Use the ImageGear PDF Converter to automatically create a fully compliant PDF/A document. This process is demonstrated in the example below.

Example: Exporting to a Text-Only PDF/A Document

C#
using (FileStream content = new FileStream("test1.tif", FileMode.Open))
{
    ImGearPage igPage = ImGearFileFormats.LoadPage(content, 0);
    // Import image into Recognition engine
    ImGearRecPage recPage = igRecognition.ImportPage((ImGearRasterPage)igPage);
    recPage.Image.Preprocess();
    recPage.Recognize();
    // Create and set PDF Output options; Text-only PDF
    ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions();
    options.VisibleImage = false;
    options.VisibleText = true;
    options.OptimizeForPdfa = true;
    // Initialize PDF Assembly for export
    ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
    ImGearPDF.Initialize();
    // Export page to PDF
    using (ImGearPDFDocument pdfDocument = new ImGearPDFDocument())
    {
        recPage.CreatePDFPage(pdfDocument, options);
        // Convert PDF to full PDF/A-1a compliance
        ImGearPDFPreflightConvertOptions convertOptions = new ImGearPDFPreflightConvertOptions(ImGearPDFPreflightProfile.PDFA_1A_2005, 0, -1);
        ImGearPDFPreflight preflight = new ImGearPDFPreflight(pdfDocument);
        preflight.Convert(convertOptions);
        pdfDocument.Save("test1.pdf", ImGearSavingFormats.PDF_UNCOMP,
            0, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearSavingModes.OVERWRITE);
    }
    recPage.Dispose();
}