ImageGear PDF v25.2 - Updated
OCR an Image or Document

Recognition is started by calling the ImGearOCRPage.Recognize method.

The Recognize method processes the single image associated with the ImGearOCRPage class. It uses the zone list of the image; if the zone list is empty, it treats the page bounds as a single recognition zone.

The Recognize method enumerates the zones in the zone list and activates the appropriate recognition module for each one. Each recognition module receives the calculated character set information for the zone it processes.
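
The flow described above (fall back to the page bounds when no zones are defined, then hand each zone to a matching module) can be sketched as follows. This is a conceptual illustration only: Zone, ZoneKind, RecognizeSketch, and the module delegates are hypothetical stand-ins, not ImageGear types, and the sketch does not model character set handling.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT ImageGear types.
public enum ZoneKind { Text, Table }

public sealed class Zone
{
    public Zone(int x, int y, int width, int height, ZoneKind kind)
    {
        X = x; Y = y; Width = width; Height = height; Kind = kind;
    }

    public int X { get; }
    public int Y { get; }
    public int Width { get; }
    public int Height { get; }
    public ZoneKind Kind { get; }
}

public static class RecognizeSketch
{
    // Returns the caller's zones if there are any; otherwise returns
    // a single zone that covers the whole page.
    public static IReadOnlyList<Zone> EffectiveZones(
        IReadOnlyList<Zone> zones, int pageWidth, int pageHeight)
    {
        if (zones != null && zones.Count > 0)
            return zones;
        return new[] { new Zone(0, 0, pageWidth, pageHeight, ZoneKind.Text) };
    }

    // Visits each zone in order and passes it to the module registered for its kind.
    public static List<string> Recognize(
        IReadOnlyList<Zone> zones, int pageWidth, int pageHeight,
        IDictionary<ZoneKind, Func<Zone, string>> modules)
    {
        var results = new List<string>();
        foreach (Zone zone in EffectiveZones(zones, pageWidth, pageHeight))
            results.Add(modules[zone.Kind](zone));
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        var modules = new Dictionary<ZoneKind, Func<Zone, string>>
        {
            { ZoneKind.Text, z => "text module: " + z.Width + "x" + z.Height },
            { ZoneKind.Table, z => "table module: " + z.Width + "x" + z.Height },
        };

        // No zones are defined, so the whole page becomes one text zone.
        foreach (string line in RecognizeSketch.Recognize(new Zone[0], 2550, 3300, modules))
            Console.WriteLine(line);
    }
}
```

In practice this means that calling Recognize on a freshly imported page, as the examples below do, recognizes the entire page.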

This section provides examples for the following tasks: OCR a multi-page TIFF image, OCR a single-page document, and OCR a PDF document.

OCR a Multi-Page TIFF Image

The following example illustrates how the application loads a document from a file, recognizes its pages, and writes the resulting text to the output text file.

C#
// Assumes: using System.IO; using ImageGear.Core; using ImageGear.Formats; using ImageGear.OCR;
// Read input TIFF document.
ImGearDocument igDocument;
using (Stream stream = new FileStream("Multi-Page.tif", FileMode.Open, FileAccess.Read))
    igDocument = ImGearFileFormats.LoadDocument(stream);

using (igDocument)
using (ImGearOCR igOcr = ImGearOCR.Create())
using (TextWriter textWriter = new StreamWriter("outputDoc.txt"))
{
    // Enumerate pages of the document.
    for (int pageIndex = 0; pageIndex < igDocument.Pages.Count; pageIndex++)
    {
        textWriter.WriteLine("Text of page #{0}", pageIndex);

        // Import page to the recognition page.
        using (ImGearOCRPage ocrPage = igOcr.ImportPage((ImGearRasterPage)igDocument.Pages[pageIndex]))
        {
            // Perform recognition.
            ocrPage.Recognize();

            // Write result to the file.
            textWriter.WriteLine(ocrPage.Text);
        }

        textWriter.WriteLine();
    }
}
VB .NET
' Assumes: Imports System.IO, ImageGear.Core, ImageGear.Formats, ImageGear.OCR
' Read input TIFF document.
Dim igDocument As ImGearDocument
Using stream As Stream = New FileStream("Multi-Page.tif", FileMode.Open, FileAccess.Read)
    igDocument = ImGearFileFormats.LoadDocument(stream)
End Using

Using igDocument
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        Using textWriter As TextWriter = New StreamWriter("outputDoc.txt")
            ' Enumerate pages of the document.
            For pageIndex As Integer = 0 To igDocument.Pages.Count - 1
                textWriter.WriteLine("Text of page #{0}", pageIndex)

                ' Import page to the recognition page.
                Using ocrPage As ImGearOCRPage = igOcr.ImportPage(CType(igDocument.Pages(pageIndex), ImGearRasterPage))
                    ' Perform recognition.
                    ocrPage.Recognize()

                    ' Write result to the file.
                    textWriter.WriteLine(ocrPage.Text)
                End Using

                textWriter.WriteLine()
            Next
        End Using
    End Using
End Using

OCR a Single-Page Document

The following example illustrates how to load a single page from a PNG file and store the resulting text to a text file.

C#
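// Assumes: using System.IO; using ImageGear.Core; using ImageGear.Formats; using ImageGear.OCR;
// Create the recognition engine with default settings.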
using (ImGearOCR igOcr = ImGearOCR.Create())
{
    // Open PNG image file.
    using (Stream inputStream = new FileStream("Image.png", FileMode.Open, FileAccess.Read))
    {
        // Load document from stream.
        using (ImGearDocument igDocument = ImGearFileFormats.LoadDocument(inputStream))
        {
            // Import single page of document to the recognition page.
            using (ImGearOCRPage ocrPage = igOcr.ImportPage((ImGearRasterPage)igDocument.Pages[0]))
            {
                // Recognize the entire page.
                ocrPage.Recognize();

                // Write all recognized text to the output text file.
                using (TextWriter textWriter = new StreamWriter("output.txt"))
                    textWriter.Write(ocrPage.Text);
            }
        }
    }
}
VB .NET
' Assumes: Imports System.IO, ImageGear.Core, ImageGear.Formats, ImageGear.OCR
' Create the recognition engine with default settings.
Using igOcr As ImGearOCR = ImGearOCR.Create()
     ' Open PNG image file.
     Using inputStream As Stream = New FileStream("Image.png", FileMode.Open, FileAccess.Read)
          ' Load document from stream.
          Using igDocument As ImGearDocument = ImGearFileFormats.LoadDocument(inputStream)
               ' Import single page of document to the recognition page.
               Using ocrPage As ImGearOCRPage = igOcr.ImportPage(CType(igDocument.Pages(0), ImGearRasterPage))
                    ' Recognize the entire page.
                    ocrPage.Recognize()
                    ' Write all recognized text to the output text file.
                    Using textWriter As TextWriter = New StreamWriter("output.txt")
                         textWriter.Write(ocrPage.Text)
                    End Using
               End Using
          End Using
     End Using
End Using

OCR a PDF Document

The following example illustrates how to recognize all the pages in a PDF document and store the resulting text in a text file.

C#
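// Assumes: using System.IO; using ImageGear.Core; using ImageGear.Formats; using ImageGear.Formats.PDF; using ImageGear.OCR;
// Initialize support for the PDF format.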
ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
ImGearPDF.Initialize();

try
{
    // Initialize recognition engine by default.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Open input PDF file.
        using (Stream inputStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
        {
            // Load PDF document.
            using (ImGearPDFDocument igPdfDocument = (ImGearPDFDocument)ImGearFileFormats.LoadDocument(inputStream))
            {
                // Open output text file for writing.
                using (TextWriter textWriter = new StreamWriter("output.txt"))
                {
                    // Enumerate PDF pages of the document for recognition.
                    for (int pageIndex = 0; pageIndex < igPdfDocument.Pages.Count; pageIndex++)
                    {
                        // Get a single PDF page from the document by index.
                        using (ImGearPDFPage igPdfPage = (ImGearPDFPage)igPdfDocument.Pages[pageIndex])
                        {
                            // Rasterize the PDF page to a raster page (24 bits per pixel, 300 x 300 DPI).
                            ImGearRasterPage igRasterPage = igPdfPage.Rasterize(24, 300, 300);

                            // Import page to the recognition page.
                            using (ImGearOCRPage igOcrPage = igOcr.ImportPage(igRasterPage))
                            {
                                // Recognize page.
                                igOcrPage.Recognize();

                                // Store recognized text to a file.
                                textWriter.WriteLine("Text of page #{0}", pageIndex);
                                textWriter.WriteLine(igOcrPage.Text);
                                textWriter.WriteLine();
                            }
                        }
                    }
                }
            }
        }
    }
}
finally
{
    // Terminate PDF in any case.
    ImGearPDF.Terminate();
}
VB .NET
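' Assumes: Imports System.IO, ImageGear.Core, ImageGear.Formats, ImageGear.Formats.PDF, ImageGear.OCR
' Initialize support for the PDF format.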
ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat())
ImGearPDF.Initialize()

Try
    ' Initialize recognition engine by default.
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        ' Open input PDF file.
        Using inputStream As Stream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
            ' Load PDF document.
            Using igPdfDocument As ImGearPDFDocument = CType(ImGearFileFormats.LoadDocument(inputStream), ImGearPDFDocument)
                ' Open output text file for writing.
                Using textWriter As TextWriter = New StreamWriter("output.txt")
                    ' Enumerate PDF pages of document for recognition.
                    For pageIndex As Integer = 0 To igPdfDocument.Pages.Count - 1
                        ' Get single PDF page from document by index.
                        Using igPdfPage As ImGearPDFPage = CType(igPdfDocument.Pages(pageIndex), ImGearPDFPage)
                            ' Rasterize the PDF page to a raster page (24 bits per pixel, 300 x 300 DPI).
                            Dim igRasterPage As ImGearRasterPage = igPdfPage.Rasterize(24, 300, 300)
                            ' Import page to the recognition page.
                            Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(igRasterPage)
                                ' Recognize page.
                                igOcrPage.Recognize()

                                ' Store recognized text to a file.
                                textWriter.WriteLine("Text of page #{0}", pageIndex)
                                textWriter.WriteLine(igOcrPage.Text)
                                textWriter.WriteLine()
                            End Using
                        End Using
                    Next
                End Using
            End Using
        End Using
    End Using
Finally
    ' Terminate PDF in any case.
    ImGearPDF.Terminate()
End Try
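
The PDF examples above call Rasterize(24, 300, 300). Assuming those arguments are the bit depth and the horizontal and vertical resolution in DPI (an assumption, not confirmed here), the following rough sketch estimates how large the resulting raster page is. RasterSizeSketch and its methods are hypothetical helpers for illustration and are not part of ImageGear; the estimate ignores any row padding.

```csharp
using System;

public static class RasterSizeSketch
{
    // PDF default user space uses 72 units (points) per inch.
    private const double PointsPerInch = 72.0;

    // Converts a page dimension in points to pixels at the given resolution.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi);
    }

    // Approximate size of the uncompressed pixel data, ignoring row padding.
    public static long UncompressedBytes(int widthPx, int heightPx, int bitsPerPixel)
    {
        return (long)widthPx * heightPx * bitsPerPixel / 8;
    }

    public static void Main()
    {
        // US Letter page: 612 x 792 points (8.5 x 11 inches).
        int width = PointsToPixels(612, 300);   // 2550 pixels
        int height = PointsToPixels(792, 300);  // 3300 pixels
        long bytes = UncompressedBytes(width, height, 24); // 25,245,000 bytes

        Console.WriteLine("{0} x {1} pixels, {2} bytes uncompressed", width, height, bytes);
    }
}
```

Under these assumptions each Letter-size page occupies roughly 25 MB while it is rasterized, which is worth keeping in mind when processing long PDF documents page by page.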
