ImageGear .NET v25.2 - Updated
    OCR an Image or Document

    Recognition is started by calling the ImGearOCRPage.Recognize method.

    Recognize processes the single image associated with the ImGearOCRPage object. It uses the page's zone list or, if that list is empty, treats the page bounds as a single recognition zone.

    Recognize enumerates the zones in the zone list and activates the appropriate recognition module for each one. Each module receives the Character Set information calculated for its zone.
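
    The zone-selection rule described above can be sketched as follows. This is a minimal, self-contained illustration and not ImageGear code: SketchZone and ZonesToRecognize are hypothetical names invented for this example, and the pixel dimensions in Main are arbitrary. It only models the fallback from an empty zone list to a single page-sized zone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a rectangular recognition zone; NOT an ImageGear type.
public sealed class SketchZone
{
    public SketchZone(int left, int top, int right, int bottom)
    {
        Left = left;
        Top = top;
        Right = right;
        Bottom = bottom;
    }

    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }
}

public static class ZoneFallbackSketch
{
    // Returns the caller's zone list when it has entries; otherwise returns
    // a single zone that covers the whole page.
    public static IReadOnlyList<SketchZone> ZonesToRecognize(
        IReadOnlyList<SketchZone> zoneList, int pageWidth, int pageHeight)
    {
        if (zoneList.Count > 0)
            return zoneList;

        return new[] { new SketchZone(0, 0, pageWidth, pageHeight) };
    }

    public static void Main()
    {
        // An empty zone list falls back to one page-sized zone.
        var noZones = new List<SketchZone>();
        foreach (SketchZone zone in ZonesToRecognize(noZones, 2550, 3300))
        {
            Console.WriteLine("{0},{1} - {2},{3}", zone.Left, zone.Top, zone.Right, zone.Bottom);
        }
    }
}
```

    In the samples that follow, no zones are defined before Recognize is called, so each page is recognized as one zone.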

    OCR a Multi-Page TIFF Image

    The following example loads a multi-page TIFF document from a file, recognizes each page, and writes the resulting text to an output text file.

    C#
    // Read input TIFF document.
    ImGearDocument igDocument;
    using (Stream stream = new FileStream("Multi-Page.tif", FileMode.Open, FileAccess.Read))
        igDocument = ImGearFileFormats.LoadDocument(stream);
    
    using (igDocument)
    using (ImGearOCR igOcr = ImGearOCR.Create())
    using (TextWriter textWriter = new StreamWriter("outputDoc.txt"))
    {
        // Enumerate pages of the document.
        for (int pageIndex = 0; pageIndex < igDocument.Pages.Count; pageIndex++)
        {
            textWriter.WriteLine("Text of page #{0}", pageIndex);
    
            // Import page to the recognition page.
            using (ImGearOCRPage ocrPage = igOcr.ImportPage((ImGearRasterPage)igDocument.Pages[pageIndex]))
            {
                // Perform recognition.
                ocrPage.Recognize();
    
                // Write result to the file.
                textWriter.WriteLine(ocrPage.Text);
            }
    
            textWriter.WriteLine();
        }
    }
    
    VB .NET
    ' Read input TIFF document.
    Dim igDocument As ImGearDocument
    Using stream As Stream = New FileStream("Multi-Page.tif", FileMode.Open, FileAccess.Read)
        igDocument = ImGearFileFormats.LoadDocument(stream)
    End Using

    Using igDocument
        Using igOcr As ImGearOCR = ImGearOCR.Create()
            Using textWriter As TextWriter = New StreamWriter("outputDoc.txt")
                ' Enumerate pages of the document.
                For pageIndex As Integer = 0 To igDocument.Pages.Count - 1
                    textWriter.WriteLine("Text of page #{0}", pageIndex)

                    ' Import page to the recognition page.
                    Using ocrPage As ImGearOCRPage = igOcr.ImportPage(CType(igDocument.Pages(pageIndex), ImGearRasterPage))
                        ' Perform recognition.
                        ocrPage.Recognize()

                        ' Write result to the file.
                        textWriter.WriteLine(ocrPage.Text)
                    End Using

                    textWriter.WriteLine()
                Next
            End Using
        End Using
    End Using
    

    OCR a Single-Page Document

    The following example illustrates how to load a single page from a PNG file and write the recognized text to a text file.

    C#
    // Create the recognition engine with its default settings.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Open PNG image file.
        using (Stream inputStream = new FileStream("Image.png", FileMode.Open, FileAccess.Read))
        {
            // Load document from stream.
            using (ImGearDocument igDocument = ImGearFileFormats.LoadDocument(inputStream))
            {
                // Import single page of document to the recognition page.
                using (ImGearOCRPage ocrPage = igOcr.ImportPage((ImGearRasterPage)igDocument.Pages[0]))
                {
                    // Recognize the entire page.
                    ocrPage.Recognize();
    
                    // Write all recognized text to the output text file.
                    using (TextWriter textWriter = new StreamWriter("output.txt"))
                        textWriter.Write(ocrPage.Text);
                }
            }
        }
    }
    
    VB .NET
    ' Create the recognition engine with its default settings.
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        ' Open PNG image file.
        Using inputStream As Stream = New FileStream("Image.png", FileMode.Open, FileAccess.Read)
            ' Load document from stream.
            Using igDocument As ImGearDocument = ImGearFileFormats.LoadDocument(inputStream)
                ' Import single page of document to the recognition page.
                Using ocrPage As ImGearOCRPage = igOcr.ImportPage(CType(igDocument.Pages(0), ImGearRasterPage))
                    ' Recognize the entire page.
                    ocrPage.Recognize()

                    ' Write all recognized text to the output text file.
                    Using textWriter As TextWriter = New StreamWriter("output.txt")
                        textWriter.Write(ocrPage.Text)
                    End Using
                End Using
            End Using
        End Using
    End Using
    

    OCR a PDF Document

    The following example illustrates how to recognize all the pages in a PDF document and store the resulting text in a text file.

    C#
    // Initialize support for the PDF format.
    ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
    ImGearPDF.Initialize();
    
    try
    {
        // Create the recognition engine with its default settings.
        using (ImGearOCR igOcr = ImGearOCR.Create())
        {
            // Open input PDF file.
            using (Stream inputStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
            {
                // Load PDF document.
                using (ImGearPDFDocument igPdfDocument = (ImGearPDFDocument)ImGearFileFormats.LoadDocument(inputStream))
                {
                    // Open output text file for writing.
                    using (TextWriter textWriter = new StreamWriter("output.txt"))
                    {
                        // Enumerate PDF pages of the document for recognition.
                        for (int pageIndex = 0; pageIndex < igPdfDocument.Pages.Count; pageIndex++)
                        {
                            // Get a single PDF page from the document by index.
                            using (ImGearPDFPage igPdfPage = (ImGearPDFPage)igPdfDocument.Pages[pageIndex])
                            {
                                // Rasterize PDF page to the raster page with nominal resolution.
                                ImGearRasterPage igRasterPage = igPdfPage.Rasterize(24, 300, 300);
    
                                // Import page to the recognition page.
                                using (ImGearOCRPage igOcrPage = igOcr.ImportPage(igRasterPage))
                                {
                                    // Recognize page.
                                    igOcrPage.Recognize();
    
                                    // Store recognized text to a file.
                                    textWriter.WriteLine("Text of page #{0}", pageIndex);
                                    textWriter.WriteLine(igOcrPage.Text);
                                    textWriter.WriteLine();
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    finally
    {
        // Terminate PDF in any case.
        ImGearPDF.Terminate();
    }
    
    VB .NET
    ' Initialize support for the PDF format.
    ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat())
    ImGearPDF.Initialize()
    
    Try
        ' Create the recognition engine with its default settings.
        Using igOcr As ImGearOCR = ImGearOCR.Create()
            ' Open input PDF file.
            Using inputStream As Stream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
                ' Load PDF document.
                Using igPdfDocument As ImGearPDFDocument = CType(ImGearFileFormats.LoadDocument(inputStream), ImGearPDFDocument)
                    ' Open output text file for writing.
                    Using textWriter As TextWriter = New StreamWriter("output.txt")
                        ' Enumerate PDF pages of document for recognition.
                        For pageIndex As Integer = 0 To igPdfDocument.Pages.Count - 1
                            ' Get single PDF page from document by index.
                            Using igPdfPage As ImGearPDFPage = CType(igPdfDocument.Pages(pageIndex), ImGearPDFPage)
                                ' Rasterize PDF page to the raster page with nominal resolution.
                                Dim igRasterPage As ImGearRasterPage = igPdfPage.Rasterize(24, 300, 300)
                                ' Import page to the recognition page.
                                Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(igRasterPage)
                                    ' Recognize page.
                                    igOcrPage.Recognize()

                                    ' Store recognized text to a file.
                                    textWriter.WriteLine("Text of page #{0}", pageIndex)
                                    textWriter.WriteLine(igOcrPage.Text)
                                    textWriter.WriteLine()
                                End Using
                            End Using
                        Next
                    End Using
                End Using
            End Using
        End Using
    Finally
        ' Terminate PDF in any case.
        ImGearPDF.Terminate()
    End Try