ImageGear .NET v25.0 - Updated
Access and Analyze OCR Output

Once an ImGearOCRPage has been recognized, its recognition results are available. The ImageGear OCR API can return the recognized data in three formats, each described in one of the following subsections: Simple Text, PDF Document, and List of Letters.

Simple Text

After ImGearOCRPage.Recognize completes, the recognition result is ready. The simplest way to get this result is the ImGearOCRPage.Text property, whose value is the full recognized text of the given page. The following example illustrates how to use this property.

C#
public static string GetRecognizedText(ImGearRasterPage rasterPage)
{
    string resultString = null;
    // Initialization of ImGearOCR by default.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Import ImageGear page to recognition repository.
        using (ImGearOCRPage igOcrPage = igOcr.ImportPage(rasterPage))
        {
            // Recognize image.
            igOcrPage.Recognize();
            // Get result text.
            resultString = igOcrPage.Text;
        }
    }
    return resultString;
}
VB.NET
Public Shared Function GetRecognizedText(ByVal rasterPage As ImGearRasterPage) As String
    Dim resultString As String = Nothing
    ' Initialization of ImGearOCR by default.
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        ' Import ImageGear page to recognition repository.
        Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(rasterPage)
            ' Recognize image.
            igOcrPage.Recognize()
            ' Get result text.
            resultString = igOcrPage.Text
        End Using
    End Using

    Return resultString
End Function

PDF Document

Another format for recognition results is PDF Document output. Unlike text output, PDF Document output places recognized words in the same positions they occupied on the source image. To obtain PDF Document output, call ImGearOCRPage.CreatePDFPage. This method appends the formatted recognized text as a page to the given PDF Document.

The following sample illustrates how to recognize an ImageGear page to a PDF Document.

C#
public static void RecognizeToPDF(ImGearRasterPage rasterPage, ImGearPDFDocument pdfDocument)
{
    // Initialization of ImGearOCR by default.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Import ImageGear page to recognition repository.
        using (ImGearOCRPage igOcrPage = igOcr.ImportPage(rasterPage))
        {
            // Recognize image.
            igOcrPage.Recognize();

            // Store result to PDF Document with default options.
            igOcrPage.CreatePDFPage(pdfDocument, null);
        }
    }
}
VB.NET
Public Shared Sub RecognizeToPDF(ByVal rasterPage As ImGearRasterPage, ByVal pdfDocument As ImGearPDFDocument)
    ' Initialization of ImGearOCR by default.
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        ' Import ImageGear page to recognition repository.
        Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(rasterPage)
            ' Recognize image.
            igOcrPage.Recognize()

            ' Store result to PDF Document with default options.
            igOcrPage.CreatePDFPage(pdfDocument, Nothing)
        End Using
    End Using
End Sub

For information about PDF output options, see the ImGearOCRPDFOutputOptions class reference.

List of Letters

A letter is a structure that describes the properties of a recognized symbol. Letter output is a low-level representation of the recognized data. Each letter consists of the bounding rectangle of the recognized symbol, the character code of the recognized symbol with its confidence, and a list of alternative characters that may be used instead of the recognized code.

The ImGearOCRLetter.Confidence property gives the estimated probability of a match between the symbol and the recognized code. The ImGearOCRLetter.AlternativeCharacters property contains a list of strings that may be applied instead of the character code that was recognized. Each alternative is a string rather than a single character because, for example, the combination of symbols “rn” may be recognized as “m”. The set of such alternatives makes up the value of this property. Letter output makes it possible to apply your own dictionaries that are not included in the standard recognition set.

The following example illustrates how to enumerate letters of recognized data.

C#
public static void EnumerateLetterOutput(ImGearRasterPage rasterPage)
{
    // Initialization of ImGearOCR by default.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Import ImageGear page to recognition repository.
        using (ImGearOCRPage igOcrPage = igOcr.ImportPage(rasterPage))
        {
            // Recognize image.
            igOcrPage.Recognize();

            // Enumerate the recognized letters.
            foreach (ImGearOCRLetter letter in igOcrPage.GetLetters())
            {
                // Use letter for processing here.
                // ...........
            }
        }
    }
}
VB.NET
Public Shared Sub EnumerateLetterOutput(ByVal rasterPage As ImGearRasterPage)
    ' Initialization of ImGearOCR by default.
    Using igOcr As ImGearOCR = ImGearOCR.Create()
        ' Import ImageGear page to recognition repository.
        Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(rasterPage)
            ' Recognize image.
            igOcrPage.Recognize()

            ' Enumerate the recognized letters.
            For Each letter As ImGearOCRLetter In igOcrPage.GetLetters()
                ' Use letter for processing here.
                ' ...........
            Next
        End Using
    End Using
End Sub
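
The discussion of confidence and alternative characters suggests one way to apply a custom dictionary: keep confident letters as recognized, try the alternatives for uncertain ones, and accept the first spelling the dictionary knows. The following is a minimal sketch of that idea, not part of the ImageGear API. LetterInfo is a hypothetical stand-in for the data read from ImGearOCRLetter; its property names, the string type of Text, and the 0 to 100 confidence scale are assumptions made for illustration, so check the ImGearOCRLetter class reference for the actual member names and types before adapting it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the data carried by ImGearOCRLetter.
// Property names and the 0-100 confidence scale are assumptions.
public sealed class LetterInfo
{
    public string Text { get; }
    public int Confidence { get; }
    public IReadOnlyList<string> Alternatives { get; }

    public LetterInfo(string text, int confidence, params string[] alternatives)
    {
        Text = text;
        Confidence = confidence;
        Alternatives = alternatives;
    }
}

public static class LetterCorrection
{
    // Builds every candidate spelling of a word. Confident letters are kept
    // as recognized; letters below minConfidence may also be replaced by
    // any of their alternatives.
    public static IEnumerable<string> Candidates(IEnumerable<LetterInfo> word, int minConfidence)
    {
        IEnumerable<string> candidates = new[] { string.Empty };
        foreach (LetterInfo letter in word)
        {
            var options = new List<string> { letter.Text };
            if (letter.Confidence < minConfidence)
            {
                options.AddRange(letter.Alternatives);
            }
            // Extend each spelling built so far with each option for this letter.
            candidates = candidates
                .SelectMany(prefix => options.Select(option => prefix + option))
                .ToList();
        }
        return candidates.Distinct();
    }

    // Returns the first candidate found in the custom dictionary, or the
    // text exactly as recognized when no candidate matches.
    public static string Correct(IReadOnlyList<LetterInfo> word, ISet<string> dictionary, int minConfidence)
    {
        foreach (string candidate in Candidates(word, minConfidence))
        {
            if (dictionary.Contains(candidate))
            {
                return candidate;
            }
        }
        return string.Concat(word.Select(letter => letter.Text));
    }
}
```

For example, given the letters m, o, d, e and a final low-confidence m whose alternatives include "rn", and a dictionary that contains "modern" but not "modem", Correct returns "modern". The number of candidates grows quickly when many letters are uncertain, so a real implementation would likely cap the search or process one word at a time.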