After an ImGearOCRPage has been recognized, the recognition results are available. The ImageGear OCR API can return the recognized data in three formats: text output, PDF document output, and letter output. Each format is described below.
After ImGearOCRPage.Recognize has been called, the recognition result is ready. The simplest way to retrieve it is the ImGearOCRPage.Text property, whose value is the full recognized text of the page. The following example illustrates how to use this property.
Another output format is PDF document output. Unlike text output, PDF document output lets you place the recognized words at the same positions they occupied on the source image. It is obtained with ImGearOCRPage.CreatePDFPage, which appends the formatted recognized text to the given PDF document as a new page.
The following sample illustrates how to recognize an ImageGear page and output the result to a PDF document.
For information about PDF output options, see the ImGearOCRPDFOutputOptions class reference.
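According to the description above, CreatePDFPage takes care of placing the words. The Python sketch below only illustrates the coordinate conversion that this kind of placement generally involves; it is not taken from ImageGear and does not describe its internals. It assumes word rectangles in image pixels with a top-left origin, a known scan resolution in DPI, and a PDF page in the default user space of 72 units per inch with a bottom-left origin. `Rect` and `image_rect_to_pdf` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


@dataclass(frozen=True)
class Rect:
    """Hypothetical word bounding box in image pixels (origin top-left, y grows down)."""
    left: float
    top: float
    right: float
    bottom: float


def image_rect_to_pdf(rect: Rect, dpi: float, image_height_px: float) -> Tuple[float, float, float, float]:
    """Map a pixel rectangle to PDF points (origin bottom-left, y grows up).

    Returns (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
    """
    def to_pt(px: float) -> float:
        # Pixels -> inches -> points.
        return px * POINTS_PER_INCH / dpi

    page_height_pt = to_pt(image_height_px)
    # Horizontal positions only need scaling.
    x0, x1 = to_pt(rect.left), to_pt(rect.right)
    # Vertical positions are scaled and flipped: the image's bottom edge
    # becomes the lower PDF y coordinate.
    y0 = page_height_pt - to_pt(rect.bottom)
    y1 = page_height_pt - to_pt(rect.top)
    return (x0, y0, x1, y1)
```

For a 300 DPI letter-size scan (3300 pixels tall), a word at left 300, top 300, right 600, bottom 360 maps to approximately (72, 705.6, 144, 720), that is, one inch from the left and top edges of a 792-point-tall page.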
A letter is a structure that describes the properties of a recognized symbol. Letter output gives access to the recognized data at a low level. Each letter consists of the bounding rectangle of the recognized symbol, the recognized character code with its confidence, and a list of alternatives that may be used instead of the recognized code.
The ImGearOCRLetter.Confidence property holds the estimated probability that the recognized code matches the symbol. The ImGearOCRLetter.AlternativeCharacters property contains a list of strings that may be used instead of the recognized character code. Each alternative is a string rather than a single character because one recognized symbol can correspond to several characters: for example, the pair “rn” may be recognized as “m”, so “rn” can be an alternative to “m”. Because letter output exposes these alternatives, you can check them against additional dictionaries that are not included in the standard recognition set.
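To make the dictionary idea concrete, the Python sketch below models a letter with a hypothetical `Letter` dataclass rather than the ImageGear ImGearOCRLetter type, whose exact members and value ranges are not shown here. The hypothetical `correct_word` function tries the recognized code and each alternative for every letter, and returns the first resulting string found in a caller-supplied dictionary.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Iterator, List, Set


@dataclass
class Letter:
    """Hypothetical stand-in for a recognized letter (not the ImageGear type)."""
    code: str                 # recognized character
    confidence: float         # assumed 0..1 estimate that `code` is correct
    alternatives: List[str] = field(default_factory=list)  # strings that may replace `code`


def candidates(letters: List[Letter]) -> Iterator[str]:
    """Yield every string formed by picking, per letter, its code or one alternative.

    itertools.product keeps the first option of each list first, so the
    recognized reading (all codes) is always yielded first.
    """
    choices = [[letter.code] + letter.alternatives for letter in letters]
    for combo in product(*choices):
        yield "".join(combo)


def correct_word(letters: List[Letter], dictionary: Set[str]) -> str:
    """Return the first candidate found in `dictionary`, else the recognized reading."""
    for word in candidates(letters):
        if word in dictionary:
            return word
    return "".join(letter.code for letter in letters)
```

For the letters c, o, m, e, r where “m” carries the alternative “rn”, the candidates are “comer” and “corner”, so a dictionary containing “corner” selects it. The candidate count is the product of the number of choices per letter, so for long words you would typically consider alternatives only for low-confidence letters.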
The following example illustrates how to enumerate the letters of the recognized data.
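A minimal sketch of enumerating letters, written in Python against a hypothetical `Letter` record rather than the ImageGear ImGearOCRLetter type, walks the letters in order and reports each one whose confidence falls below a threshold, together with its rectangle and alternatives. The 0 to 1 confidence scale, the (left, top, right, bottom) rectangle layout, and the 0.6 default threshold are all assumptions; check the ImGearOCRLetter reference for the actual members and ranges.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Letter:
    """Hypothetical stand-in for a recognized letter (not the ImageGear type)."""
    rect: Tuple[int, int, int, int]   # assumed (left, top, right, bottom) in image pixels
    code: str                         # recognized character
    confidence: float                 # assumed 0..1 scale
    alternatives: List[str] = field(default_factory=list)


def review_letters(letters: List[Letter], threshold: float = 0.6) -> List[str]:
    """Enumerate letters and describe each one whose confidence is below `threshold`."""
    report = []
    for index, letter in enumerate(letters):
        if letter.confidence < threshold:
            alts = ", ".join(letter.alternatives) or "none"
            report.append(
                f"#{index} '{letter.code}' at {letter.rect}: "
                f"confidence {letter.confidence:.2f}, alternatives: {alts}"
            )
    return report
```

A report like this is one way to route uncertain letters to manual review or to a dictionary check before the text is used.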