ImageGear .NET v24.12 - Updated
Export to a Formatted Document

The ImageGear Recognition API allows saving recognized data to a number of document formats, such as RTF, Microsoft Office Word, or Excel.

This API group requires ImGearRecLicenseFeature.FormattedOutput to be enabled.

After successfully recognizing an image (or a series of images), create an ImGearRecDocument Class object to accumulate the recognized pages and write them to the final output document. Use the ImGearRecDocument.Create static method to create an empty ImGearRecDocument object, then use ImGearRecDocument.InsertPage to insert recognized pages into the document. The ImGearRecDocument class also allows you to remove, update, or reorder pages. You can also save an ImGearRecDocument object to an intermediate file, which preserves all of the recognized data, and reopen it later; use the ImGearRecDocument.Save Method and the ImGearRecDocument.Open static method, respectively.

When a page is added to the document, the document takes ownership of the recognized data, and the page object becomes invalid. Use the ImGearRecPage.IsValid property to check whether a page is valid. If you need to re-recognize an image that has already been added to a document, re-import it from ImGearPage. You can then recognize it and update the corresponding page in the document using ImGearRecDocument.UpdatePage.

When all document pages have been recognized, you can write the final document using ImGearRecOutputManager.WriteDocument. Before calling it, specify the code page, the format of the final output document, and the level of format retention using the CodePage Property, Format Property, and Level Property. The full list of supported output formats is given in the topic Output Text Format List.
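
The export step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that the Format Property accepts a format name string (such as the Name of an entry in Formats), that WriteDocument takes the document and an output file path, and that the Recognition types live in the ImageGear.Recognition namespace. The FormattedExport class, the ExportDocument helper, and its parameters are hypothetical names introduced for illustration. Consult the API reference for the exact signatures.

```csharp
using ImageGear.Recognition; // assumed namespace for the Recognition API types

static class FormattedExport
{
    // Hypothetical helper: writes an already populated document to outputPath.
    // The caller is expected to have set CodePage and Level on the output
    // manager beforehand, as this topic requires.
    public static void ExportDocument(
        ImGearRecOutputManager outputManager,
        ImGearRecDocument document,
        string formatName,
        string outputPath)
    {
        // Assumption: Format is a string property holding the format name.
        outputManager.Format = formatName;

        // Assumption: WriteDocument has an overload taking the document
        // and a destination file path.
        outputManager.WriteDocument(document, outputPath);
    }
}
```

A call might look like FormattedExport.ExportDocument(igRecognition.OutputManager, igRecDocument, formatName, outputPath), where igRecDocument already contains the recognized pages and formatName is taken from the Output Text Format List topic.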

To check whether a requested output format is available in the current configuration of the Recognition API, examine the list of available output formats exposed by the Formats Property.

This topic covers enumerating the available output text file formats and retaining formatting in the final output document.

Enumerating the Available Output Text File Formats

C#
// Build a listing of every output format available in the current configuration.
string formatList = "";
for (int i = 0; i < igRecognition.OutputManager.Formats.Count; i++)
{
    ImGearRecOutputFormat igRecOutputFormat = igRecognition.OutputManager.Formats[i];
    // Record the format's name and its default file extension.
    formatList += "Text Format: " + igRecOutputFormat.Name + Environment.NewLine;
    formatList += "Extension: " + igRecOutputFormat.DefaultFileExtension + Environment.NewLine;
}
System.Console.WriteLine(formatList);
VB .NET
' Build a listing of every output format available in the current configuration.
Dim formatList As String = ""
For i As Integer = 0 To igRecognition.OutputManager.Formats.Count - 1
    Dim igRecOutputFormat As ImGearRecOutputFormat = igRecognition.OutputManager.Formats(i)
    ' Record the format's name and its default file extension.
    formatList &= "Text Format: " & igRecOutputFormat.Name & Environment.NewLine
    formatList &= "Extension: " & igRecOutputFormat.DefaultFileExtension & Environment.NewLine
Next
System.Console.WriteLine(formatList)
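
Building on the enumeration above, a check for one specific format might look like the following minimal sketch. It relies only on members already used in this topic (Formats, Count, the indexer, and Name). The FormatChecks class, the IsFormatAvailable helper, and the ImageGear.Recognition namespace import are assumptions made for illustration, and this guide does not state whether format names are case-sensitive, so the ordinal comparison is also an assumption.

```csharp
using System;
using ImageGear.Recognition; // assumed namespace for the Recognition API types

static class FormatChecks
{
    // Hypothetical helper: returns true if a format with the given name
    // appears in the output manager's list of available formats.
    public static bool IsFormatAvailable(
        ImGearRecOutputManager outputManager,
        string formatName)
    {
        for (int i = 0; i < outputManager.Formats.Count; i++)
        {
            // Assumption: an exact, case-sensitive match is appropriate.
            if (string.Equals(outputManager.Formats[i].Name, formatName,
                              StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }
}
```

For example, FormatChecks.IsFormatAvailable(igRecognition.OutputManager, requestedName) could be called before assigning the Format Property, so that an unavailable format is detected before the document is written.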

In some cases the application may need to access the recognized information on a per-character basis. For this structured data output, you can use the GetLetters Method, which provides access to the recognized output of the Recognize Method.

The GetLetters Method can only be called immediately after recognition, so pages must be recognized with the Recognize Method and processed one at a time.

Retaining Format in Final Output Document

If you select an advanced word processor format (e.g., Microsoft Word) as the output format for the final document, you may want the recognition engine to pass along as much formatting information as it can collect from the source document, so that the resulting layout resembles the original. Use the ImGearRecOutputManager.Level Property to control the formatting level of the output document.

Not every formatting feature can be used in the final output document: a feature may not be applicable to a particular converter, or may not be implemented for it. For example, specifying page margins in conjunction with a text output converter has no effect. The formatting features implemented by each converter are described in the Output converter formatting properties topic.

The ImageGear Recognition API can export the images from graphics zones to the final output document. Images obtained from graphics zones are stored internally and inserted into the final output document if the output format and level allow it.