ImageGear .NET v24.14 - Updated
OCR an Image or Document

The recognition process is initiated by the ImGearRecPage.Recognize Method.

The Recognize Method processes the single image associated with the ImGearRecPage Class. The method uses the zone list of the image or, if that list is empty, automatically calls the page-layout decomposition process (auto-zoning) first.

Internally, this method operates on a bitonal image. If the image is not bitonal, or if the image despeckle mode setting is enabled, a secondary image is created automatically through an implicit conversion step. To influence this conversion, set the ImGearRecPreprocessingSettings.SecondaryReductionMode Property and DespeckleMode Property before calling the Recognize Method.

The Recognize Method enumerates the zones in the zone list and activates the appropriate recognition modules for them. The recognition modules are given the calculated Character Set information zone by zone.
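
The control flow described above can be pictured with a small toy model. This is only an illustrative sketch: Zone, ZoneKind, AutoZone, and the module names are hypothetical stand-ins and not ImageGear types, and the real engine's zoning and module selection are more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT ImageGear types.
enum ZoneKind { Text, Numeric }

sealed class Zone
{
    public ZoneKind Kind { get; }
    public string CharacterSet { get; }

    public Zone(ZoneKind kind, string characterSet)
    {
        Kind = kind;
        CharacterSet = characterSet;
    }
}

static class RecognizeFlowSketch
{
    // Stand-in for page-layout decomposition (auto-zoning): yields one text zone.
    public static List<Zone> AutoZone()
    {
        return new List<Zone> { new Zone(ZoneKind.Text, "A-Za-z0-9") };
    }

    // Mirrors the described order: use the existing zone list, fall back to
    // auto-zoning when it is empty, then hand each zone and its character set
    // to the module chosen for that zone.
    public static List<string> Recognize(List<Zone> zones)
    {
        List<Zone> effectiveZones = zones.Count > 0 ? zones : AutoZone();
        var dispatchLog = new List<string>();
        foreach (Zone zone in effectiveZones)
        {
            string module = zone.Kind == ZoneKind.Numeric ? "numeric-module" : "text-module";
            dispatchLog.Add(module + " <- " + zone.CharacterSet);
        }
        return dispatchLog;
    }
}
```

Calling RecognizeFlowSketch.Recognize with an empty list takes the auto-zoning branch, while a caller-supplied list is processed zone by zone in order.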

This section shows how to OCR a multi-page TIFF image, a single-page document, and a PDF document.

OCR a Multi-Page TIFF Image

In this example, the application loads every page of a multi-page TIFF image (Multi-Page.tif), recognizes each page in turn, and appends the recognized text to the output file (outputDoc.txt) with the ImGearRecOutputManager.WriteDirectText Method call.

C#
// Initialize the Recognition Engine.
ImGearRecognition igRecognition = new ImGearRecognition();
// Open a FileStream for our output document.
using (FileStream outputStream = new FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
  // Open a FileStream for our source multi-page image.
  using (FileStream multiPageDocument = new FileStream("Multi-Page.tif", FileMode.Open))
  {
    // Load every page of the multi-page document, starting at page 0.
    // A range of -1 specifies that all pages are loaded.
    ImGearDocument doc = ImGearFileFormats.LoadDocument(multiPageDocument, 0, -1);
    // Determine the number of pages in the multi-page image.
    int numPages = ImGearFileFormats.GetPageCount(multiPageDocument, ImGearFormats.UNKNOWN);
    // Recognize each page of the multi-page document and add the results to outputStream.
    for (int pageNumber = 0; pageNumber < numPages; pageNumber++)
    {
      // Cast the current page to a raster page and import that page.
      ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)doc.Pages[pageNumber]);
      // Preprocess the page.
      igRecPage.Image.Preprocess();
      // Perform recognition.
      igRecPage.Recognize();
      // Add OCR results to the outputStream.
      igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);
    }
  }
}
// Dispose of objects we are no longer using.
igRecognition.Dispose();
VB .NET
' Initialize the Recognition Engine.
Dim igRecognition As New ImGearRecognition()
' Open a FileStream for our output document.
Using outputStream As New FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite) 
  ' Open a FileStream for our source multi-page image.
  Using multiPageDocument As New FileStream("Multi-Page.tif", FileMode.Open)
    ' Load every page of the multi-page document, starting at page 0.
    ' A range of -1 specifies that all pages are loaded.
    Dim doc As ImGearDocument = ImGearFileFormats.LoadDocument(multiPageDocument, 0, -1)
    ' Determine the number of pages in the multi-page image.
    Dim numPages As Integer = ImGearFileFormats.GetPageCount(multiPageDocument, ImGearFormats.UNKNOWN)
    ' Recognize each page of the multi-page document and add the results to outputStream.
    For pageNumber As Integer = 0 To numPages - 1
      ' Cast the current page to a raster page and import that page.
      Dim igRecPage As ImGearRecPage = igRecognition.ImportPage(DirectCast(doc.Pages(pageNumber), ImGearRasterPage))
      ' Preprocess the page.
      igRecPage.Image.Preprocess()
      ' Perform recognition.
      igRecPage.Recognize()
      ' Add OCR results to the outputStream.
      igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream)
    Next
  End Using
End Using
' Dispose of objects we are no longer using.
igRecognition.Dispose()

OCR a Single-Page Document

C#
// Initialize the Recognition Engine.
ImGearRecognition igRecognition = new ImGearRecognition();
// Open a FileStream for our output document.
using (FileStream outputStream = new FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
  // Open a FileStream for our source image.
  using (FileStream inputImage = new FileStream("Multi-Page.tif", FileMode.Open))
  {
    // Load the document, starting at page 0. A range of -1 loads all pages,
    // but only the first page (index 0) is recognized below.
    ImGearDocument doc = ImGearFileFormats.LoadDocument(inputImage, 0, -1);
    // Cast the first page to a raster page and import that page.
    ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)doc.Pages[0]);
    // Preprocess the page.
    igRecPage.Image.Preprocess();
    // Perform recognition.
    igRecPage.Recognize();
    // Add OCR results to the outputStream.
    igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);
  }
}
// Dispose of objects we are no longer using.
igRecognition.Dispose();
VB .NET
' Initialize the Recognition Engine.
Dim igRecognition As New ImGearRecognition()
' Open a FileStream for our output document.
Using outputStream As New FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite)
  ' Open a FileStream for our source image.
  Using inputImage As New FileStream("Multi-Page.tif", FileMode.Open)
    ' Load the document, starting at page 0. A range of -1 loads all pages,
    ' but only the first page (index 0) is recognized below.
    Dim doc As ImGearDocument = ImGearFileFormats.LoadDocument(inputImage, 0, -1)
    ' Cast the first page to a raster page and import that page.
    Dim igRecPage As ImGearRecPage = igRecognition.ImportPage(DirectCast(doc.Pages(0), ImGearRasterPage))
    ' Preprocess the page.
    igRecPage.Image.Preprocess()
    ' Perform recognition.
    igRecPage.Recognize()
    ' Add OCR results to the outputStream.
    igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream)
  End Using
End Using
' Dispose of objects we are no longer using.
igRecognition.Dispose()

OCR a PDF Document

C#
// Initialize support for the PDF format.
ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
ImGearPDF.Initialize();
// Initialize the Recognition Engine.
ImGearRecognition igRecognition = new ImGearRecognition();
// Open a FileStream for our output document.
using (FileStream outputStream = new FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
  // This ImGearDocument will hold the PDF in memory.
  ImGearDocument imGearDocument = new ImGearDocument();
  // Open a FileStream for our source PDF.
  using (FileStream multiPageDocument = new FileStream("pdf.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
  {
    // Load the entire multi-page document into imGearDocument.
    imGearDocument = ImGearFileFormats.LoadDocument(multiPageDocument);
    // Recognize each page of the multi-page document and add the results to outputStream.
    for (int pageNumber = 0; pageNumber < imGearDocument.Pages.Count; pageNumber++)
    {
      // Load page specified by pageNumber
      ImGearPage igPage = imGearDocument.Pages[pageNumber];
      // OCR only works on raster images, so we need to rasterize the page if it's an ImGearVectorPage.
      if (igPage is ImGearVectorPage)
      {
        igPage = (igPage as ImGearVectorPage).Rasterize(24, 300, 300);
      }
      // Cast igPage to a raster page and import that page.
      ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)igPage);
      // Preprocess the page.
      igRecPage.Image.Preprocess();
      // Perform recognition.
      igRecPage.Recognize();
      // Add OCR results to the outputStream.
      igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);
    }
  }
}
// Dispose of objects we are no longer using.
igRecognition.Dispose();
VB .NET
' Initialize support for the PDF format.
ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat())
ImGearPDF.Initialize()
' Initialize the Recognition Engine.
Dim igRecognition As New ImGearRecognition()
' Open a FileStream for our output document.
Using outputStream As New FileStream("outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite)
    ' This ImGearDocument will hold the PDF in memory.
    Dim imGearDocument As New ImGearDocument()
    ' Open a FileStream for our source PDF.
    Using multiPageDocument As New FileStream("pdf.pdf", FileMode.Open, FileAccess.Read, FileShare.Read)
        ' Load the entire multi-page document into imGearDocument.
        imGearDocument = ImGearFileFormats.LoadDocument(multiPageDocument)
        ' Recognize each page of the multi-page document and add the results to outputStream.
        For pageNumber As Integer = 0 To imGearDocument.Pages.Count - 1
            ' Load page specified by pageNumber
            Dim igPage As ImGearPage = imGearDocument.Pages(pageNumber)
            ' OCR only works on raster images, so we need to rasterize the page if it's an ImGearVectorPage.
            If TypeOf igPage Is ImGearVectorPage Then
                igPage = TryCast(igPage, ImGearVectorPage).Rasterize(24, 300, 300)
            End If
            ' Cast igPage to a raster page and import that page.
            Dim igRecPage As ImGearRecPage = igRecognition.ImportPage(DirectCast(igPage, ImGearRasterPage))
            ' Preprocess the page.
            igRecPage.Image.Preprocess()
            ' Perform recognition.
            igRecPage.Recognize()
            ' Add OCR results to the outputStream.
            igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream)
        Next
    End Using
End Using
' Dispose of objects we are no longer using.
igRecognition.Dispose()
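
The three examples above repeat the same per-page sequence (ImportPage, Preprocess, Recognize, WriteDirectText), so it may be convenient to factor it into one helper. The following is a minimal sketch, not part of ImageGear: OcrHelpers and RecognizeToText are hypothetical names, the namespace imports are assumptions about where these classes live, and every ImageGear call is copied from the examples above rather than checked against the API reference. For PDF input, the caller is still expected to register and initialize PDF support first, as in the PDF example.

```csharp
using System.IO;
// Assumed namespaces for the classes used in the examples above.
using ImageGear.Core;
using ImageGear.Recognition;

// Hypothetical helper class for illustration; not part of ImageGear.
static class OcrHelpers
{
    // Recognizes every page of an already loaded document and appends the
    // recognized text to outputStream, using the same calls as the examples.
    public static void RecognizeToText(
        ImGearRecognition igRecognition,
        ImGearDocument doc,
        FileStream outputStream)
    {
        for (int pageNumber = 0; pageNumber < doc.Pages.Count; pageNumber++)
        {
            ImGearPage igPage = doc.Pages[pageNumber];
            // OCR works on raster images, so rasterize vector pages (such as PDF pages)
            // with the same arguments as the PDF example.
            if (igPage is ImGearVectorPage)
            {
                igPage = (igPage as ImGearVectorPage).Rasterize(24, 300, 300);
            }
            // Import, preprocess, recognize, and write the text for this page.
            ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)igPage);
            igRecPage.Image.Preprocess();
            igRecPage.Recognize();
            igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);
        }
    }
}
```

With such a helper, each example reduces to opening the streams, calling LoadDocument, and passing the result to OcrHelpers.RecognizeToText. Wrapping that call in try/finally with igRecognition.Dispose() in the finally block would release the engine even if recognition throws.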
