ImageGear for .NET
Walkthrough: Multi-Threaded Recognition Using the .NET Parallel Class

The ImageGear Recognition assembly lets you perform recognition on multiple images in parallel. However, you need to ensure that the memory consumed by this process does not exceed the limits of the system. Memory can be exhausted if, for example, all pages of a large PDF are opened at once and then sent to the Recognition assembly for processing.

An alternative to the “all-at-once” processing style mentioned above is a processing technique where smaller chunks of images are opened, processed, exported and closed in an assembly line style. This technique ensures that only a specified number of images are opened and being processed at one time, keeping a consistent and manageable memory footprint throughout the process.

This walkthrough demonstrates this technique using the System.Threading.Tasks.Parallel class included in the .NET Framework 4.0. You will create a .NET 4 Windows application that processes all pages of a PDF file while keeping a controlled memory footprint throughout the task.
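The chunked, assembly-line pattern described above can be sketched without any ImageGear types. This is a minimal illustration only: the class name ChunkedPipelineSketch, the Run method and the use of squared integers as stand-in "work" are all assumptions made for this sketch and are not part of the ImageGear API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedPipelineSketch
{
    // Hypothetical stand-in for "open, process, export, close".
    // Returns the results in input order and reports the highest number
    // of items that were open at the same time.
    public static List<int> Run(int itemCount, int chunkSize, out int peakOpen)
    {
        var results = new List<int>();
        int open = 0;
        int peak = 0;
        for (int start = 0; start < itemCount; start += chunkSize)
        {
            int length = Math.Min(chunkSize, itemCount - start);
            var chunk = new int[length];

            // "Open" only the items that belong to this chunk.
            for (int i = 0; i < length; i++)
            {
                chunk[i] = start + i;
                open++;
            }
            peak = Math.Max(peak, open);

            // Process the chunk in parallel; each iteration touches only its own slot.
            Parallel.For(0, length, i => { chunk[i] = chunk[i] * chunk[i]; });

            // "Export" sequentially to keep the order, then "close" each item
            // before the next chunk is opened.
            for (int i = 0; i < length; i++)
            {
                results.Add(chunk[i]);
                open--;
            }
        }
        peakOpen = peak;
        return results;
    }
}
```

Calling Run(10, 4, out peak) returns the squares of 0 through 9 in order, and peak is 4: no more than one chunk of items is ever open at once, regardless of the total item count.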

Creating the Project

This section describes how to create the project for this sample:

  1. Start Visual Studio and create a new Windows Forms Application project in C# named ParallelSample.
  2. In Visual Studio, add the following ImageGear references to your project (note: assembly versions may differ):

Creating the Page Processor Class

This section describes how to create the class that will perform the parsing and recognition of the PDF file using multiple threads:

  1. In Visual Studio, add a new class to the ParallelSample project called PageProcessorTest.
  2. Add the following using statements to the top of the class file. This code imports the proper types for use in the class.
C# Example
using System;
using System.IO;
using System.Threading.Tasks;
using ImageGear.Core;
using ImageGear.Formats;
using ImageGear.Formats.PDF;
using ImageGear.Recognition;

VB.NET Example
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports ImageGear.Core
Imports ImageGear.Formats
Imports ImageGear.Formats.PDF
Imports ImageGear.Recognition
  3. Add the following code to initialize a new instance of the class. This code sets up licensing, initializes the ImageGear Formats assembly and adds the required format filters. The C# example does this in the constructor, while the VB.NET example uses an Initialize method that must be called explicitly. This sample supports only the PDF and PostScript formats:
C# Example
public PageProcessorTest()
{
    //***The SetSolutionName, SetSolutionKey and possibly the SetOEMLicenseKey 
    //methods must be called to distribute the runtime.***
    //ImGearLicense.SetSolutionName("YourSolutionName");
    //ImGearLicense.SetSolutionKey(12345, 12345, 12345, 12345);
    //Manually Reported Runtime licenses also require the following method 
    //call to SetOEMLicenseKey.
    //ImGearLicense.SetOEMLicenseKey("2.0.AStringForOEMLicensing...");
    ImGearCommonFormats.Initialize();
    ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
    ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePSFormat());
    ImGearPDF.Initialize();
}

 

VB.NET Example
Public Sub Initialize()
 '***The SetSolutionName, SetSolutionKey and possibly the SetOEMLicenseKey 
 'methods must be called to distribute the runtime.***
 'ImGearLicense.SetSolutionName("YourSolutionName")
 'ImGearLicense.SetSolutionKey(12345, 12345, 12345, 12345)
 'Manually Reported Runtime licenses also require the following method 
 'call to SetOEMLicenseKey. 
 'ImGearLicense.SetOEMLicenseKey("2.0.AStringForOEMLicensing...")
 ImGearCommonFormats.Initialize()
 ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat())
 ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePSFormat())
 ImGearPDF.Initialize()
End Sub
  4. Add the following code to create a private method that processes a chunk of pages. This code works on an array of ImGearRecPage objects to preprocess and recognize them in parallel. After each page has been processed, the results are added to the output document and the pages used for recognition are disposed to free up memory for the next chunk.
C# Example
private void ProcessPageChunk(ImGearRecPage[] recPagesChunk, ImGearRecDocument document)
{
    Parallel.ForEach(recPagesChunk, pg =>
    {
        if (pg != null)
        {
            pg.Image.Preprocess();
            pg.Recognize();
        }
    });
    for (int i = 0; i < recPagesChunk.Length; i++)
    {
        if (recPagesChunk[i] != null)
        {
            document.InsertPage(recPagesChunk[i], -1);
            // Dispose of the rec page from this last chunk. It has
            // already been added to an output document at this point
            // so the recognized page data will be included in the
            // output document.
            // Processing the document in "chunks" also allows the
            // application to maintain a predictable memory footprint.
            recPagesChunk[i].Dispose();
            recPagesChunk[i] = null;
        }
    }
}
VB.NET Example
Private Sub ProcessPageChunk(ByVal recPagesChunk As ImGearRecPage(), ByVal document As ImGearRecDocument)
 Parallel.ForEach(recPagesChunk, Sub(pg As ImGearRecPage)
          If pg IsNot Nothing Then
           pg.Image.Preprocess()
           pg.Recognize()
          End If
         End Sub)
 For i As Integer = 0 To recPagesChunk.Length - 1
  If recPagesChunk(i) IsNot Nothing Then
   document.InsertPage(recPagesChunk(i), -1)
   ' Dispose of the rec page from this last chunk. It has
   ' already been added to an output document at this point
   ' so the recognized page data will be included in the
   ' output document.
   ' Processing the document in "chunks" also allows the
   ' application to maintain a predictable memory footprint.
   recPagesChunk(i).Dispose()
   recPagesChunk(i) = Nothing
  End If
 Next
End Sub

 

  5. Add the following code to create a public method called Process. This code initializes the ImageGear Recognition assembly, creates the output document and calls the ProcessPageChunk method created in the previous step for each chunk until all pages are processed. In this example, at most four pages are processed in parallel (the numberOfCores value). Adjust this value to suit the number of cores and threads your CPU supports.
C# Example
public void Process(FileInfo file)
{
    string xmlFile = "output.xml";
    int numberOfCores = 4;
    using (var igRecognition = new ImGearRecognition())
    {
        using (var document = igRecognition.OutputManager.CreateDocument(null))
        {
            using (var content = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
            {
                int numberOfPages = ImGearFileFormats.GetPageCount(content, ImGearFormats.UNKNOWN);
                var recPagesChunk = new ImGearRecPage[numberOfCores];
                for (int i = 0; i < numberOfPages; i++)
                {
                    // Index to track the current index within the smaller
                    // chunk of pages
                    int chunkIndex = i % numberOfCores;
                    ImGearPage igPage = ImGearFileFormats.LoadPage(content, i);
                    // Rasterize the page if it's a vector page
                    if (igPage is ImGearVectorPage)
                    {
                        ImGearPage tempPage = ((ImGearVectorPage)igPage).Rasterize();
                        if (igPage is IDisposable)
                        {
                            (igPage as IDisposable).Dispose();
                        }
                        igPage = tempPage;
                    }
                    recPagesChunk[chunkIndex] = igRecognition.ImportPage((ImGearRasterPage)igPage);
                    // Flush the chunk when it is full or when the last page
                    // has been loaded.
                    if ((chunkIndex == numberOfCores - 1) ||
                        (i == numberOfPages - 1))
                    {
                        ProcessPageChunk(recPagesChunk, document);
                    }
                }
            }
            igRecognition.OutputManager.CodePage = "UTF-8";
            igRecognition.OutputManager.Level = ImGearRecOutputLevel.AUTO;
            igRecognition.OutputManager.Format = "Converters.Text.XML";
            igRecognition.OutputManager.WriteDocument(document, xmlFile);
        }
    }
}

 

VB.NET Example
Public Sub Process(ByVal file As FileInfo)
 Dim xmlFile As String = "output.xml"
 Dim numberOfCores As Integer = 4
 Using igRecognition As New ImGearRecognition()
  Using document As ImGearRecDocument = igRecognition.OutputManager.CreateDocument(Nothing)
   Using content As New FileStream(file.FullName, FileMode.Open, FileAccess.Read)
    Dim numberOfPages As Int32 = ImGearFileFormats.GetPageCount(content, ImGearFormats.UNKNOWN)
    ' VB.NET array declarations take the upper bound, not the length.
    Dim recPagesChunk(numberOfCores - 1) As ImGearRecPage
    For i As Int32 = 0 To numberOfPages - 1
     ' Index to track the current index within the smaller
     ' chunk of pages
     Dim chunkIndex As Int32 = i Mod numberOfCores
     Dim igPage As ImGearPage = ImGearFileFormats.LoadPage(content, i)
     ' Rasterize the page if it's a vector page
     If TypeOf igPage Is ImGearVectorPage Then
      Dim tempPage As ImGearPage = DirectCast(igPage, ImGearVectorPage).Rasterize()
      If TypeOf igPage Is IDisposable Then
       DirectCast(igPage, IDisposable).Dispose()
      End If
      igPage = tempPage
     End If
     recPagesChunk(chunkIndex) = igRecognition.ImportPage(DirectCast(igPage, ImGearRasterPage))
     ' Flush the chunk when it is full or when the last page has been loaded.
     If (chunkIndex = numberOfCores - 1) OrElse (i = numberOfPages - 1) Then
      ProcessPageChunk(recPagesChunk, document)
     End If
    Next
   End Using
   igRecognition.OutputManager.CodePage = "UTF-8"
   igRecognition.OutputManager.Level = ImGearRecOutputLevel.AUTO
   igRecognition.OutputManager.Format = "Converters.Text.XML"
   igRecognition.OutputManager.WriteDocument(document, xmlFile)
  End Using
 End Using
End Sub
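The hard-coded value of 4 and the chunk-boundary test in Process can be explored in isolation. The sketch below is illustrative only: ChunkSizingSketch, ChooseChunkSize and FlushIndexes are hypothetical names introduced here, and sizing the chunk from Environment.ProcessorCount is an assumption about what suits your workload, not an ImageGear recommendation. FlushIndexes applies the rule that a chunk is flushed when it is full or when the last page has been loaded.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkSizingSketch
{
    // Hypothetical helper: one chunk slot per logical processor,
    // never fewer than 1 and never more than maxChunk.
    public static int ChooseChunkSize(int maxChunk)
    {
        return Math.Max(1, Math.Min(Environment.ProcessorCount, maxChunk));
    }

    // Lists the page indexes at which a chunk would be handed off
    // for processing: when the chunk is full, or on the last page.
    public static List<int> FlushIndexes(int pageCount, int chunkSize)
    {
        var indexes = new List<int>();
        for (int i = 0; i < pageCount; i++)
        {
            bool chunkFull = (i % chunkSize) == chunkSize - 1;
            bool lastPage = i == pageCount - 1;
            if (chunkFull || lastPage)
            {
                indexes.Add(i);
            }
        }
        return indexes;
    }
}
```

For a 10-page file and a chunk size of 4, FlushIndexes returns 3, 7 and 9: two full chunks followed by a final partial chunk of two pages. A larger chunk size raises peak memory use, so the cap passed to ChooseChunkSize is the value to tune.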

Calling the Page Processor Class

This section describes how to call the PageProcessorTest class that you created above:

  1. Create your own user interface to enable the user to select a PDF file to process. This example assumes that a file has been selected and that a valid file name is available in a variable named filename.
  2. Add the following code to call the PageProcessorTest class you created. 
C# Example
PageProcessorTest processor = new PageProcessorTest();
processor.Process(new System.IO.FileInfo(filename));

 

VB.NET Example
Dim processor As New PageProcessorTest()
processor.Initialize()
processor.Process(New System.IO.FileInfo(filename))
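Because ProcessPageChunk runs its work through Parallel.ForEach, an exception thrown inside the loop body reaches the caller of Process wrapped in a System.AggregateException. The sketch below shows that behavior with plain integers. ParallelErrorSketch and CountReportedFailures are hypothetical names, and treating a negative number as a "bad page" is only a stand-in; whether and when the ImageGear recognition calls throw is not covered here.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelErrorSketch
{
    // Runs hypothetical per-page work in parallel and returns how many
    // failures the loop reported (0 when every item succeeded).
    public static int CountReportedFailures(int[] pages)
    {
        try
        {
            Parallel.ForEach(pages, page =>
            {
                // Stand-in for a recognition call that fails on a bad page.
                if (page < 0)
                {
                    throw new InvalidOperationException("Page " + page + " failed.");
                }
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            // One inner exception per failed iteration that the loop observed.
            foreach (Exception inner in ex.InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
            return ex.InnerExceptions.Count;
        }
    }
}
```

In the walkthrough, the equivalent try/catch would wrap the call to processor.Process. Note that if the parallel loop in ProcessPageChunk throws, the loop that follows it (which inserts and disposes the pages of that chunk) does not run.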

 

 


©2014. Accusoft Corporation. All Rights Reserved.
