Walkthrough: Multi-Threaded Recognition Using the .NET Parallel Class
The ImageGear Recognition assembly lets you perform recognition on multiple images in parallel. However, you must ensure that the memory consumed by this process does not exceed the limits of the system. Memory can be exhausted if, for example, all pages of a large PDF are opened at once and then sent to the Recognition assembly for processing.
An alternative to the “all-at-once” processing style is to open, process, export, and close smaller chunks of images in assembly-line fashion. This technique ensures that only a specified number of images are open and being processed at any one time, which keeps the memory footprint consistent and manageable throughout the process.
This walkthrough demonstrates the technique using the System.Threading.Tasks.Parallel class included in .NET Framework 4. In this walkthrough, you will create a .NET 4 Windows application that processes all pages of a PDF file while keeping the memory footprint under control.
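The chunked, assembly-line pattern described above can be sketched without any ImageGear types. The following is a minimal illustration only: `WorkItem`, `ProcessInChunks`, and the string results are hypothetical names introduced for this sketch and are not part of the ImageGear API. It assumes that each item can be opened, processed, and released independently of the others.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedProcessingSketch
{
    // Hypothetical stand-in for an opened page; not an ImageGear type.
    public sealed class WorkItem : IDisposable
    {
        public int Index { get; private set; }
        public string Result { get; set; }
        public bool IsDisposed { get; private set; }

        public WorkItem(int index) { Index = index; }

        public void Dispose() { IsDisposed = true; }
    }

    // Processes itemCount items while keeping at most chunkSize of them
    // open at once. Returns the results in the original item order.
    public static List<string> ProcessInChunks(int itemCount, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        var results = new List<string>(itemCount);
        for (int start = 0; start < itemCount; start += chunkSize)
        {
            int count = Math.Min(chunkSize, itemCount - start);

            // "Open" only the items that belong to this chunk.
            var chunk = new WorkItem[count];
            for (int i = 0; i < count; i++)
            {
                chunk[i] = new WorkItem(start + i);
            }

            // Process the chunk in parallel. Parallel.ForEach returns only
            // after every item in the chunk has been processed.
            Parallel.ForEach(chunk, item =>
            {
                item.Result = "processed " + item.Index;
            });

            // Collect the results sequentially, then release the chunk.
            foreach (WorkItem item in chunk)
            {
                results.Add(item.Result);
                item.Dispose();
            }
        }
        return results;
    }
}
```

For example, `ChunkedProcessingSketch.ProcessInChunks(10, 4)` works through chunks of 4, 4, and 2 items and returns ten results in index order, with no more than four items open at any time.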
This section describes how to create the project for this sample:
This section describes how to create the class that will perform the parsing and recognition of the PDF file using multiple threads:
C# Example

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using ImageGear.Core;
    using ImageGear.Formats;
    using ImageGear.Formats.PDF;
    using ImageGear.Recognition;
VB.NET Example

    Imports System
    Imports System.IO
    Imports System.Threading.Tasks
    Imports ImageGear.Core
    Imports ImageGear.Formats
    Imports ImageGear.Formats.PDF
    Imports ImageGear.Recognition
C# Example

    public PageProcessorTest()
    {
        // ***The SetSolutionName, SetSolutionKey and possibly the SetOEMLicenseKey
        // methods must be called to distribute the runtime.***
        //ImGearLicense.SetSolutionName("YourSolutionName");
        //ImGearLicense.SetSolutionKey(12345, 12345, 12345, 12345);
        // Manually Reported Runtime licenses also require the following method
        // call to SetOEMLicenseKey.
        //ImGearLicense.SetOEMLicenseKey("2.0.AStringForOEMLicensing...");

        ImGearCommonFormats.Initialize();
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePSFormat());
        ImGearPDF.Initialize();
    }
VB.NET Example

    Public Sub Initialize()
        ' ***The SetSolutionName, SetSolutionKey and possibly the SetOEMLicenseKey
        ' methods must be called to distribute the runtime.***
        'ImGearLicense.SetSolutionName("YourSolutionName")
        'ImGearLicense.SetSolutionKey(12345, 12345, 12345, 12345)
        ' Manually Reported Runtime licenses also require the following method
        ' call to SetOEMLicenseKey.
        'ImGearLicense.SetOEMLicenseKey("2.0.AStringForOEMLicensing...")

        ImGearCommonFormats.Initialize()
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat())
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePSFormat())
        ImGearPDF.Initialize()
    End Sub
C# Example

    private void ProcessPageChunk(ImGearRecPage[] recPagesChunk, ImGearRecDocument document)
    {
        // Preprocess and recognize every page in the chunk in parallel.
        Parallel.ForEach(recPagesChunk, pg =>
        {
            if (pg != null)
            {
                pg.Image.Preprocess();
                pg.Recognize();
            }
        });

        for (int i = 0; i < recPagesChunk.Length; i++)
        {
            if (recPagesChunk[i] != null)
            {
                document.InsertPage(recPagesChunk[i], -1);

                // Dispose of the rec page from this last chunk. It has
                // already been added to an output document at this point
                // so the recognized page data will be included in the
                // output document.
                // Processing the document in "chunks" also allows the
                // application to maintain a predictable memory footprint.
                recPagesChunk[i].Dispose();
                recPagesChunk[i] = null;
            }
        }
    }
VB.NET Example

    Private Sub ProcessPageChunk(ByVal recPagesChunk As ImGearRecPage(), ByVal document As ImGearRecDocument)
        ' Preprocess and recognize every page in the chunk in parallel.
        Parallel.ForEach(recPagesChunk,
            Sub(pg As ImGearRecPage)
                If pg IsNot Nothing Then
                    pg.Image.Preprocess()
                    pg.Recognize()
                End If
            End Sub)

        For i As Integer = 0 To recPagesChunk.Length - 1
            If recPagesChunk(i) IsNot Nothing Then
                document.InsertPage(recPagesChunk(i), -1)

                ' Dispose of the rec page from this last chunk. It has
                ' already been added to an output document at this point
                ' so the recognized page data will be included in the
                ' output document.
                ' Processing the document in "chunks" also allows the
                ' application to maintain a predictable memory footprint.
                recPagesChunk(i).Dispose()
                recPagesChunk(i) = Nothing
            End If
        Next
    End Sub
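ProcessPageChunk sets each array slot to null after disposing of its page. A likely reason is that the same array is reused for every chunk, so a slot left over from an earlier chunk would otherwise be inserted a second time when the final chunk is only partly filled. The sketch below illustrates that behavior with standard .NET types only; `FakePage` and `DrainBuffer` are hypothetical names for this example, not ImageGear members, and the "recognition" step is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ReusableChunkBufferSketch
{
    // Hypothetical stand-in for a recognition page; not an ImageGear type.
    public sealed class FakePage : IDisposable
    {
        public int Number { get; private set; }
        public bool Recognized { get; set; }

        public FakePage(int number) { Number = number; }

        public void Dispose() { }
    }

    // Processes every non-null slot in parallel, appends the page numbers
    // to the output in slot order, then disposes each page and clears its
    // slot so that the buffer can be refilled.
    public static void DrainBuffer(FakePage[] buffer, List<int> output)
    {
        Parallel.ForEach(buffer, page =>
        {
            if (page != null)
            {
                page.Recognized = true; // placeholder for the real work
            }
        });

        for (int i = 0; i < buffer.Length; i++)
        {
            if (buffer[i] != null)
            {
                output.Add(buffer[i].Number);
                buffer[i].Dispose();
                buffer[i] = null; // a stale slot would be drained again later
            }
        }
    }
}
```

If a four-slot buffer is filled and drained, and then only its first two slots are refilled and drained again, the output holds six page numbers. Without the `buffer[i] = null` assignment, the second drain would also re-add the two pages left in slots 2 and 3.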
C# Example

    public void Process(FileInfo file)
    {
        string xmlFile = "output.xml";
        // Number of pages opened and recognized at one time.
        int numberOfCores = 4;

        using (var igRecognition = new ImGearRecognition())
        {
            using (var document = igRecognition.OutputManager.CreateDocument(null))
            {
                using (var content = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
                {
                    int numberOfPages = ImGearFileFormats.GetPageCount(content, ImGearFormats.UNKNOWN);
                    var recPagesChunk = new ImGearRecPage[numberOfCores];

                    for (int i = 0; i < numberOfPages; i++)
                    {
                        // Index to track the current index within the smaller
                        // chunk of pages
                        int chunkIndex = i % numberOfCores;

                        ImGearPage igPage = ImGearFileFormats.LoadPage(content, i);

                        // Rasterize the page if it's a vector page
                        if (igPage is ImGearVectorPage)
                        {
                            ImGearPage tempPage = ((ImGearVectorPage)igPage).Rasterize();
                            IDisposable disposable = igPage as IDisposable;
                            if (disposable != null)
                            {
                                disposable.Dispose();
                            }
                            igPage = tempPage;
                        }

                        recPagesChunk[chunkIndex] = igRecognition.ImportPage((ImGearRasterPage)igPage);

                        // Process the chunk when it is full or when the
                        // last page of the file has been loaded.
                        if ((chunkIndex == numberOfCores - 1) || (i == numberOfPages - 1))
                        {
                            ProcessPageChunk(recPagesChunk, document);
                        }
                    }
                }

                igRecognition.OutputManager.CodePage = "UTF-8";
                igRecognition.OutputManager.Level = ImGearRecOutputLevel.AUTO;
                igRecognition.OutputManager.Format = "Converters.Text.XML";
                igRecognition.OutputManager.WriteDocument(document, xmlFile);
            }
        }
    }
VB.NET Example

    Public Sub Process(ByVal file As FileInfo)
        Dim xmlFile As String = "output.xml"
        ' Number of pages opened and recognized at one time.
        Dim numberOfCores As Integer = 4

        Using igRecognition As New ImGearRecognition()
            Using document As ImGearRecDocument = igRecognition.OutputManager.CreateDocument(Nothing)
                Using content As New FileStream(file.FullName, FileMode.Open, FileAccess.Read)
                    Dim numberOfPages As Int32 = ImGearFileFormats.GetPageCount(content, ImGearFormats.UNKNOWN)
                    ' VB.NET array bounds are inclusive, so the upper bound is
                    ' numberOfCores - 1 for an array of numberOfCores elements.
                    Dim recPagesChunk(numberOfCores - 1) As ImGearRecPage

                    For i As Int32 = 0 To numberOfPages - 1
                        ' Index to track the current index within the smaller
                        ' chunk of pages
                        Dim chunkIndex As Int32 = i Mod numberOfCores

                        Dim igPage As ImGearPage = ImGearFileFormats.LoadPage(content, i)

                        ' Rasterize the page if it's a vector page
                        If TypeOf igPage Is ImGearVectorPage Then
                            Dim tempPage As ImGearPage = DirectCast(igPage, ImGearVectorPage).Rasterize()
                            If TypeOf igPage Is IDisposable Then
                                DirectCast(igPage, IDisposable).Dispose()
                            End If
                            igPage = tempPage
                        End If

                        recPagesChunk(chunkIndex) = igRecognition.ImportPage(DirectCast(igPage, ImGearRasterPage))

                        ' Process the chunk when it is full or when the
                        ' last page of the file has been loaded.
                        If chunkIndex = numberOfCores - 1 OrElse i = numberOfPages - 1 Then
                            ProcessPageChunk(recPagesChunk, document)
                        End If
                    Next
                End Using

                igRecognition.OutputManager.CodePage = "UTF-8"
                igRecognition.OutputManager.Level = ImGearRecOutputLevel.AUTO
                igRecognition.OutputManager.Format = "Converters.Text.XML"
                igRecognition.OutputManager.WriteDocument(document, xmlFile)
            End Using
        End Using
    End Sub
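The Process method hands a chunk to ProcessPageChunk when the chunk is full or when the last page of the file has been loaded. The sketch below reproduces only that index arithmetic so that the chunk boundaries can be checked in isolation. `FlushPoints` and `ChooseChunkSize` are hypothetical helpers written for this illustration. Deriving the chunk size from `Environment.ProcessorCount` is an assumption about how the hard-coded value of 4 might be replaced, not something the walkthrough prescribes.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkBoundarySketch
{
    // Returns the zero-based page indices after which a chunk is processed:
    // either the chunk is full, or the page is the last one in the file.
    public static List<int> FlushPoints(int numberOfPages, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        var points = new List<int>();
        for (int i = 0; i < numberOfPages; i++)
        {
            int chunkIndex = i % chunkSize;
            if (chunkIndex == chunkSize - 1 || i == numberOfPages - 1)
            {
                points.Add(i);
            }
        }
        return points;
    }

    // One possible way to pick the chunk size: one page per logical
    // processor, capped by a caller-supplied limit on open pages.
    public static int ChooseChunkSize(int maxOpenPages)
    {
        return Math.Max(1, Math.Min(Environment.ProcessorCount, maxOpenPages));
    }
}
```

For a 10-page file and a chunk size of 4, `FlushPoints(10, 4)` returns the indices 3, 7, and 9, so the chunks contain 4, 4, and 2 pages. A larger chunk size increases parallelism but also the number of pages held in memory at once.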
This section describes how to use the PageProcessorTest class that you created above:
C# Example

    PageProcessorTest processor = new PageProcessorTest();
    processor.Process(new System.IO.FileInfo(filename));
VB.NET Example

    Dim processor As New PageProcessorTest()
    processor.Initialize()
    processor.Process(New System.IO.FileInfo(filename))