You can improve text recognition accuracy by narrowing the range of characters valid for recognition, so that the recognition engine doesn't have to choose its solutions from all 500 characters in its Total Character Set. The multi-lingual omnifont MOR recognition module supports all of these characters; other recognition modules recognize fewer of them. Broadly, the Character Set for a zone is compiled as follows:

Language environment
This involves selecting one or more of the 114 available languages with the LanguageEnabled Property and, optionally, validating additional individual characters with the LanguagesPlus Property. Selecting only the needed language(s) has a major impact. For example, selecting German only immediately invalidates the Cyrillic and Greek alphabets and over 150 other unneeded accented letters.

If the LanguageEnabled Property is not used to enable or disable any languages, the default is to enable the English alphabet.
Recognition module capabilities
Defining a recognition module for processing a zone may also restrict the available languages or characters within the Language environment.
Filtering
The SDK provides filters (ImGearRecFilter Enumeration) to further narrow down the Character Set, by enabling only certain character classes, e.g., digits, uppercase letters, etc. The value ALL means no filtering.
Re-Expanding
An application may require exceptions to the filter rule. The most flexible way to re-expand the Character Set with individual characters after filtering is to specify them with the FilterPlus Property and validate them with the PLUS flag in the required zones.
Zone-level modification
In addition to the global, page-level definition of the Character Set, the choice of Recognition module, filling method, filtering and use of the expansion string can be fine-tuned on a local, zone level. Auto-located zones will be set to take the global filter settings. Manually created or modified zones may contain zone-level settings. Local filtering and expansion (with PLUS) can be set in the zone's Filter Property. The possible local filter values are the same as the global ones, with an extra one: DEFAULT. If this is the only one set, the zone inherits the global filter setting.
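
The layering described above can be modeled with ordinary set operations. The following is a minimal, self-contained sketch under stated assumptions, not the SDK's implementation: `CharClass`, `CharacterSetModel`, and `Compose` are hypothetical names introduced for illustration, the flag values are invented, the character classes are approximated with `char.IsDigit`, `char.IsUpper`, and similar calls, and the recognition-module restriction is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's filter flags; values are invented.
[Flags]
public enum CharClass
{
    Digit = 1,
    Uppercase = 2,
    Lowercase = 4,
    Punctuation = 8,
    Miscellaneous = 16,
    Plus = 32,      // re-expand with the FilterPlus characters
    Default = 64,   // zone inherits the global filter
    Alpha = Uppercase | Lowercase,
    All = Digit | Alpha | Punctuation | Miscellaneous
}

public static class CharacterSetModel
{
    // Approximate class membership with the .NET character predicates.
    static bool InClass(char c, CharClass filter)
    {
        if (char.IsDigit(c)) return filter.HasFlag(CharClass.Digit);
        if (char.IsUpper(c)) return filter.HasFlag(CharClass.Uppercase);
        if (char.IsLower(c)) return filter.HasFlag(CharClass.Lowercase);
        if (char.IsPunctuation(c)) return filter.HasFlag(CharClass.Punctuation);
        return filter.HasFlag(CharClass.Miscellaneous);
    }

    public static HashSet<char> Compose(
        IEnumerable<char> languageChars, string languagesPlus,
        CharClass globalFilter, CharClass zoneFilter, string filterPlus)
    {
        // 1. Language environment: enabled languages plus LanguagesPlus characters.
        var environment = new HashSet<char>(languageChars);
        environment.UnionWith(languagesPlus);

        // 2. A zone whose only flag is Default takes the global filter.
        CharClass effective = zoneFilter == CharClass.Default ? globalFilter : zoneFilter;

        // 3. Filtering keeps only the enabled character classes.
        var result = new HashSet<char>(environment.Where(c => InClass(c, effective)));

        // 4. Re-expanding: the Plus flag adds the FilterPlus characters
        //    (assumed here to be added as-is).
        if (effective.HasFlag(CharClass.Plus))
            result.UnionWith(filterPlus);
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Toy language environment, digits-only zone re-expanded with "AB,".
        var zone = CharacterSetModel.Compose(
            "abcABC123.,", "", CharClass.All, CharClass.Digit | CharClass.Plus, "AB,");
        Console.WriteLine(string.Concat(zone.OrderBy(c => c))); // prints ",123AB"
    }
}
```

Modeling each step as a set operation makes the order visible: filtering can only shrink the Language environment, and only the expansion step can add characters back.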

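To summarize the last example in this section (Multiple Languages, Global and Local Filters, FilterPlus Characters), the Character Set for each zone is: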

Zones 1-4: Globally defined: All unaccented letters plus all accented letters needed for the three specified languages, plus all digits and punctuation.
Zone 5: Locally defined: All uppercase unaccented letters plus three accented ones, plus the period, comma, and question mark. (The letters BDEFLMRU were doubly validated; that doesn't matter).
Zone 6: Locally defined: The digits, the letters BDEFLMRU, the comma and (superfluously) the period and question mark.
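
The zone 5 and zone 6 summaries can be checked with plain set arithmetic. This is an illustrative sketch only: `ZoneSummaryCheck` and its members are hypothetical names, it does not call the SDK, and for zone 5 it covers just the unaccented uppercase letters, leaving the accented ones out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZoneSummaryCheck
{
    // The FilterPlus string used in the Multiple Languages example.
    public const string FilterPlus = "BDEFLMRU.,?";

    // Zone 6 (DIGIT | PLUS): the digits plus every FilterPlus character.
    public static HashSet<char> Zone6()
    {
        var set = new HashSet<char>("0123456789");
        set.UnionWith(FilterPlus);
        return set;
    }

    // Zone 5 (UPPERCASE | PLUS), unaccented letters only in this sketch.
    public static HashSet<char> Zone5Unaccented()
    {
        var set = new HashSet<char>(Enumerable.Range('A', 26).Select(i => (char)i));
        set.UnionWith(FilterPlus); // B D E F L M R U are already present
        return set;
    }

    public static void Main()
    {
        Console.WriteLine(Zone6().Count);           // prints 21 (10 digits + 8 letters + 3 marks)
        Console.WriteLine(Zone5Unaccented().Count); // prints 29 (26 letters + 3 marks)
    }
}
```

Because a set union ignores duplicates, validating BDEFLMRU a second time in zone 5 changes nothing, which is why the double validation doesn't matter.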

This section provides the following examples to illustrate various techniques for limiting the Character Set:

Recognition of a Bi-Lingual Document
Character Set with No Language Selection
Language Selection, LanguagesPlus Characters, and Local Filter
Multiple Languages, Global and Local Filters, FilterPlus Characters

Recognition of a Bi-Lingual Document

In the following example, the default recognition module (omnifont, unless specifically changed) will be assigned to all zones, as will the filter value DEFAULT, i.e., there is no local modification of the Character Set.

Note: The Asian Recognition module has some unique restrictions with regard to setting multiple languages at one time. See the Asian Recognition Module topic for more details.

C#

igRecognition.Recognition.LanguageEnabled.DisableAll();
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.GER] = true;
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.ENG] = true;

VB.NET

igRecognition.Recognition.LanguageEnabled.DisableAll()
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.GER) = True
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.ENG) = True

Character Set with No Language Selection

You can specify a character set with no language selection, for example with "a, A, b, B, c, C, d, D, e, E" as the only validated characters. This is a rare case: a page containing zones with a very restricted number of characters to be recognized, e.g., forms or multiple-choice test papers. Here the application doesn't enable any language, but instead defines the few characters necessary as LanguagesPlus characters, so the Language environment consists solely of the individually defined LanguagesPlus characters. There is also no filtering and there are no locally (zone) validated FilterPlus characters, so the Language environment fully defines the Character Set. It is valid for the defined zone and for any others inserted with an identical zone structure.

C#

igRecognition.Recognition.LanguageEnabled.DisableAll();
// With no language enabled, the LanguagesPlus characters form the whole Language environment.
igRecognition.Recognition.LanguagesPlus = "aAbBcCdDeE";
ImGearRecZone igRecZone = new ImGearRecZone();
igRecZone.Rect.CopyFrom(new ImGearRectangle(igRecPage.Image.Width, igRecPage.Image.Height));
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT;
igRecZone.RecognitionModule = ImGearRecRecognitionModule.AUTO;
igRecZone.Filter = ImGearRecFilter.ALL; // no filtering, no FilterPlus characters
igRecZone.Type = ImGearRecZoneType.FLOW;
igRecPage.Zones.Add(igRecZone);

VB.NET

igRecognition.Recognition.LanguageEnabled.DisableAll()
' With no language enabled, the LanguagesPlus characters form the whole Language environment.
igRecognition.Recognition.LanguagesPlus = "aAbBcCdDeE"
Dim igRecZone As New ImGearRecZone()
igRecZone.Rect.CopyFrom(New ImGearRectangle(igRecPage.Image.Width, igRecPage.Image.Height))
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT
igRecZone.RecognitionModule = ImGearRecRecognitionModule.AUTO
igRecZone.Filter = ImGearRecFilter.ALL ' no filtering, no FilterPlus characters
igRecZone.Type = ImGearRecZoneType.FLOW
igRecPage.Zones.Add(igRecZone)

In the above case the zone list is not empty, so the Recognize Method will not perform auto-decomposition (auto-zoning), but will act on the inserted zone(s).

Language Selection, LanguagesPlus Characters, and Local Filter

This example demonstrates reading a printed page in Hungarian, in which a Croatian town name appears repeatedly, containing the character "z-hacek" in lowercase and uppercase. The Windows Eastern Europe Code Page (1250) is needed as the current Code Page (and for export). The page includes a table containing numbers, which should be zoned separately for digits-only recognition.

In this example the Language environment is formed from the language selection (Hungarian) plus the two additional LanguagesPlus characters "z-hacek" and "Z-hacek". There is no global filter, but there is a local one, DIGIT, defined for one zone.
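
As a cross-check of the two byte values used for "z-hacek" and "Z-hacek", the sketch below decodes them with the standard .NET `Encoding` class rather than the SDK. `CodePageCheck` and `FromCp1250` are hypothetical names introduced here, and the provider registration assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages are not available by default.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Decode a single Code Page 1250 byte into its Unicode character.
    public static char FromCp1250(byte b)
    {
        // Makes legacy code pages such as 1250 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1250).GetChars(new[] { b })[0];
    }

    public static void Main()
    {
        Console.WriteLine("U+{0:X4}", (int)FromCp1250(0x9E)); // prints U+017E (z-hacek)
        Console.WriteLine("U+{0:X4}", (int)FromCp1250(0x8E)); // prints U+017D (Z-hacek)
    }
}
```

This only confirms which Unicode characters the byte values stand for; the example itself relies on the SDK's own conversion method.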

C#

igRecognition.Recognition.LanguageEnabled.DisableAll();
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.HUN] = true;
igRecognition.OutputManager.CodePage = "Windows Eastern";  // Code Page 1250
string s = "";
s += igRecognition.OutputManager.ConvertCodePageToUnicode(0x9E); // z-hacek in CP1250
s += igRecognition.OutputManager.ConvertCodePageToUnicode(0x8E); // Z-hacek in CP1250
igRecognition.Recognition.LanguagesPlus = s;
// . . .
// 1st zone contains a table with digits.
ImGearRecZone igRecZone = new ImGearRecZone();
igRecZone.Rect.Left = 970;
igRecZone.Rect.Right = 2260;
igRecZone.Rect.Top = 1355;
igRecZone.Rect.Bottom = 1729;
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT;
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR;
igRecZone.Filter = ImGearRecFilter.DIGIT;
igRecZone.Type = ImGearRecZoneType.TABLE;
igRecPage.Zones.Add(igRecZone);
// 2nd zone contains flowed text without filtering.
igRecZone = new ImGearRecZone();
igRecZone.Rect.Left = 342;
igRecZone.Rect.Right = 867;
igRecZone.Rect.Top = 665;
igRecZone.Rect.Bottom = 1644;
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT;
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR;
igRecZone.Filter = ImGearRecFilter.ALL;
igRecZone.Type = ImGearRecZoneType.FLOW;
igRecPage.Zones.Add(igRecZone);

VB.NET

igRecognition.Recognition.LanguageEnabled.DisableAll()
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.HUN) = True
igRecognition.OutputManager.CodePage = "Windows Eastern" ' Code Page 1250
Dim s As String = ""
s += igRecognition.OutputManager.ConvertCodePageToUnicode(&H9E) ' z-hacek in CP1250
s += igRecognition.OutputManager.ConvertCodePageToUnicode(&H8E) ' Z-hacek in CP1250
igRecognition.Recognition.LanguagesPlus = s
' . . .
' 1st zone contains a table with digits.
Dim igRecZone As New ImGearRecZone()
igRecZone.Rect.Left = 970
igRecZone.Rect.Right = 2260
igRecZone.Rect.Top = 1355
igRecZone.Rect.Bottom = 1729
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR
igRecZone.Filter = ImGearRecFilter.DIGIT
igRecZone.Type = ImGearRecZoneType.TABLE
igRecPage.Zones.Add(igRecZone)
' 2nd zone contains flowed text without filtering.
igRecZone = New ImGearRecZone()
igRecZone.Rect.Left = 342
igRecZone.Rect.Right = 867
igRecZone.Rect.Top = 665
igRecZone.Rect.Bottom = 1644
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR
igRecZone.Filter = ImGearRecFilter.ALL
igRecZone.Type = ImGearRecZoneType.FLOW
igRecPage.Zones.Add(igRecZone)

Multiple Languages, Global and Local Filters, FilterPlus Characters

This example reads a page from a Luxembourgian newspaper in which articles in French, German, and Luxembourgian appear on a single page. The page contains six zones. The text contains no miscellaneous characters (mathematical symbols, etc.). Zone 5 contains uppercase letters only, with no digits, and only three punctuation characters: the comma, the period (full stop), and the question mark. Zone 6 presents a currency conversion table containing the digits, the comma, and the currency codes of Luxembourg and its neighbors: LUF, FRF, DEM, BEF, and EUR.

The language selection is set to French, German, and Luxembourgian. The DefaultFilter property is used to specify the global filter, to filter out only the 30 miscellaneous characters:

ALPHA | PUNCTUATION | DIGIT.

A local filter is defined for zone 5:

UPPERCASE | PLUS.

A different local filter is defined for zone 6:

DIGIT | PLUS, or simply NUMBERS.

These have the same effect.

The FilterPlus property is set to validate the FilterPlus characters needed in zone 5 and zone 6:

The comma, the period, the question mark, and the currency letters, B D E F L M R U.

C#

igRecognition.Recognition.LanguageEnabled.DisableAll();
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.FRE] = true;
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.GER] = true;
igRecognition.Recognition.LanguageEnabled[ImGearRecLanguage.LUX] = true;
igRecognition.OutputManager.CodePage = "Windows ANSI";  // Code Page 1252
igRecognition.Recognition.DefaultFilter = ImGearRecFilter.ALPHA |
     ImGearRecFilter.PUNCTUATION | ImGearRecFilter.DIGIT;
// . . .
// 1-4 zones
// . . .
// 5th zone
ImGearRecZone igRecZone = new ImGearRecZone();
igRecZone.Rect.Left = 10;
igRecZone.Rect.Right = 330;
igRecZone.Rect.Top = 420;
igRecZone.Rect.Bottom = 450;
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT;
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR;
igRecZone.Filter = ImGearRecFilter.UPPERCASE | ImGearRecFilter.PLUS;
igRecZone.Type = ImGearRecZoneType.FLOW;
igRecPage.Zones.Add(igRecZone);
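// 6th zone (a new zone object, so the 5th zone already added is not modified)
igRecZone = new ImGearRecZone();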
igRecZone.Rect.Left = 10;
igRecZone.Rect.Right = 330;
igRecZone.Rect.Top = 80;
igRecZone.Rect.Bottom = 120;
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT;
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR;
igRecZone.Filter = ImGearRecFilter.DIGIT | ImGearRecFilter.PLUS;
igRecZone.Type = ImGearRecZoneType.FLOW;
igRecPage.Zones.Add(igRecZone);
// . . .
igRecognition.Recognition.FilterPlus = "BDEFLMRU.,?";

VB.NET

igRecognition.Recognition.LanguageEnabled.DisableAll()
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.FRE) = True
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.GER) = True
igRecognition.Recognition.LanguageEnabled(ImGearRecLanguage.LUX) = True
igRecognition.OutputManager.CodePage = "Windows ANSI" ' Code Page 1252
igRecognition.Recognition.DefaultFilter = ImGearRecFilter.ALPHA Or ImGearRecFilter.PUNCTUATION Or ImGearRecFilter.DIGIT
' . . .
' 1-4 zones
' . . .
' 5th zone
Dim igRecZone As New ImGearRecZone()
igRecZone.Rect.Left = 10
igRecZone.Rect.Right = 330
igRecZone.Rect.Top = 420
igRecZone.Rect.Bottom = 450
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR
igRecZone.Filter = ImGearRecFilter.UPPERCASE Or ImGearRecFilter.PLUS
igRecZone.Type = ImGearRecZoneType.FLOW
igRecPage.Zones.Add(igRecZone)
' 6th zone (a new zone object, so the 5th zone already added is not modified)
igRecZone = New ImGearRecZone()
igRecZone.Rect.Left = 10
igRecZone.Rect.Right = 330
igRecZone.Rect.Top = 80
igRecZone.Rect.Bottom = 120
igRecZone.FillingMethod = ImGearRecFillingMethod.OMNIFONT
igRecZone.RecognitionModule = ImGearRecRecognitionModule.OMNIFONT_MOR
igRecZone.Filter = ImGearRecFilter.DIGIT Or ImGearRecFilter.PLUS
igRecZone.Type = ImGearRecZoneType.FLOW
igRecPage.Zones.Add(igRecZone)
' . . .
igRecognition.Recognition.FilterPlus = "BDEFLMRU.,?"