Defining the Character Set

You can improve text recognition accuracy by narrowing the range of characters that are valid for recognition. The recognition engine then does not have to choose its solutions from all 500 characters in its Total Character Set. The multi-lingual omnifont MOR recognition module supports all of these characters; other recognition modules recognize fewer of them. Broadly, the Character Set is compiled as follows:

Language environment
This involves selecting one or more of the 114 available languages with the LanguageEnabled Property and, optionally, validating additional individual characters with the LanguagesPlus Property. Selecting only the languages you need has a major impact. For example, selecting only German immediately invalidates the Cyrillic and Greek alphabets and over 150 other unneeded accented letters. For a list of languages with their validated accented letters, see Languages and characters. For a list of accented letters and the languages which use them, see Characters, languages, modules. For an overview of the topic, see Introduction to language-related topics.

If the LanguageEnabled Property is not used to enable or disable any languages, the default is to enable the English alphabet.
Recognition module capabilities
Defining a recognition module for processing a zone may also restrict the available languages or characters within the Language environment.
Filtering
The SDK provides filters (ImGearRecFilter Enumeration) to further narrow down the Character Set by enabling only certain character classes, such as digits or uppercase letters. The value ALL means no filtering.
Re-Expanding
An application may require exceptions to the filter rule. The most flexible way to re-expand the Character Set with individual characters after filtering is to specify them with the FilterPlus Property and validate them with the PLUS flag in the required zones.
Zone-level modification
In addition to the global, page-level definition of the Character Set, the choice of recognition module, filling method, filtering and use of the expansion string can be fine-tuned on a local, zone level. Auto-located zones take the global filter settings. Manually created or modified zones may contain zone-level settings. Local filtering and expansion (with PLUS) can be set in the zone's Filter Property. The possible local filter values are the same as the global ones, with one addition: DEFAULT. If DEFAULT is the only value set, the zone inherits the global filter setting.
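
The way these steps combine can be illustrated with a small, self-contained model. This is only a sketch in Python and does not call the SDK: the Filter class, its member values, apply_filter, zone_character_set and the sample environment string are all hypothetical names chosen for illustration, and the real ImGearRecFilter Enumeration may define different or additional classes. It also assumes that characters validated with PLUS are added even if they are not part of the Language environment.

```python
import string
from enum import Flag


class Filter(Flag):
    """Hypothetical stand-in for the SDK's filter values (not the real enumeration)."""
    DEFAULT = 0        # zone level only: inherit the global filter
    DIGIT = 1
    UPPERCASE = 2
    LOWERCASE = 4
    PUNCTUATION = 8
    PLUS = 16          # also validate the characters of the expansion string
    ALL = DIGIT | UPPERCASE | LOWERCASE | PUNCTUATION   # no filtering


# One membership test per character class (illustrative definitions only).
CLASS_TEST = {
    Filter.DIGIT: str.isdigit,
    Filter.UPPERCASE: str.isupper,
    Filter.LOWERCASE: str.islower,
    Filter.PUNCTUATION: lambda ch: ch in string.punctuation,
}


def apply_filter(environment, flt, filter_plus):
    """Keep the environment characters whose class is enabled, then re-expand."""
    kept = {
        ch for ch in environment
        if any(cls in flt and test(ch) for cls, test in CLASS_TEST.items())
    }
    if Filter.PLUS in flt:
        # Assumption: PLUS characters are added regardless of the environment.
        kept |= set(filter_plus)
    return kept


def zone_character_set(environment, global_filter, zone_filter, filter_plus):
    """A zone whose filter is only DEFAULT inherits the global filter."""
    effective = global_filter if zone_filter == Filter.DEFAULT else zone_filter
    return apply_filter(environment, effective, filter_plus)


# Made-up Language environment: basic letters, digits, some punctuation and
# a few German accented letters (not the SDK's actual character list).
environment = string.ascii_letters + string.digits + ".,?!" + "äöüÄÖÜß"
filter_plus = "-/"

# Zone A carries no local setting, so it takes the global filter (ALL).
zone_a = zone_character_set(environment, Filter.ALL, Filter.DEFAULT, filter_plus)

# Zone B is a numeric field: digits only, re-expanded with the PLUS characters.
zone_b = zone_character_set(environment, Filter.ALL,
                            Filter.DIGIT | Filter.PLUS, filter_plus)
```

In this model, zone A ends up with the whole Language environment, while zone B is limited to the ten digits plus the two characters of the expansion string. Checking whether a character is enabled then reduces to a set membership test such as `"7" in zone_b`.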

This section provides the following examples to illustrate various techniques for limiting the Character Set:

To summarize, the Character Set for each zone was:

Zones 1-4: Globally defined: All unaccented letters plus all accented letters needed for the three specified languages, plus all digits and punctuation.
Zone 5: Locally defined: All uppercase unaccented letters plus three accented ones, plus the period, comma, and question mark. (The letters BDEFLMRU were doubly validated; this has no effect.)
Zone 6: Locally defined: The digits, the letters BDEFLMRU, the comma and (superfluously) the period and question mark.

The IsCharEnabled Method can be used to inquire whether a given character is validated for the current page by its Language environment and FilterPlus.