Define the Character Set
You can improve recognition accuracy by narrowing the range of characters that are valid for recognition. The recognition engine then does not have to choose every solution from all 500 characters of its Total Character Set. The multilingual omnifont MOR recognition module supports all of these characters; other recognition modules recognize fewer of them. Broadly, the Character Set is compiled as follows:
First, the Language environment is defined. This involves selecting one or more of the 114 available languages with the LanguageEnabled Property and, optionally, validating additional individual characters with the LanguagesPlus Property. Selecting only the languages you need has a major impact: for example, selecting German alone immediately invalidates the Cyrillic and Greek alphabets and more than 150 other accented letters that German does not use. For a list of languages with their validated accented letters, see Languages and characters. For a list of accented letters and the languages that use them, see Characters, languages, modules. For an overview of the topic, see Introduction to language-related topics.
If the LanguageEnabled Property is not used to enable or disable any language, the English alphabet is enabled by default.
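To make the language step concrete, the following Python sketch models the Language environment as a set union: the letters of every enabled language, plus any individually validated extra characters, with the English alphabet as the fallback when no language is enabled. It is a conceptual model, not ImageGear code; `LANGUAGE_LETTERS` and `language_environment` are hypothetical names, and the letter lists are abridged for illustration (the real lists are in Languages and characters).

```python
import string

# Hypothetical, abridged letter sets for illustration only; the SDK's real
# per-language lists are documented in "Languages and characters".
ENGLISH = set(string.ascii_letters)
LANGUAGE_LETTERS = {
    "English": ENGLISH,
    "German": ENGLISH | set("äöüÄÖÜß"),
    "Greek": set("αβγδεζηθικλμνξοπρστυφχψω"),
}


def language_environment(enabled, languages_plus=""):
    """Model the page-level character set built from the enabled languages.

    `enabled` plays the role of the languages switched on with LanguageEnabled;
    `languages_plus` plays the role of the extra characters validated with
    LanguagesPlus. Both parameter names are illustrative, not SDK names.
    """
    if not enabled:
        # No language explicitly enabled: fall back to the English alphabet.
        enabled = ["English"]
    charset = set()
    for lang in enabled:
        charset |= LANGUAGE_LETTERS[lang]
    # Individually validated extra characters are added on top.
    return charset | set(languages_plus)


german = language_environment(["German"])
print("ß" in german, "α" in german)  # True False
print(len(language_environment([])))  # 52
```

Selecting German keeps its umlauts and ß while every Greek letter drops out of the set, which is the narrowing effect described above.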
Selecting a recognition module for a zone may further restrict the languages or characters available within the Language environment, because modules other than MOR recognize fewer characters.
Next, the SDK provides filters (the ImGearRecFilter Enumeration) that narrow the Character Set further by enabling only certain character classes, such as digits or uppercase letters. The value ALL means no filtering.
An application may need exceptions to the filter rule. The most flexible way to re-expand the Character Set with individual characters after filtering is to list them in the FilterPlus Property and enable them with the PLUS flag in the zones that need them.
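The filtering step can be modeled in the same spirit. In the following sketch, `Filter` is a hypothetical stand-in for the ImGearRecFilter Enumeration (its member names are assumptions, not the SDK's), and `apply_filter` keeps only the enabled character classes before adding back the FilterPlus characters when PLUS is set. This section does not say whether the SDK also checks FilterPlus characters against the Language environment, so the sketch simply adds them.

```python
from enum import Flag, auto


class Filter(Flag):
    """Hypothetical stand-in for ImGearRecFilter; member names are illustrative."""
    ALL = auto()
    DIGIT = auto()
    UPPERCASE = auto()
    LOWERCASE = auto()
    PLUS = auto()


def apply_filter(charset, flt, filter_plus=""):
    """Narrow `charset` to the enabled character classes, then re-add
    the FilterPlus characters if the PLUS flag is set."""
    if Filter.ALL in flt:
        result = set(charset)  # ALL means no filtering
    else:
        result = set()
        if Filter.DIGIT in flt:
            result |= {c for c in charset if c.isdigit()}
        if Filter.UPPERCASE in flt:
            result |= {c for c in charset if c.isupper()}
        if Filter.LOWERCASE in flt:
            result |= {c for c in charset if c.islower()}
    if Filter.PLUS in flt:
        # Assumption: FilterPlus characters are added back unconditionally.
        result |= set(filter_plus)
    return result


page_set = set("0123456789ABCabc")
# Digits only, with '-' and '/' allowed as exceptions (e.g. for a date field).
date_set = apply_filter(page_set, Filter.DIGIT | Filter.PLUS, "-/")
print(sorted(date_set))
```

Without the PLUS flag, the same call would return only the ten digits, which is why the exception characters must be both listed and enabled.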
In addition to the global, page-level definition of the Character Set, the choice of recognition module, filling method, filtering, and use of the expansion string can be fine-tuned at the local, zone level. Auto-located zones take the global filter settings; manually created or modified zones may carry their own zone-level settings. Local filtering and expansion (with PLUS) are set in the zone's Filter Property. The possible local filter values are the same as the global ones, plus one extra value, DEFAULT: if DEFAULT is the only value set, the zone inherits the global filter setting.
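The zone-level inheritance rule can be sketched as follows, again as a conceptual Python model rather than SDK code: a zone whose filter is DEFAULT alone uses the global filter, and any other zone uses its own filter. `Zone`, `effective_filter` and the `Filter` members are hypothetical names; the per-zone recognition module and filling method are left out, and combinations of DEFAULT with other values are not modeled because this section does not describe them.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Filter(Flag):
    """Hypothetical stand-in for ImGearRecFilter; DEFAULT is zone-level only."""
    DEFAULT = auto()
    ALL = auto()
    DIGIT = auto()
    UPPERCASE = auto()
    PLUS = auto()


@dataclass
class Zone:
    name: str
    # Auto-located zones take the global settings, modeled here as DEFAULT.
    filter: Filter = Filter.DEFAULT


def effective_filter(zone, global_filter):
    """Return the filter that actually applies to `zone`."""
    if zone.filter == Filter.DEFAULT:
        # DEFAULT is the only value set: inherit the page-level filter.
        return global_filter
    # Otherwise the zone's own local filter (including any PLUS flag) applies.
    return zone.filter


global_filter = Filter.ALL
zones = [Zone("body"), Zone("invoice_number", Filter.DIGIT | Filter.PLUS)]
for z in zones:
    print(z.name, effective_filter(z, global_filter))
```

Here the "body" zone recognizes with no filtering, inherited from the page, while "invoice_number" is restricted to digits plus its FilterPlus exceptions.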
This section provides the following examples to illustrate various techniques for limiting the Character Set:
To summarize, the Character Set for each zone was:
The IsCharEnabled Method can be used to inquire whether a given character is validated for the current page by its Language environment and FilterPlus.