UD-Checking
The checking subsystem also makes use of a User dictionary: a collection of user-specific elements called UDitems. A UDitem is one of two types: a literal string (usually a word, as in a word processor's user dictionary) or a regular expression. A regular expression defines a pattern, range, or class of characters, either singly or as a group. When an item is a regular expression, each string that a recognition module passes for UD-checking is tested against the pattern that the expression defines. A string being checked is accepted if it conforms to at least one item in the specified section of the User dictionary.
In the following example, regular expressions are applied to check whether the recognized strings comply with the postal or ZIP code formats used in the US and in parts of Europe.
Asian Recognition Module: The checking subsystem is not available. This means spell checking, UD-Checking and User-Written Checking cannot be used when the Asian Recognition Module is active. See the Asian Recognition Module topic for more details.
C#

```csharp
string sect1 = "ZIP_Section";
string item_literal = "Accusoft";

// US postal ZIP code: 12345 or 12345-6789
string US_postal_zip = "\\d{5}(-\\d{4})?";

// European postal code: D-12345 or H-1234
string European_postal_zip = "[A-Z]-\\d{4,5}";

// This assumes the UD is already open for maintenance
igRecognition.Recognition.UserDictionary.Create();
ImGearRecUserDictionary igRecUserDictionary = igRecognition.Recognition.UserDictionary;

igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, item_literal));
igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, US_postal_zip, true));
igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, European_postal_zip, true));
```
VB .NET

```vbnet
Dim sect1 As String = "ZIP_Section"
Dim item_literal As String = "Accusoft"

' US postal ZIP code: 12345 or 12345-6789
Dim US_postal_zip As String = "\d{5}(-\d{4})?"

' European postal code: D-12345 or H-1234
Dim European_postal_zip As String = "[A-Z]-\d{4,5}"

' This assumes the UD is already open for maintenance
igRecognition.Recognition.UserDictionary.Create()
Dim igRecUserDictionary As ImGearRecUserDictionary = igRecognition.Recognition.UserDictionary

igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, item_literal))
igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, US_postal_zip, True))
igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, European_postal_zip, True))
```
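To see which strings such a section would accept, the acceptance rule (a string passes if it conforms to at least one item) can be approximated outside the recognition engine. The following is a minimal sketch, not ImageGear code: `UdSectionSketch` and its members are hypothetical names, the standard .NET `Regex` class stands in for the engine's own pattern matcher (whose dialect may differ), and it assumes that the whole string must match and that literals are compared case-sensitively. The US pattern here uses the four-digit ZIP+4 extension (12345-6789).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for one section of a User dictionary.
// This is NOT an ImageGear type; it only models the acceptance rule.
public sealed class UdSectionSketch
{
    private readonly List<string> literals = new List<string>();
    private readonly List<Regex> patterns = new List<Regex>();

    // Adds a literal item (assumption: compared exactly, case-sensitively).
    public void AddLiteral(string text)
    {
        literals.Add(text);
    }

    // Adds a regular-expression item. The pattern is anchored with ^ and \z
    // so that the entire string must conform (an assumption of this sketch).
    public void AddPattern(string pattern)
    {
        patterns.Add(new Regex("^(?:" + pattern + @")\z"));
    }

    // Accepts the candidate if it conforms to at least one item.
    public bool Accepts(string candidate)
    {
        foreach (string literal in literals)
        {
            if (string.Equals(literal, candidate, StringComparison.Ordinal))
            {
                return true;
            }
        }
        foreach (Regex pattern in patterns)
        {
            if (pattern.IsMatch(candidate))
            {
                return true;
            }
        }
        return false;
    }

    // Builds a section with the same three kinds of items as the example above.
    public static UdSectionSketch CreateZipSection()
    {
        var section = new UdSectionSketch();
        section.AddLiteral("Accusoft");
        section.AddPattern(@"\d{5}(-\d{4})?");   // US: 12345 or 12345-6789
        section.AddPattern(@"[A-Z]-\d{4,5}");    // European: D-12345 or H-1234
        return section;
    }
}
```

With this sketch, `UdSectionSketch.CreateZipSection().Accepts("H-1234")` returns `true`, while `Accepts("1234")` returns `false` because no item matches the whole string. Trying candidate strings this way can help catch a pattern that is too loose or too strict before it is added to the real User dictionary.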
Within the User dictionary, the UDitems can be organized under different sections. Zones are always associated with a section of the User dictionary when they are created.
UD-checking is worthwhile in two typical situations.
If the application uses spell checking and consistently encounters correctly spelled words that are marked as uncertain, or the document is known to contain many proper nouns, UD-checking can supplement the spell checking to reduce unwanted marking and improve recognition accuracy. This assumes that the User dictionary has been prepared beforehand by adding the required words to it. In this case UD-checking is complementary to spell checking.
UD-checking without spell checking enabled is typically used in form-like applications (e.g., questionnaires), where the data to be recognized is highly structured and follows predictable patterns.
Specifying the User dictionary file is a page-level setting: once specified, the file applies to all zones on the page. However, because the User dictionary can have several sections and each zone is assigned one of them, different zones can use different sets of dictionary items. UD-checking can be disabled for particular zones with the USERDICT_PROHIBIT flag.
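The relationship between the page-level dictionary, its sections, and the per-zone flag can be pictured with a small model. This is a sketch under stated assumptions, not the ImageGear API: `ZoneSketch` and `PageDictionarySketch` are hypothetical types, only literal items are modeled, the boolean `UdCheckingProhibited` merely stands for the effect of `USERDICT_PROHIBIT`, and treating a missing section as "nothing accepted" is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a zone; NOT an ImageGear type.
public sealed class ZoneSketch
{
    public string Section { get; }
    // Models the effect of USERDICT_PROHIBIT on this zone.
    public bool UdCheckingProhibited { get; }

    public ZoneSketch(string section, bool udCheckingProhibited = false)
    {
        Section = section;
        UdCheckingProhibited = udCheckingProhibited;
    }
}

// Hypothetical model of one page-level User dictionary with named sections.
public sealed class PageDictionarySketch
{
    private readonly Dictionary<string, HashSet<string>> sections =
        new Dictionary<string, HashSet<string>>();

    // Adds a literal item to the given section, creating the section if needed.
    public void AddItem(string section, string literal)
    {
        if (!sections.TryGetValue(section, out HashSet<string> items))
        {
            items = new HashSet<string>(StringComparer.Ordinal);
            sections[section] = items;
        }
        items.Add(literal);
    }

    // Returns null when UD-checking is disabled for the zone; otherwise
    // returns whether the zone's own section accepts the candidate.
    // Assumption: a section that does not exist accepts nothing.
    public bool? Check(ZoneSketch zone, string candidate)
    {
        if (zone.UdCheckingProhibited)
        {
            return null;
        }
        return sections.TryGetValue(zone.Section, out HashSet<string> items)
            && items.Contains(candidate);
    }
}
```

In this model a single dictionary serves every zone on the page, yet a zone assigned to a hypothetical "Names" section is checked only against the items in that section, and a zone with the prohibit flag set is skipped altogether.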