ImageGear for .NET
UD-Checking

The checking subsystem also makes use of a User dictionary: a collection of user-specific elements called UDitems. A UDitem can be one of two types: a literal string (usually a word, as in any word processor's user dictionary) or a regular expression. A regular expression defines a pattern, range, or class of characters, either singly or as a group. A string being checked is accepted if it conforms to at least one item in the specified section of the User dictionary. When the item is a regular expression, the string passed for checking by a recognition module is tested against the pattern that the expression defines.

In the example below, regular expressions are used to check whether the recognized strings comply with the postal or ZIP code formats used mostly in Europe or in the US.

Asian Recognition Module: The checking subsystem is not available. This means that spell checking, UD-Checking, and User-Written Checking cannot be used when the Asian Recognition Module is active. See the Asian Recognition Module topic for more details.

Example: Adding Literal and Regular Expression UDitems to the User Dictionary

C#
string sect1 = "ZIP_Section";
string item_literal = "Accusoft";
// US postal ZIP code: 12345 or 12345-6789 (ZIP+4 has four digits after the hyphen)
string US_postal_zip = "\\d{5}(-\\d{4})?";
// European postal code: D-12345 or H-1234
string European_postal_zip = "[A-Z]-\\d{4,5}";
// This assumes the UD is already open for maintenance
igRecognition.Recognition.UserDictionary.Create();
ImGearRecUserDictionary igRecUserDictionary = igRecognition.Recognition.UserDictionary;
igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, item_literal));
igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, US_postal_zip, true));
igRecUserDictionary.AddItem(new ImGearRecUDItem(sect1, European_postal_zip, true));
VB .NET
Dim sect1 As String = "ZIP_Section"
Dim item_literal As String = "Accusoft"
' US postal ZIP code: 12345 or 12345-6789 (ZIP+4 has four digits after the hyphen)
Dim US_postal_zip As String = "\d{5}(-\d{4})?"
' European postal code: D-12345 or H-1234
Dim European_postal_zip As String = "[A-Z]-\d{4,5}"
' This assumes the UD is already open for maintenance
igRecognition.Recognition.UserDictionary.Create()
Dim igRecUserDictionary As ImGearRecUserDictionary = igRecognition.Recognition.UserDictionary
igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, item_literal))
igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, US_postal_zip, True))
igRecUserDictionary.AddItem(New ImGearRecUDItem(sect1, European_postal_zip, True))
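
Before adding a pattern to the User dictionary, it can be useful to preview which strings it would accept. The following is a minimal sketch that uses the standard .NET Regex class, not the ImageGear checking subsystem. It assumes that a checked string must conform to the whole pattern and that the User dictionary's regular expression syntax agrees with .NET's for the constructs used here. PatternPreview and MatchesWhole are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ImageGear: previews a pattern with .NET Regex.
public static class PatternPreview
{
    // Returns true only if the entire candidate string matches the pattern.
    public static bool MatchesWhole(string pattern, string candidate)
    {
        // \A and \z anchor the match to the very start and end of the string,
        // so a partial match inside a longer string is rejected.
        return Regex.IsMatch(candidate, @"\A(?:" + pattern + @")\z");
    }
}
```

With the European pattern from the example, PatternPreview.MatchesWhole("[A-Z]-\\d{4,5}", "D-12345") and the same call with "H-1234" return true, while "D-123" (too few digits), "D-123456" (too many digits), and "d-12345" (lowercase letter) return false.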

Within the User dictionary, the UDitems can be organized under different sections. Zones are always associated with a section of the User dictionary when they are created.
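
The acceptance rule (a string passes if it conforms to at least one item in the section assigned to its zone) can be pictured with a small in-memory model. This is only a sketch of the concept, not the ImageGear API or its internal implementation: the SectionedDictionarySketch class, its members, the case-sensitive literal comparison, and the whole-string regular expression match are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical in-memory model of a sectioned dictionary; not the ImageGear API.
public sealed class SectionedDictionarySketch
{
    // Each section name maps to its items; IsRegex distinguishes the two item types.
    private readonly Dictionary<string, List<(string Text, bool IsRegex)>> sections =
        new Dictionary<string, List<(string Text, bool IsRegex)>>();

    // Adds a literal (default) or regular expression item to a section,
    // creating the section on first use.
    public void Add(string section, string text, bool isRegex = false)
    {
        if (!sections.TryGetValue(section, out var items))
        {
            items = new List<(string Text, bool IsRegex)>();
            sections[section] = items;
        }
        items.Add((text, isRegex));
    }

    // A candidate is accepted if it conforms to at least one item in the section.
    public bool Accepts(string section, string candidate)
    {
        if (!sections.TryGetValue(section, out var items))
            return false; // unknown section: there is nothing to conform to

        return items.Any(item => item.IsRegex
            // Assumption: a regular expression must match the whole string.
            ? Regex.IsMatch(candidate, @"\A(?:" + item.Text + @")\z")
            // Assumption: literals are compared exactly, including case.
            : string.Equals(item.Text, candidate, StringComparison.Ordinal));
    }
}
```

In this model, after adding the literal "Accusoft" and the regular expression "[A-Z]-\\d{4,5}" to "ZIP_Section", Accepts("ZIP_Section", "H-1234") and Accepts("ZIP_Section", "Accusoft") return true, while the same strings checked against a section that holds no matching items are rejected. This is how different zones can be checked against different sets of items from one dictionary.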

UD-checking is worthwhile in several situations.

If the application uses spell checking and consistently finds correctly spelled words marked as uncertain, or if the document is known to contain many proper nouns, UD-checking can supplement the spell checking to reduce unwanted marking and improve recognition accuracy. This assumes that the User dictionary has been prepared in advance by adding the required words to it. In this case, UD-checking is complementary to spell checking.

UD-checking without spell checking enabled is typically used in form-like applications (for example, questionnaires), where the data to be recognized is highly structured and follows predictable patterns.

The User dictionary file itself is specified as a page-level setting. Once specified, it applies to all zones on the page. However, because the User dictionary can have several sections, and each zone is associated with one of them, different zones can use different sets of dictionary items. UD-checking can be disabled for particular zones with the USERDICT_PROHIBIT flag.

 

 


©2014. Accusoft Corporation. All Rights Reserved.
