Tips for Using FormAssist
Broken Characters
The ScanFix Xpress "Dilate" operation is very useful when characters are broken, such as with dot matrix or carbon print, or when a light-colored pen was used. Sometimes, after binarization is applied, small pieces of a character are missing. Dilate can help "repair" these damaged characters. This could be useful when you have a consistent set of input images.
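To illustrate what a dilate operation does in general, here is a minimal sketch of 3x3 binary dilation in plain Python. It is not the ScanFix Xpress implementation or API. The function name `dilate` and the list-of-lists image representation (1 = black, 0 = white) are assumptions made only for this example.

```python
def dilate(image):
    """Return a new binary image (1 = black) after one 3x3 dilation pass."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # A pixel becomes black if it, or any of its 8 neighbors, is black.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and image[ny][nx]:
                        result[y][x] = 1
    return result


# A vertical stroke with a one-pixel gap, as might come from dot matrix print.
broken_stroke = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
repaired = dilate(broken_stroke)
```

In this sketch the gap in the middle of the stroke is filled, but the stroke also becomes thicker, which is the usual trade-off with dilation.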
Character Sets
For OCR and ICR fields, be sure to choose the minimum character set that meets your needs. For instance, in a numeric field, choose the character set "Digits" instead of "Alpha Numeric" or the default, "All Characters". Choosing a more restricted character set reduces the risk of a false identification.
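The principle can be illustrated with a small Python sketch. It does not describe how the recognition engine actually makes its decisions. The candidate scores, the `pick_character` function, and the `DIGITS` set are all hypothetical and exist only to show why a smaller character set leaves fewer ways to be wrong.

```python
import string

# Hypothetical stand-in for a "Digits" character set.
DIGITS = set(string.digits)


def pick_character(candidates, allowed=None):
    """Return the highest-scoring candidate, optionally limited to an allowed set."""
    if allowed is not None:
        candidates = {ch: score for ch, score in candidates.items() if ch in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Made-up scores for an ambiguous glyph that could be a letter O or a zero.
scores = {"O": 0.52, "0": 0.47, "D": 0.01}

unrestricted = pick_character(scores)      # the letter O wins narrowly
digits_only = pick_character(scores, DIGITS)  # only the zero is eligible
```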
Detect Spaces
Turn off OCR->Detect Spaces in any field where you do not want spaces to be returned, such as zip code fields.
Drawing OCR/ICR Zones
Form Dropout works better if your zones are drawn slightly larger than the boxes. Ensure your zones are constructed so that the complete box is contained, plus a little additional space on all four sides. Note that this does not apply to signatures, which should be within the box and not include the box lines.
For correct OCR/ICR recognition, it is very important to ensure that the entire character area is captured during processing. For example, in this clip of FormAssist's Field Results tab, the 4 and the 5 are truncated at the bottom and are not recognized correctly in the Results area.
To fix this, return to the template and click the Zoom in icon (magnifier with a plus sign) on the toolbar. Expand the field so that the value is completely within the bounds of the box. Although the change may seem very slight, it makes a tremendous difference in recognition accuracy, and can be seen in the new field image.
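If zone coordinates are ever prepared outside FormAssist, the "slightly larger than the box" advice can be sketched as a small helper. This is only an illustration: the function name `expand_zone`, the coordinate convention (left, top, right, bottom in pixels), and the margin value are assumptions, not FormAssist settings.

```python
def expand_zone(left, top, right, bottom, margin, image_width, image_height):
    """Grow a zone by `margin` pixels on all four sides, clamped to the image."""
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(image_width, right + margin),
        min(image_height, bottom + margin),
    )


# A box drawn exactly on the printed lines, padded by a hypothetical 5 pixels.
zone = expand_zone(100, 200, 300, 240, 5, 1700, 2200)

# Near the page edge the padding is clamped rather than going out of bounds.
edge_zone = expand_zone(2, 2, 50, 50, 5, 52, 52)
```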
Drawing OMR Zones
For OMR fields, bring the field coordinates in close to the outside edges of the box. This is a bit different than ICR fields, because, in simple terms, the number of black pixels is being compared to the total number of pixels in the zone (after dropping out the form). Therefore, the larger the defined zone, the smaller the percentage of black pixels. By decreasing the size of the zone, you can increase the "Marked Bubble Threshold" and still get accurate results.
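The effect of zone size on the fill percentage can be worked through with a short Python sketch. The function name `fill_percentage` and the pixel counts are made up for illustration; the sketch only restates the ratio described above and is not the engine's actual computation.

```python
def fill_percentage(black_pixels, zone_width, zone_height):
    """Percentage of black pixels in a zone of the given size."""
    area = zone_width * zone_height
    if area <= 0:
        raise ValueError("zone must have a positive area")
    return 100.0 * black_pixels / area


# The same mark (400 black pixels) measured in two differently sized zones.
mark_pixels = 400
loose_zone = fill_percentage(mark_pixels, 60, 60)  # 400 / 3600
tight_zone = fill_percentage(mark_pixels, 30, 30)  # 400 / 900
```

The identical mark fills about 11% of the loose zone but about 44% of the tight one, which is why a tighter zone leaves room for a higher threshold.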
Fields containing Template Content
If you have a field with space(s) or static character(s) between the filled-in values, it is sometimes better to create a separate zone for each part of the field, especially when static content is between the boxes. Form Dropout may leave behind a little content, which can result in an extra character in the results. Be sure to turn off "Detect Spaces" so that your results don't include space characters for the area between the two sets of boxes.
FormAssist Restrictions
FormAssist has two built-in restrictions to prevent you from accidentally running out of memory. These restrictions are not a problem for most users, but some large installations find the need to go beyond them. These restrictions can be changed by simply updating some defined constants in the FormAssist program (which is provided in full source format) and rebuilding FormAssist.
- Number of Form Templates - By default, FormAssist caps the number of form templates at 200. To increase this number, edit the file FormMain.cs and change the value of maxFormsAllowed.
- Number of Fields per Form - By default, FormAssist caps the number of fields per form at 500. To increase this number, edit the file FormMain.cs and change the value of maxFieldsAllowed.
Regular Expressions
For fields whose values follow a particular pattern, try using a regular expression. For example, the Canadian Postal Code has a specific pattern of characters. You want the SmartZone engine to use that pattern when it is making character decisions.
([A-Z])(\d)([A-Z])(\d)([A-Z])(\d)
For the US zip code, specify the following to return only 5 characters.
\d{5}
SmartZone will return a notification if the resulting string does not match the regular expression, so you will know immediately if the clip needs to go to a human. For example, if the US zip code only had four characters, then SmartZone would tell you that the pattern could not be matched.
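To try out the two patterns above before entering them in a field, they can be checked with any regular expression tool. The sketch below uses Python's `re` module purely for illustration. The helper name `matches_pattern` is hypothetical, the use of `fullmatch` (whole-string matching) is an assumption, and SmartZone's own regular expression dialect may differ in details.

```python
import re

# The two patterns from the text above.
CANADIAN_POSTAL_CODE = re.compile(r"([A-Z])(\d)([A-Z])(\d)([A-Z])(\d)")
US_ZIP_CODE = re.compile(r"\d{5}")


def matches_pattern(pattern, value):
    """Return True if the whole value matches the compiled pattern."""
    return pattern.fullmatch(value) is not None


# A four-digit result fails the zip code pattern and would need human review.
zip_results = {value: matches_pattern(US_ZIP_CODE, value) for value in ("33602", "3360")}
postal_ok = matches_pattern(CANADIAN_POSTAL_CODE, "K1A0B1")
```

Note that the postal code pattern as written has no space between the two halves, so a result containing a space would not match.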
OMR Field Configuration
The OMR field type is used to model three distinct types of fields:
- the bubble (as seen on school tests), which is expected to be mostly or completely filled
- the checkbox, which is expected to have a much lower level of fill
- the signature block, which is expected to have even less filled area.
The following guidelines apply to these three types of OMR Usage:
Bubble
If you are creating an OMR bubble, as is used in educational tests, a Threshold of 50 may be used, since the bubbles are generally more than half marked. An example of OMR bubbles is provided in the "OMR Form Template" sample in the "Assorted Forms" formset.
Checkbox
If you are creating a checkbox, use a much lower threshold value, such as 10. Examples of checkbox fields are shown on the "Direct Deposit Form Template" in the "Assorted Forms" formset.
Signature
If you are creating a signature field, in the OMR settings, make it a "Single Checkbox Field," and set Checkbox Recognition Method to "Shrink Area to Mark Edges". Some people make very small signatures, so you want the density analysis area to be shrunk to only the area containing significant content, not the entire field area. Regular OMR zones for checkboxes should remain without adjustment. Signature areas have to be treated differently. Also, use a much lower Threshold, since the zone is much larger than the marked content; a value of 10 is suggested.
An example of a signature field is shown on the "Direct Deposit Form Template" in the "Assorted Forms" formset.
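The idea of shrinking the analysis area to the marks can be sketched in Python as follows. This is an assumption-laden illustration, not the product's algorithm: the names `mark_bounds` and `density_percentage` are hypothetical, the image is a list of rows with 1 = black, and any filtering of noise or "insignificant" content is not modeled.

```python
def mark_bounds(image):
    """Bounding box (left, top, right, bottom) of all black pixels, or None."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for row in image for x, pixel in enumerate(row) if pixel]
    if not rows:
        return None
    return min(cols), min(rows), max(cols), max(rows)


def density_percentage(image, bounds=None):
    """Percentage of black pixels inside bounds (whole image if bounds is None)."""
    if bounds is None:
        bounds = (0, 0, len(image[0]) - 1, len(image) - 1)
    left, top, right, bottom = bounds
    black = sum(image[y][x] for y in range(top, bottom + 1)
                for x in range(left, right + 1))
    area = (right - left + 1) * (bottom - top + 1)
    return 100.0 * black / area


# A 10x6 signature field containing a very small signature (4 black pixels).
field = [[0] * 10 for _ in range(6)]
for y, x in [(2, 3), (2, 4), (3, 4), (3, 5)]:
    field[y][x] = 1

whole_field = density_percentage(field)                      # 4 of 60 pixels
shrunk_area = density_percentage(field, mark_bounds(field))  # 4 of 6 pixels
```

Measured over the whole field, the small signature fills under 7% of the area, while the shrunk area is about 67% filled, so the same signature is far easier to distinguish from an empty field.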
The Threshold values discussed above apply to the "Single Threshold" Recognition Engine.
The equivalent parameter for "Dual Threshold" Recognition Engine is called Marked Bubble Threshold.