Accusoft.FormAssist v5.0 Application - Updated
OCR Property Details

Property Description
Language

The language character sets allow FormAssist to recognize text based on predefined European languages. Choose the language that matches the text on the form. If no language is chosen, Western European is used by default.

Available Language Character Sets include:

Character Set

The SmartZone OCR engine used by FormAssist can recognize text based on predefined character sets. Choosing the appropriate character set improves recognition results.

The following character sets are available in the Character Set drop-down list within the FormAssist OCR property settings in the Properties View:

  •  At the bottom of the Character Set drop-down list is the option to select Custom...
The Language setting can limit the characters available for a custom character set. To choose from all possible characters, set Language to “Western European”.

When you select Custom... and then click the Edit button (which is displayed only when Custom... is selected), a custom character screen is displayed.

On this screen you can customize the characters you want FormAssist to recognize by selecting or deselecting individual characters on the panel.

The image below is an example of a Western European language custom character editor screen.

 

Custom Character Screen

Field Type

Specifies the expected field type of the text to be recognized.

 

In the middle of the Field Type drop-down list is the option to select Regular Expression. When you select this option and then click the Edit button (which is displayed only when Regular Expression is selected), a regular expression editing dialog is displayed.

This allows you to set a regular expression that SmartZone will apply to the field data after recognition. See the SmartZone OCR help for details about supported regular expression strings.
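As a rough illustration of validating field data after recognition, the sketch below checks recognized text against a pattern using Python's `re` module. The pattern, the sample values, and the function name `matches_field_pattern` are assumptions made for illustration only. SmartZone's supported regular expression syntax may differ from Python's, so check the SmartZone OCR help before reusing a pattern.

```python
import re

# Hypothetical pattern: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def matches_field_pattern(recognized_text: str) -> bool:
    """Return True when the whole recognized string fits the pattern."""
    return ZIP_PATTERN.fullmatch(recognized_text) is not None

print(matches_field_pattern("33634"))       # True
print(matches_field_pattern("33634-1234"))  # True
print(matches_field_pattern("3363A"))       # False
```

Matching the whole string, rather than searching for a match inside it, keeps a stray misrecognized character from slipping through unnoticed.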

The image below is an example of a regular expression editing screen.

Minimum Character Confidence

This value sets the confidence threshold that determines whether a character is rejected in the text output. Adjust this value based on your recognition results to improve accuracy.

Maximum Blob Size

This value sets the maximum size, in pixels, at which a blob is automatically classified as noise. A blob is an isolated, connected group of pixels. Distinguishing noise blobs from actual characters can improve recognition accuracy.

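The idea of discarding small blobs as noise can be sketched as follows. This is not the SmartZone implementation. It assumes that "size" means the number of pixels in the blob, that pixels touching diagonally belong to the same blob, and that the image is a plain list of rows of 0/1 values. The function name `remove_small_blobs` is hypothetical.

```python
from collections import deque

def remove_small_blobs(image, max_blob_size):
    """Clear every connected group of set pixels with at most max_blob_size pixels.

    `image` is a list of rows of 0/1 values; a cleaned copy is returned.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search collects one blob (8-connected neighbours).
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs at or under the limit are treated as noise and erased.
            if len(blob) <= max_blob_size:
                for by, bx in blob:
                    result[by][bx] = 0
    return result

page = [
    [1, 0, 0, 0, 0],   # a single stray pixel (noise)
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],   # a 6-pixel blob standing in for a character
    [0, 0, 1, 1, 0],
]
cleaned = remove_small_blobs(page, max_blob_size=2)
for row in cleaned:
    print(row)
```

With a limit of 2, the stray pixel is erased while the larger blob is kept. Raising the limit too far would start erasing small genuine marks such as periods or the dots on letters.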
Minimum Text Line Height

This value sets the minimum allowable text line height, in pixels. It helps prevent noise from being returned as a text line in the result.

Multiple Text Lines

Check this setting to analyze multiple text lines. If the field contains only one text line, leave this setting unchecked for better accuracy and speed.

Split Merged Characters

This setting determines whether touching characters are cut apart during processing. When it is checked, the SmartZone OCR engine used by FormAssist automatically separates the blobs of touching characters. For better accuracy and speed, leave this setting unchecked if you know that the character blobs do not touch each other.

Split Overlapping Characters for ICR only

Check this setting to automatically segment overlapping blobs into multiple characters. When it is checked, the SmartZone OCR engine used by FormAssist automatically separates the blobs into a collection of characters. For better speed and accuracy, leave this setting unchecked if you know that there are no overlapping characters.

Rejection Character

This property controls which character is used in the output text when recognition cannot determine a value.

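How Minimum Character Confidence and Rejection Character could interact can be sketched as below. This is an illustration under stated assumptions, not the engine's actual behavior: the 0 to 100 confidence scale, the `(character, confidence)` data shape, the `~` placeholder, and the function name `apply_rejection` are all hypothetical.

```python
def apply_rejection(characters, min_confidence, rejection_char="~"):
    """Build output text, substituting rejection_char for low-confidence characters.

    `characters` is a list of (character, confidence) pairs.
    """
    return "".join(
        ch if conf >= min_confidence else rejection_char
        for ch, conf in characters
    )

# Assumed recognition result: "B" was recognized with low confidence.
recognized = [("A", 98), ("B", 42), ("C", 91)]
print(apply_rejection(recognized, min_confidence=60))  # A~C
```

Choosing a rejection character that cannot occur in valid field data makes rejected positions easy to spot during review.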
Detect Spaces

Check this setting to automatically detect spaces during processing.

 
