Best Practices for Preprocessing Images and ICR/OCR Recognition
For optimum recognition results when using SmartZone ICR/OCR, use the following steps as image preprocessing and recognition guidelines.
Step 1: Preprocessing
ImagXpress (Included with SmartZone)
Use ImagXpress to clean up the images being processed. Unless you are processing barcodes for recognition, form recognition results are almost always improved by deskewing and despeckling images. Both operations are available in ImagXpress.
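To illustrate what despeckling does, the following is a minimal sketch in plain Python. It is not the ImagXpress API and does not describe how ImagXpress implements the operation. The function name despeckle, the max_neighbors parameter, and the list-of-rows image representation are assumptions made for this example only.

```python
def despeckle(image, max_neighbors=0):
    """Return a copy of a binary image with isolated dark pixels cleared.

    image is a list of rows; 1 is a dark (ink) pixel, 0 is background.
    A dark pixel counts as a speck when it has at most max_neighbors
    dark pixels among its eight neighbors. (Hypothetical helper, not an
    ImagXpress call.)
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += image[ny][nx]
            if neighbors <= max_neighbors:
                cleaned[y][x] = 0  # isolated pixel: treat as noise
    return cleaned


page = [
    [1, 0, 0, 0, 0],  # lone pixel in the corner (a speck)
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],  # 2x2 block standing in for part of a character
    [0, 0, 1, 1, 0],
]
cleaned_page = despeckle(page)
```

In this sketch the lone corner pixel is cleared while the 2x2 block is kept, which is the kind of noise removal that tends to help character recognition.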
ScanFix Xpress (Available for purchase separately)
Although ImagXpress offers image enhancement options similar to those in ScanFix Xpress, we recommend ScanFix Xpress for enhancements because it is designed primarily for cleaning up scanned images.
- The optionally available ScanFix Xpress SDK offers advanced image clean-up features, including support for color, grayscale, and binary images, color dropout, color noise reduction, auto-binarization, comb removal, border removal, hole punch removal, dot shading removal, and many other options.
- You can use the ScanFix Xpress ReadFromStream and WriteToStream methods to read and write image clean-up instructions.
ScanFix Xpress includes its own help file.
Step 2: Recognition
SmartZone ICR/OCR
- If you haven't previously used ImagXpress to clip a field of interest for recognition, specify your zone of interest now. The Reader object (ICR: Reader; OCR: Reader) takes the zone from its Area property (ICR: Area; OCR: Area).
- Determine and select which character sets are to be used in the text recognition. Results include only characters within your specified character set, and recognition improves when you limit the character sets to the values you expect to be returned. SmartZone ICR/OCR lets you customize character sets by combining character sets provided by Accusoft, omitting characters you don't expect in your data, or both.
- Field type (ICR: FieldType; OCR: FieldType) is required, and its default value is General Text. Change the field type when your data is expected to match any of these predefined formats: date, time, United States phone number, URL, email address, currency, currency plus, social security number, taxpayer ID.
- Recognition results can be improved by using one of the following two options to further specify expected results:
  - Write a regular expression (ICR: SetRegularExpression; OCR: SetRegularExpression) to augment off-the-shelf field types, or to create your own masking format.
  - Use a data validation list (ICR: DataValidationListAddEntry; OCR: DataValidationListAddEntry) to provide a list of expected data contents, which the recognition system uses to choose among possible results.
- Provide any necessary error and exception handling. See Debug Your Application for more information on errors and exceptions.
- Set MinimumCharacterConfidence (ICR: MinimumCharacterConfidence; OCR: MinimumCharacterConfidence), RejectionCharacter (ICR: RejectionCharacter; OCR: RejectionCharacter) and Segmentation (ICR: Segmentation; OCR: Segmentation) properties for the Reader.
- You can use the SmartZone ICR/OCR ReadFromStream (ICR: ReadFromStream; OCR: ReadFromStream) and WriteToStream (ICR: WriteToStream; OCR: WriteToStream) methods to read and write image clean-up instructions and perform further processing.
- Use the AnalyzeField methods (ICR: AnalyzeField; OCR: AnalyzeField) to perform the text recognition.
- Recognition results are returned in a TextBlockResult (ICR: TextBlockResult; OCR: TextBlockResult), where you can get the text, area, and confidence for the text block, each text line, and each character. See Determine Results for more information.
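The ideas behind a minimum character confidence, a rejection character, a regular expression, and a data validation list can be sketched in plain Python as follows. This is an illustration only and does not call the SmartZone ICR/OCR API or describe how the recognition engine works internally. The function names, the "~" rejection character, the 0 to 100 confidence scale, and the sample data are all assumptions made for this example.

```python
import re


def mark_rejections(chars, min_confidence, rejection_char="~"):
    """Replace characters whose confidence is below min_confidence.

    chars is a list of (character, confidence) pairs. The 0-100 scale
    is an assumption for this sketch.
    """
    return "".join(
        ch if conf >= min_confidence else rejection_char
        for ch, conf in chars
    )


def matches_entry(text, entry, rejection_char="~"):
    """True when text equals entry, with rejected positions as wildcards."""
    if len(text) != len(entry):
        return False
    return all(t == rejection_char or t == e for t, e in zip(text, entry))


def resolve(text, validation_list, pattern=None, rejection_char="~"):
    """Return the single expected value consistent with text, or None.

    pattern, when given, is a regular expression every candidate must
    fully match. Ambiguous or empty matches return None.
    """
    candidates = [
        entry for entry in validation_list
        if matches_entry(text, entry, rejection_char)
    ]
    if pattern is not None:
        candidates = [e for e in candidates if re.fullmatch(pattern, e)]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical per-character results for a field expected to hold a code.
recognized = [("T", 95), ("X", 40), ("-", 90), ("1", 88), ("0", 91)]
field_text = mark_rejections(recognized, min_confidence=60)
resolved = resolve(
    field_text,
    ["TX-10", "CA-10", "TX-20"],
    pattern=r"[A-Z]{2}-\d{2}",
)
```

Here the low-confidence second character is replaced by the rejection character, and the expected-values list is then narrow enough to recover a single match. When more than one entry fits, the sketch returns None instead of guessing, so the field can be routed to manual review.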