Selecting and Checking Field Type
SmartZone ICR/OCR allows you to use predefined masks that define the expected format of your data. If you know your data should match one of these formats, specifying the field type with the Reader.FieldType property (available in both the ICR and OCR APIs) will make your recognition results more accurate.
The FieldType property assists in the recognition of text: SmartZone ICR/OCR makes a greater effort to match the result to the supported format of that field. It is therefore very important to select the correct field type for an image. If you do not know the format, use GeneralText.
If the field type determined by SmartZone ICR/OCR matches the field type you specified on input, the same value is returned in the FieldType property of the TextBlockResult. Otherwise, Unknown is returned as the FieldType value, meaning the result did not conform to the expected format of the field type specified on input. For example, if the Date field type is specified but SmartZone ICR/OCR reads "01~23-47", the FieldType Unknown is returned.
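The fallback to Unknown can be pictured as a conformance check applied after recognition. The sketch below is not SmartZone code: the pattern table, the field type name strings, and the function `result_field_type` are hypothetical, and the date pattern covers only a simplified subset of the formats listed in the Date row.

```python
import re

# Hypothetical stand-ins for built-in formats (not SmartZone's internal patterns).
FIELD_PATTERNS = {
    # The SSN form quoted in the Regular Expression row.
    "SocialSecurityNumber": re.compile(r"\d{3}-?\d{2}-?\d{4}"),
    # Simplified subset of the Date row: two-digit groups separated by "/" or "-"
    # followed by a four-digit year, or the same layout separated by periods.
    "Date": re.compile(r"\d{2}[/-]\d{2}[/-]\d{4}|\d{2}\.\d{2}\.\d{4}"),
}


def result_field_type(text: str, requested: str) -> str:
    """Echo the requested field type if the text conforms to it, otherwise "Unknown".

    Field types missing from FIELD_PATTERNS also yield "Unknown" in this sketch.
    """
    pattern = FIELD_PATTERNS.get(requested)
    if pattern is not None and pattern.fullmatch(text):
        return requested
    return "Unknown"


if __name__ == "__main__":
    print(result_field_type("01-23-2047", "Date"))  # Date
    print(result_field_type("01~23-47", "Date"))    # Unknown
```

Returning the requested type unchanged on success mirrors the behavior described above, where the same FieldType value comes back in the TextBlockResult.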
Supported Field Types
The following field types are supported:
Field Type | Description |
---|---|
General Text | ICR: All the supported characters in English, French, Spanish, Italian, German, Dutch, Portuguese, Norwegian, Finnish, Danish, and Swedish. OCR: All the supported characters in Afrikaans, Albanian, Azerbaijani (Cyrillic, Latin), Baltic, Basque, Belarusian, Bosnian (Latin), Bulgarian, Catalan, Central European, Croatian, Cyrillic, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Frisian, German, Greek, Guarani, Hani, Hungarian, Icelandic, Indonesian, Irish, Italian, Kazakh (Cyrillic, Latin), Kirghiz (Cyrillic), Kirundi, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Norwegian, Polish, Portuguese, Quechua, Rhaeto-Romanic, Romanian, Russian, Rwanda, Serbian (Cyrillic, Latin), Shona, Slovak, Slovenian, Somali, Sorbian, Spanish, Swahili, Swedish, Tajik (Cyrillic), Turkish, Turkmen (Cyrillic, Latin), Ukrainian, Uzbek (Cyrillic, Latin), Western European, Wolof, Xhosa, and Zulu. |
Regular Expression | A regular expression is a pattern, written as a string, that describes the format of expected results according to a defined syntax. Using regular expressions as a field type improves recognition results by narrowing the possible answers returned by character recognition in the event of an ambiguity or conflict. A number of SmartZone's built-in field types already use regular expressions. For example, US Social Security Number is a regular expression of the form \d{3}-?\d{2}-?\d{4}. See Regular Expressions for more examples and detailed syntax. |
Data Validation Lists | A data validation list is a set of possible expected results. Using data validation lists as a field type improves recognition results by narrowing the possible answers returned by character recognition in the event of an ambiguity or conflict. An example of a data validation list is a list of two-character US state abbreviations, from AL to WY. See Define and Edit Data Validation Lists for more information. |
Predefined: | |
Currency | The supported currency symbols, currency punctuation, and digits: $ ¢ £ ¥ € , . ` - = 0123456789. Supported formats place the currency symbol in front of the digits, with commas and periods used as digit-group separators and as the decimal separator. The € symbol may also be placed to the right of the rightmost digit. |
Currency Plus | The supported alphabetic abbreviations for currency symbols, currency punctuation, and digits: USD GBP EUR E DKK Dkr KR NOK Nkr SEK Sk $ ¢ £ ¥ € , . ` - = 0123456789. |
Email | The local-name and the domain name are evaluated separately, using @ as the delimiter. Each may use any of these ASCII characters: |
Date | We support several date patterns: DD.MM.YYYY, DD.MON.YYYY, DD/MM/YYYY, DD/MON/YYYY, DD[/-]MM-YYYY, DD[/-]MON-YYYY, MM.DD.YYYY, MM/DD/YYYY, MM[-/.]DD, MM[/-]DD-YYYY, MON[/-]DD/YYYY, YYYY-M-DD, YYYY-MM[/-]DD, YYYY.MM.DD, YYYY.MON.DD, YYYY/MM/DD, YYYY/MON[/-]DD, where |
Social Security Number | 999-99-9999, 99-99999, 999.99.9999, 99.99999, 999999999, 9999999, 999 99 9999, 99 99999 |
Time | H.mm, HH.mm, H.mm.ss, HH.mm.ss, hh:mm tt, hh:mm:ss tt, H:mm, HH:mm, H:mm:ss, HH:mm:ss, where: |
United States Phone Number | Digits 0-9, ( ), -, /, and an extension indicator. Phone numbers may be formatted with or without the leading 1 and with or without the area code: 1 (999) 999-9999, (999) 999-9999, 999-9999, 1 (999) 999/9999, 999-999-9999, 999/999/9999, 999-999/9999. Use ext, EXT, X, or x as the extension indicator, followed by two to four digits (the extension number) to its right. |
URL | Supported schemes: Supported extensions include: Examples: |
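
The Social Security Number row lists more layouts than the \d{3}-?\d{2}-?\d{4} expression quoted in the Regular Expression row: it also allows periods and spaces as separators, plus seven-digit forms. As a sketch of how those listed layouts could be captured in one custom regular expression (an illustrative assumption, not SmartZone's internal pattern), the Python below accepts exactly the eight layouts in the table, with a single separator used consistently within a number. The names `SSN_LISTED_FORMATS` and `is_listed_ssn` are hypothetical.

```python
import re

# One alternative per digit grouping listed in the table:
#   3-2-4 digits, with the same separator ("-", ".", space, or none) in both positions
#   2-5 digits, with an optional "-", "." or space between the two groups
SSN_LISTED_FORMATS = re.compile(r"\d{3}([-. ]?)\d{2}\1\d{4}|\d{2}[-. ]?\d{5}")

# The shorter expression quoted in the Regular Expression row, for comparison.
SSN_SIMPLE = re.compile(r"\d{3}-?\d{2}-?\d{4}")


def is_listed_ssn(text: str) -> bool:
    """True if text has one of the eight SSN layouts listed in the table."""
    return SSN_LISTED_FORMATS.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ["123-45-6789", "12-34567", "123.45.6789", "12.34567",
               "123456789", "1234567", "123 45 6789", "12 34567"]
    print(all(is_listed_ssn(s) for s in samples))    # True
    print(is_listed_ssn("123-45.6789"))              # False: mixed separators
    print(SSN_SIMPLE.fullmatch("12.34567") is None)  # True: the simple form rejects it
```

The backreference `\1` is what enforces a consistent separator in the nine-digit layouts; the simpler expression, by contrast, would accept a mixed form such as 123-456789.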
For the most accurate recognition results:
- Set the character set to the narrowest set that still includes every character the results can contain.
- Indicate the expected format of recognition results by applying one of the field types listed above.
Field types improve recognition by defining the number of characters or digits and the format of expected results, which lets the recognition engine choose more reliably among several possible values.
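To illustrate how a field type lets the engine choose among several possible values, the sketch below ranks candidate readings by confidence and keeps the best one that a data validation list accepts. The candidate tuples, their confidence scores, and the `pick_candidate` helper are hypothetical; SmartZone performs this narrowing internally and is not assumed to expose candidates this way. The state list is truncated for brevity.

```python
from typing import Callable, List, Optional, Tuple

# Truncated data validation list of US state abbreviations (the full list runs from AL to WY).
US_STATES = {"AL", "CA", "NY", "OH", "OK", "OR", "TX", "WA", "WY"}


def pick_candidate(
    candidates: List[Tuple[str, float]],
    accepts: Callable[[str], bool],
) -> Optional[str]:
    """Return the highest-confidence candidate that the field type accepts, or None."""
    for text, _confidence in sorted(candidates, key=lambda c: c[1], reverse=True):
        if accepts(text):
            return text
    return None


if __name__ == "__main__":
    # "0H" (with a zero) scores highest, but only "OH" is on the validation list.
    readings = [("0H", 0.62), ("OH", 0.58), ("DH", 0.20)]
    print(pick_candidate(readings, lambda t: t in US_STATES))  # OH
```

The same helper works with a regular-expression field type by passing a predicate such as `lambda t: re.fullmatch(pattern, t) is not None`. Returning None when nothing conforms corresponds loosely to the Unknown result described earlier.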