Several factors affect recognition accuracy, and most of them also affect processing speed.
This is one of the most important factors that influence accuracy.
If both processing speed and accuracy are important, consider defining the location of recognition zones explicitly. Use ImGearOCR to work with zones.
Recognition zones may be defined in two ways:
The Checking subsystem consists of any combination of the following checking tools:
Language selection is another way to control recognition accuracy and performance. Setting the wrong language(s) or language dictionary, or leaving unneeded ones enabled, is likely to slow down recognition and reduce accuracy considerably.
The ImGearOCRSettings.LanguageEnabled property controls the set of languages used in the recognition process. Only languages whose language dictionaries are enabled are used.
The character set determines, at the engine level, which characters are considered valid. Eliminating characters that are known not to appear on the page can improve both accuracy and performance.
Use the ImGearOCRSettings.UserCharacterSet property to set the character set for the page being recognized.
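As a rough illustration of how a restricted character set might be chosen, the sketch below collects the distinct characters from a few sample strings that represent what the page is expected to contain. It is written in Python purely for illustration and does not call ImageGear. The function name `build_character_set` and the sample values are hypothetical, and how the resulting string is assigned to ImGearOCRSettings.UserCharacterSet depends on the product API.

```python
def build_character_set(samples):
    """Return the distinct non-whitespace characters found in samples, sorted."""
    chars = set()
    for text in samples:
        # Whitespace is skipped; only printable symbols are collected.
        chars.update(ch for ch in text if not ch.isspace())
    return "".join(sorted(chars))


# Hypothetical samples of what a form field is expected to contain.
expected_samples = ["2024-01-31", "1,250.00", "INV-0042"]
character_set = build_character_set(expected_samples)
print(character_set)
```

Deriving the set from representative samples keeps it as small as the content allows. Samples that miss a character that does appear on the page would exclude it from recognition, so the samples should be chosen with care.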
If specific words or phrases are present on the page, the performance and accuracy of the recognition process may decrease. To avoid this, a user dictionary can be provided to the recognition process. The user dictionary is a file that contains a set of lines. Each line represents a word or phrase that will be checked for in the recognized page text.
The user dictionary is exposed in the product as the ImGearOCRDictionary class. A dictionary can be loaded from a file or created programmatically. To attach a user dictionary to the recognition process, use the ImGearOCRSettings.UserDictionary property.
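The line-per-entry layout described above can be sketched as follows. This is a minimal Python illustration that does not call ImageGear. The helper name `write_user_dictionary`, the file name, the UTF-8 encoding, and the sample terms are all assumptions, and the exact file format that ImGearOCRDictionary expects should be checked against the product documentation.

```python
import tempfile
from pathlib import Path


def write_user_dictionary(entries, path):
    """Write one word or phrase per line, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for entry in entries:
        cleaned = " ".join(entry.split())  # trim and collapse whitespace
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    # Encoding is an assumption; the source does not state one.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


# Hypothetical domain terms expected on the page.
terms = ["purchase order", "  ", "remittance  advice", "purchase order"]
with tempfile.TemporaryDirectory() as tmp_dir:
    dictionary_path = Path(tmp_dir) / "user_dictionary.txt"
    written = write_user_dictionary(terms, dictionary_path)
    print(dictionary_path.read_text(encoding="utf-8"))
```

Removing blank and duplicate entries keeps the file to one distinct word or phrase per line, which matches the layout the text describes.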