When examining performance, remember that the most important measures are the speed and cost of the whole system. Increasing the quality setting may cause identification to run 10 times slower, but if it also cuts manual data entry in half, it may still be a very cost-effective choice.
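The trade-off above can be made concrete with a rough cost model. The sketch below is illustrative only: every figure in it (form volume, seconds per form, machine cost, manual-entry rate and cost) is a made-up assumption, not a FormFix measurement, and the function name is hypothetical.

```python
def total_cost(forms, id_seconds_per_form, machine_cost_per_hour,
               manual_fraction, manual_cost_per_form):
    """Whole-system cost: machine time for identification plus manual data entry."""
    compute = forms * id_seconds_per_form / 3600 * machine_cost_per_hour
    manual = forms * manual_fraction * manual_cost_per_form
    return compute + manual


# Hypothetical numbers, chosen only to illustrate the comparison.
baseline = total_cost(
    forms=100_000,
    id_seconds_per_form=0.1,     # assumed identification time at lower quality
    machine_cost_per_hour=0.50,  # assumed machine cost
    manual_fraction=0.10,        # assumed share of forms needing manual entry
    manual_cost_per_form=0.25,   # assumed cost of entering one form by hand
)
higher_quality = total_cost(
    forms=100_000,
    id_seconds_per_form=1.0,     # identification runs 10 times slower
    machine_cost_per_hour=0.50,
    manual_fraction=0.05,        # manual data entry reduced by half
    manual_cost_per_form=0.25,
)

print(f"baseline:       {baseline:.2f}")
print(f"higher quality: {higher_quality:.2f}")
```

Under these assumed figures, manual entry dominates the total, so the slower setting comes out cheaper overall. With different figures the result could easily go the other way, which is why the whole system should be measured.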
Batch Processing
The FormFix component was designed for batch processing. One consequence of this design is that the IdentificationProcessor does extra processing during the first identification, which allows it to identify subsequent forms much faster. While this is good for normal operation, it makes identification appear slow if only one form is identified. When measuring identification performance, take care to exclude the first form from the timing. Also, time several runs (preferably over 100 identifications) to reduce the impact of the one-time processing at the start.
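A minimal sketch of a timing harness that follows this advice is shown below. It is not FormFix code: `identify` stands for whatever callable wraps the real identification call in your application, and `FakeIdentifier` is a hypothetical stand-in that merely imitates a slow first call so the example is self-contained.

```python
import time


def mean_identification_time(identify, images, warmup=1):
    """Average seconds per identification, excluding the first `warmup` calls."""
    if len(images) <= warmup:
        raise ValueError("need more images than warm-up calls")
    for image in images[:warmup]:
        identify(image)  # untimed: absorbs the one-time processing
    start = time.perf_counter()
    for image in images[warmup:]:
        identify(image)
    elapsed = time.perf_counter() - start
    return elapsed / (len(images) - warmup)


class FakeIdentifier:
    """Hypothetical stand-in: the first call is slow, later calls are fast."""

    def __init__(self):
        self.calls = 0
        self._ready = False

    def __call__(self, image):
        self.calls += 1
        if not self._ready:
            time.sleep(0.05)  # simulated one-time setup cost
            self._ready = True
        return "template-A"


identifier = FakeIdentifier()
images = list(range(101))  # 1 warm-up form plus 100 timed forms
average = mean_identification_time(identifier, images)
print(f"average per form after warm-up: {average:.6f} s")
```

Keeping the warm-up call outside the timed region means the reported average reflects steady-state batch behavior rather than the one-time cost.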
Possible steps to improve performance
- The quickest and easiest way to improve performance is to reduce IdentificationQuality. However, if that increases identification failures, the speed gain is rarely worth it.
- The content of the images identified has an effect on performance. Large images take longer to identify than small images. High-resolution images take longer than low-resolution images. Images with shaded areas or a lot of noise take longer to identify than other images.
- The number of templates in a set affects the time required to perform identification. FormFix relies on a number of techniques to reduce the impact of a large number of unique forms, but fewer forms will always be faster. Changing the order of templates has no impact on performance.
- Even when an IdentificationProcessor can accurately identify all the forms, an alternate method that uses specialized knowledge of specific forms can sometimes be faster. For example, forms with a bar code may be identified faster by reading the bar code. Each page used in the 2000 US Census had a bar code printed on it that uniquely identified the matching template. Such forms still need to run through the identification stage in order to perform the first part of the registration stage. However, IdentificationQuality can be reduced, and the identification search can be restricted to the single template already known to match (for more information, see Enhance Images).
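The bar code shortcut in the last item can be sketched as a simple lookup that narrows the candidate templates before identification runs. This is an illustration in Python, not the FormFix API: the function name, the mapping, and the template names are all hypothetical, and reading the bar code itself is assumed to happen elsewhere.

```python
def candidate_templates(barcode_value, barcode_to_template, all_templates):
    """Return the templates the identification search should consider.

    If the bar code maps to a known template, restrict the search to that
    single template; otherwise fall back to the full template set.
    """
    template = barcode_to_template.get(barcode_value) if barcode_value else None
    if template is not None:
        return [template]
    return list(all_templates)


# Hypothetical template set and bar code mapping.
all_templates = ["census-page-1", "census-page-2", "census-page-3"]
barcode_to_template = {
    "0001": "census-page-1",
    "0002": "census-page-2",
    "0003": "census-page-3",
}

print(candidate_templates("0002", barcode_to_template, all_templates))
print(candidate_templates(None, barcode_to_template, all_templates))
```

The fallback to the full set covers pages whose bar code is missing or unreadable, so those pages still go through normal identification.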