Module name: | MOR |
Module identifier: | IG_REC_RM_OMNIFONT_MOR |
Filling methods supported: | IG_REC_FM_OMNIFONT, IG_REC_FM_DRAFTDOT24, IG_REC_FM_OCRA, IG_REC_FM_OCRB |
Filters supported: | all filter elements |
Trade-off supported: | IG_REC_TO_FAST, IG_REC_TO_BALANCED, IG_REC_TO_ACCURATE |
Knowledge base files: | RECOGN.BCT and RECOGN24.BCT |
Training supported: | yes |
The PLUS2W and PLUS3W recognition modules also require the presence of this module.
This module recognizes machine-printed text, that is, text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. The module can also be used for near-letter-quality or letter-quality (NLQ, LQ) output from dot-matrix printers. For draft-quality 24-pin dot-matrix documents, use the DRAFTDOT24 filling method. NLQ or LQ output is usually recognized better without DRAFTDOT24.
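The guidance above can be sketched as a small lookup. This is a standalone illustration, not part of the Engine's API: the source-kind labels and the `choose_filling_method` helper are hypothetical, the strings only mirror the filling-method identifiers listed in this topic, and it assumes that IG_REC_FM_OMNIFONT is the general-purpose choice for everything except draft-quality 24-pin dot-matrix output.

```python
# Illustrative only: the keys are hypothetical labels for kinds of printed
# source; the values mirror identifier names listed in this topic.
FILLING_METHOD_BY_SOURCE = {
    "printed_publication": "IG_REC_FM_OMNIFONT",
    "laser_or_inkjet": "IG_REC_FM_OMNIFONT",
    "typewriter": "IG_REC_FM_OMNIFONT",
    # NLQ/LQ dot-matrix output is usually recognized better without DRAFTDOT24.
    "dot_matrix_nlq_lq": "IG_REC_FM_OMNIFONT",
    # Draft-quality 24-pin dot-matrix documents call for DRAFTDOT24.
    "dot_matrix_draft_24pin": "IG_REC_FM_DRAFTDOT24",
}


def choose_filling_method(source: str) -> str:
    """Return the filling-method name suggested for a kind of printed source."""
    try:
        return FILLING_METHOD_BY_SOURCE[source]
    except KeyError:
        raise ValueError(f"unknown source kind: {source!r}") from None


if __name__ == "__main__":
    for kind in ("laser_or_inkjet", "dot_matrix_draft_24pin"):
        print(kind, "->", choose_filling_method(kind))
```

Keeping the mapping in a table rather than in branching code makes it easy to adjust if a particular document set turns out to recognize better with a different method.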
This module can handle at most 500 zones defined on an image.
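The limits stated so far (the supported filling methods and trade-offs, and the 500-zone maximum) could be checked before a page is submitted for recognition. The sketch below is a hypothetical pre-flight check written for illustration; `check_mor_request` and its parameters are not Engine functions, and the strings only mirror the identifier names given in this topic.

```python
from typing import Iterable, List

# Values taken from this topic; the names of these constants are hypothetical.
MOR_MAX_ZONES = 500
MOR_FILLING_METHODS = frozenset({
    "IG_REC_FM_OMNIFONT",
    "IG_REC_FM_DRAFTDOT24",
    "IG_REC_FM_OCRA",
    "IG_REC_FM_OCRB",
})
MOR_TRADE_OFFS = frozenset({
    "IG_REC_TO_FAST",
    "IG_REC_TO_BALANCED",
    "IG_REC_TO_ACCURATE",
})


def check_mor_request(zone_count: int,
                      filling_methods: Iterable[str],
                      trade_off: str) -> List[str]:
    """Return a list of problems; an empty list means the request fits MOR."""
    problems = []
    if zone_count > MOR_MAX_ZONES:
        problems.append(
            f"{zone_count} zones exceeds the module limit of {MOR_MAX_ZONES}")
    # Report every filling method that this module does not list as supported.
    for name in sorted(set(filling_methods) - MOR_FILLING_METHODS):
        problems.append(f"filling method not supported by MOR: {name}")
    if trade_off not in MOR_TRADE_OFFS:
        problems.append(f"trade-off not supported by MOR: {trade_off}")
    return problems


if __name__ == "__main__":
    for problem in check_mor_request(612, ["IG_REC_FM_OCRA"], "IG_REC_TO_FAST"):
        print(problem)
```

Returning all problems at once, rather than raising on the first one, lets a caller report every issue with a zone layout in a single pass.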
This module can recognize about 500 characters, termed the Engine's Total Character Set. The set includes the letters of the Latin, Greek, and Cyrillic alphabets, with enough accented letters to recognize the 119 languages supported by the Engine.
The set is classified as follows:
The characters are listed in category and alphanumeric order, together with their Code Page values, in Characters and Code Pages. These are the character categories used by the filter elements. The pre-trained OCR characters are: OCR Chair, OCR Hook, OCR Fork.
The omnifont recognition module can detect and transmit character attributes: bold, italic, or underlined text (or any combination of them). It can also detect and transmit character size, and can classify font types into three broad categories: serif, sans serif, and monospaced.
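One way to picture the attributes described above is as a set of combinable style flags plus a size and a font category. The types below are a hypothetical model for illustration only; they are not the Engine's data structures, and the unit of the size value is not stated in this topic, so it is left as a plain number.

```python
from dataclasses import dataclass
from enum import Enum, IntFlag


class Style(IntFlag):
    """Style flags; any combination of the three may apply to a character."""
    NONE = 0
    BOLD = 1
    ITALIC = 2
    UNDERLINED = 4


class FontCategory(Enum):
    """The three broad font categories named in this topic."""
    SERIF = "serif"
    SANS_SERIF = "sans serif"
    MONOSPACED = "monospaced"


@dataclass(frozen=True)
class CharAttributes:
    style: Style
    size: float  # unit not specified in this topic
    font: FontCategory


def describe(attrs: CharAttributes) -> str:
    """Render the style flags and font category as readable text."""
    styles = [flag.name.lower()
              for flag in (Style.BOLD, Style.ITALIC, Style.UNDERLINED)
              if attrs.style & flag]
    style_text = "+".join(styles) if styles else "plain"
    return f"{style_text}, {attrs.font.value}"


if __name__ == "__main__":
    attrs = CharAttributes(Style.BOLD | Style.UNDERLINED, 12.0,
                           FontCategory.MONOSPACED)
    print(describe(attrs))
```

Bit flags suit the style attributes because they can occur in any combination, while the font category is a single exclusive choice and so fits a plain enumeration.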
The multi-lingual omnifont recognition module relies mainly on contour analysis, which it can supplement with a form of pattern matching that does not require large pre-stored shape libraries.
This module interprets all three page-level recognition trade-off settings: ACCURATE, BALANCED, and FAST.
The module is tightly integrated with the checking module, giving a total of five speed/accuracy choices.