MOR Multi-Lingual Omnifont Recognition Module
Module name: | MOR
Module identifier: | OMNIFONT_MOR
Filling methods supported: | OMNIFONT, DRAFTDOT24, OCRA, OCRB
Filters supported: | all filter elements
Trade-off supported: | FAST, BALANCED, ACCURATE
Knowledge base files: | RECOGN.BCT and RECOGN24.BCT
The PLUS2W and PLUS3W recognition modules also require the presence of this module.
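The specification above can be captured as data so that a requested configuration is checked before it is handed to the Engine. The following is a minimal Python sketch under that assumption. `MOR_SPEC` and `validate_settings` are hypothetical names used only for illustration and are not part of the Engine's API; only the identifier strings come from the table.

```python
# Hypothetical helper: these names are illustrative, not Engine API calls.
# The string values are the identifiers listed in the specification table.
MOR_SPEC = {
    "module": "OMNIFONT_MOR",
    "filling_methods": {"OMNIFONT", "DRAFTDOT24", "OCRA", "OCRB"},
    "tradeoffs": {"FAST", "BALANCED", "ACCURATE"},
}


def validate_settings(filling_method: str, tradeoff: str) -> list:
    """Return a list of problems; an empty list means the pair is supported."""
    problems = []
    if filling_method not in MOR_SPEC["filling_methods"]:
        problems.append(
            f"filling method {filling_method!r} is not supported by {MOR_SPEC['module']}"
        )
    if tradeoff not in MOR_SPEC["tradeoffs"]:
        problems.append(
            f"trade-off {tradeoff!r} is not supported by {MOR_SPEC['module']}"
        )
    return problems


if __name__ == "__main__":
    print(validate_settings("OMNIFONT", "BALANCED"))
    print(validate_settings("DRAFTDOT24", "TURBO"))
```

Returning a list of problems instead of raising on the first one lets a caller report every unsupported setting at once.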
Application Areas
This module recognizes machine-printed text; that is, text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. The module can also be used for letter-quality or near-letter-quality (LQ, NLQ) output from dot-matrix printers. For draft-quality 24-pin dot-matrix documents, use the DRAFTDOT24 filling method. NLQ or LQ output is usually recognized better without DRAFTDOT24.
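The guidance above on when to use DRAFTDOT24 can be written as a small decision function. This is a sketch, not the Engine's behavior. The function name and the source labels are hypothetical, and it assumes that OMNIFONT is the general-purpose filling method for all other machine-printed input.

```python
def choose_filling_method(source: str) -> str:
    """Map a coarse description of the print source to a filling method name.

    The source labels are hypothetical; only the returned identifiers
    (DRAFTDOT24, OMNIFONT) come from the module specification.
    """
    # Draft-quality 24-pin dot-matrix output is the one case the text
    # singles out for DRAFTDOT24.
    if source == "dot_matrix_draft_24pin":
        return "DRAFTDOT24"
    # Laser, ink-jet, typewriter, and NLQ/LQ dot-matrix output: assumed to
    # use the general omnifont filling method.
    return "OMNIFONT"


if __name__ == "__main__":
    for label in ("laser", "dot_matrix_nlq", "dot_matrix_draft_24pin"):
        print(label, "->", choose_filling_method(label))
```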
This module can handle at most 500 zones defined on an image.
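An application that builds zones programmatically may want to check this limit before starting recognition. The sketch below assumes zones are held in an ordinary Python list. `MAX_ZONES_MOR` and `check_zone_count` are hypothetical names, not Engine identifiers.

```python
MAX_ZONES_MOR = 500  # per-image zone limit stated for this module


def check_zone_count(zones: list) -> None:
    """Raise ValueError if the image carries more zones than the module handles."""
    if len(zones) > MAX_ZONES_MOR:
        raise ValueError(
            f"{len(zones)} zones defined; this module handles at most {MAX_ZONES_MOR}"
        )


if __name__ == "__main__":
    check_zone_count(list(range(500)))  # exactly at the limit: accepted
    try:
        check_zone_count(list(range(501)))
    except ValueError as err:
        print(err)
```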
Range of Characters
This module can recognize about 500 characters, termed the Engine's Total Character Set. The set includes the letters of the Latin, Greek and Cyrillic alphabets, with enough accented letters to recognize the 119 languages supported by the Engine.
The set is classified as follows:
Character category | Non-accented | Accented
Latin alphabet upper case letters | 26 | 89
Latin alphabet lower case letters | 26 | 91
Digits | 10 |
Punctuation | 29 |
Miscellaneous (math symbols, etc.) | 55 |
Cyrillic upper case letters | 33 | 14
Cyrillic lower case letters | 33 | 14
Greek upper case letters | 24 | 9
Greek lower case letters | 25 | 11
OCR (OCR-A) characters | 3 |
The characters are listed in category and alphanumeric order, together with their Code Page values, in Characters and Code Pages. These are the character categories used by the filter elements.
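As a cross-check on the "about 500 characters" figure, the category counts in the table can be added up. The short Python sketch below does only that arithmetic. The dictionary name is illustrative, and empty Accented cells are treated as zero.

```python
# (non-accented, accented) counts copied from the table above;
# categories with an empty "Accented" cell are entered as 0.
CHARACTER_SET = {
    "Latin upper case": (26, 89),
    "Latin lower case": (26, 91),
    "Digits": (10, 0),
    "Punctuation": (29, 0),
    "Miscellaneous": (55, 0),
    "Cyrillic upper case": (33, 14),
    "Cyrillic lower case": (33, 14),
    "Greek upper case": (24, 9),
    "Greek lower case": (25, 11),
    "OCR (OCR-A)": (3, 0),
}

non_accented = sum(plain for plain, _ in CHARACTER_SET.values())
accented = sum(acc for _, acc in CHARACTER_SET.values())
total = non_accented + accented

print(non_accented, accented, total)  # 264 228 492
```

The listed categories sum to 492, which is consistent with the rounded figure of about 500.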
Character Attributes
The omnifont recognition module can detect and transmit character attributes: bold, italic or underlined text (or any combination of them). It can also detect and transmit character size, and can classify font types into three broad categories: serif, sans serif and monospaced.
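One way an application might represent these attributes on its side is with a combinable flag type for bold, italic and underline, and a three-value enumeration for the font category. This is a sketch of a possible client-side data structure, not the Engine's output format. All type and field names are hypothetical, and the unit of the size value is not stated in this section.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class Style(Flag):
    """Combinable attributes; Style(0) stands for plain text."""
    BOLD = auto()
    ITALIC = auto()
    UNDERLINED = auto()


class FontClass(Enum):
    """The three broad font categories the module distinguishes."""
    SERIF = "serif"
    SANS_SERIF = "sans serif"
    MONOSPACED = "monospaced"


@dataclass(frozen=True)
class RecognizedChar:
    char: str
    style: Style
    size: float  # unit not specified in the source; treated as opaque here
    font_class: FontClass


if __name__ == "__main__":
    c = RecognizedChar("A", Style.BOLD | Style.ITALIC, 12.0, FontClass.SERIF)
    print(Style.BOLD in c.style)        # True
    print(Style.UNDERLINED in c.style)  # False
```

A flag type fits here because the three style attributes can occur in any combination, while the font category is a single exclusive choice.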
Speed/Accuracy Choices
The multi-lingual omnifont recognition module relies primarily on contour analysis, but can supplement this with a form of pattern matching that does not require large pre-stored shape libraries.
This module interprets all three page-level recognition trade-off settings: ACCURATE, BALANCED and FAST.
The module is tightly integrated with the checking module, giving a total of five speed/accuracy choices.