ImageGear for C and C++ on Windows v19.1 - Updated
MOR Multi-Lingual Omnifont Recognition Module

Module name: MOR

Module identifier: IG_REC_RM_OMNIFONT_MOR

Filling methods supported: IG_REC_FM_OMNIFONT, IG_REC_FM_DRAFTDOT24, IG_REC_FM_OCRA, IG_REC_FM_OCRB

Filters supported: all filter elements

Trade-offs supported: IG_REC_TO_FAST, IG_REC_TO_BALANCED, IG_REC_TO_ACCURATE

Knowledge base files: RECOGN.BCT and RECOGN24.BCT

Training supported: yes

The PLUS2W and PLUS3W recognition modules also require the presence of this module.

This topic provides information about the following: Application Areas, Range of Characters, Character Attributes, and Speed/Accuracy Choices.

Application Areas

This module recognizes machine-printed text, that is, text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. The module can also be used for letter quality or near letter quality (LQ, NLQ) output from dot-matrix printers. For draft-quality 24-pin dot-matrix documents, use the IG_REC_FM_DRAFTDOT24 filling method. LQ and NLQ output is usually recognized better without it.

This module can handle at most 500 zones defined on an image.
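The guidance above can be sketched as two small helpers: one that picks a filling method from the kind of source document, and one that checks the zone count against the module's limit. This is only an illustrative sketch. All `demo_` and `DEMO_` names are hypothetical stand-ins and are not part of the ImageGear API; a real application would use the IG_REC_FM_* constants from the ImageGear headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the filling-method identifiers named above. */
typedef enum {
    DEMO_FM_OMNIFONT,     /* mirrors IG_REC_FM_OMNIFONT   */
    DEMO_FM_DRAFTDOT24    /* mirrors IG_REC_FM_DRAFTDOT24 */
} demo_filling_method;

/* Hypothetical classification of where a page came from. */
typedef enum {
    DEMO_SRC_PRINTED,           /* publications, laser/ink-jet, typewriters */
    DEMO_SRC_DOTMATRIX_LQ,      /* LQ or NLQ dot-matrix output              */
    DEMO_SRC_DOTMATRIX_DRAFT24  /* draft-quality 24-pin dot-matrix output   */
} demo_source_kind;

/* Zone limit stated for the MOR module in this topic. */
static const size_t DEMO_MOR_MAX_ZONES = 500;

/* Only draft-quality 24-pin output gets DRAFTDOT24; everything else,
   including LQ and NLQ dot-matrix output, stays on the omnifont method. */
demo_filling_method demo_pick_filling_method(demo_source_kind kind)
{
    return kind == DEMO_SRC_DOTMATRIX_DRAFT24 ? DEMO_FM_DRAFTDOT24
                                              : DEMO_FM_OMNIFONT;
}

/* Returns true when the number of zones is within the module's limit. */
bool demo_zone_count_ok(size_t zone_count)
{
    return zone_count <= DEMO_MOR_MAX_ZONES;
}
```

An application could call `demo_zone_count_ok` before starting recognition and merge or drop zones when it returns false.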

Range of Characters

This module can recognize about 500 characters, termed the Engine's Total Character Set. The set includes the letters of the Latin, Greek, and Cyrillic alphabets, with enough accented letters to recognize the 119 languages supported by the Engine.

The set is classified as follows:

| Character Type | Non-Accented | Accented |
|---|---|---|
| Latin alphabet upper case letters | 26 | 89 |
| Latin alphabet lower case letters | 26 | 91 |
| Digits | 10 | |
| Punctuation | 29 | |
| Miscellaneous (math symbols, etc.) | 55 | |
| Cyrillic upper case letters | 33 | 14 |
| Cyrillic lower case letters | 33 | 14 |
| Greek upper case letters | 24 | 9 |
| Greek lower case letters | 25 | 11 |
| OCR (OCR-A) characters | 3 | |

The characters are listed in category and alphanumeric order, together with their Code Page values, in Characters and Code Pages. These are the character categories used by the filter elements. The pre-trained OCR characters are: OCR Chair, OCR Hook, OCR Fork.
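As a cross-check on the "about 500 characters" figure, the counts in the table can be summed. The sketch below copies the table into a small array and adds it up. The type and function names are hypothetical, and it assumes that categories listing a single count (digits, punctuation, miscellaneous, OCR) have no accented forms.

```c
#include <stddef.h>

/* One row of the character-set table in this topic. */
typedef struct {
    const char *category;
    int non_accented;
    int accented;   /* 0 where the table lists no accented count */
} charset_row;

static const charset_row MOR_CHARSET[] = {
    { "Latin upper case",    26, 89 },
    { "Latin lower case",    26, 91 },
    { "Digits",              10,  0 },
    { "Punctuation",         29,  0 },
    { "Miscellaneous",       55,  0 },
    { "Cyrillic upper case", 33, 14 },
    { "Cyrillic lower case", 33, 14 },
    { "Greek upper case",    24,  9 },
    { "Greek lower case",    25, 11 },
    { "OCR (OCR-A)",          3,  0 },
};

/* Sums both columns over every row of the table. */
int mor_charset_total(void)
{
    int total = 0;
    size_t n = sizeof MOR_CHARSET / sizeof MOR_CHARSET[0];
    for (size_t i = 0; i < n; ++i) {
        total += MOR_CHARSET[i].non_accented + MOR_CHARSET[i].accented;
    }
    return total;
}
```

Under that assumption the rows add up to 492 (264 non-accented plus 228 accented), which is consistent with the rounded figure of about 500.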

Character Attributes

The omnifont recognition module can detect and transmit character attributes: bold, italic, or underlined text (or any combination of them). It can also detect and transmit character size, and can classify font types into three broad categories: serif, sans serif, and monospaced.
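One way an application might hold these attributes is a set of bit flags for style, since bold, italic, and underline can be combined, plus a size and a font category. The sketch below is purely illustrative. Every name in it is hypothetical, the layout is not the ImageGear data structure, and the unit of the size field is an assumption.

```c
#include <stdbool.h>

/* Hypothetical style flags; they can be OR-ed together in any combination. */
enum {
    DEMO_ATTR_BOLD      = 1 << 0,
    DEMO_ATTR_ITALIC    = 1 << 1,
    DEMO_ATTR_UNDERLINE = 1 << 2
};

/* The three broad font categories described above. */
typedef enum {
    DEMO_FONT_SERIF,
    DEMO_FONT_SANS_SERIF,
    DEMO_FONT_MONOSPACED
} demo_font_class;

/* Hypothetical per-character attribute record. */
typedef struct {
    unsigned        style;       /* combination of DEMO_ATTR_* flags   */
    int             size;        /* character size; unit is assumed    */
    demo_font_class font_class;  /* serif, sans serif, or monospaced   */
} demo_char_attrs;

/* Returns true only when every flag in mask is set on the character. */
bool demo_has_style(const demo_char_attrs *attrs, unsigned mask)
{
    return (attrs->style & mask) == mask;
}
```

For example, a character marked bold and underlined would satisfy `demo_has_style(&a, DEMO_ATTR_BOLD | DEMO_ATTR_UNDERLINE)` but not `demo_has_style(&a, DEMO_ATTR_ITALIC)`.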

Speed/Accuracy Choices

The multi-lingual omnifont recognition module relies primarily on contour analysis, and can supplement it with a form of pattern matching that does not require large pre-stored shape libraries.

This module interprets all three page-level recognition trade-off settings: ACCURATE, BALANCED, and FAST.

The module is tightly integrated with the checking module, giving a total of five speed/accuracy choices.