ImageGear for C and C++ on Windows v19.1 - Updated
FRX Multi-Lingual Omnifont Recognition Module

Module name:

FRX

Module identifier:

IG_REC_RM_FRX

Filling methods supported:

IG_REC_FM_OMNIFONT

Filters supported:

all filter elements

Trade-off supported:

none

Knowledge base files:

none

Training supported:

yes

The OMNIFONT_PLUS2W and OMNIFONT_PLUS3W recognition modules require this module to be present.

Its associated files are:

baltic.shp

Frx shape pack (code page) file.

cyrillic.shp

Frx shape pack (code page) file.

greek.shp

Frx shape pack (code page) file.

latin1.shp

Frx shape pack (code page) file.

latin2.shp

Frx shape pack (code page) file.

turkish.shp

Frx shape pack (code page) file.

charsettable.chr

 

asciieng.lng

Frx language dictionary. Used in case of multi-language selection.

czech.lng

Frx language dictionary data file.

danish.lng

Frx language dictionary data file.

dutch.lng

Frx language dictionary data file.

english.lng

Frx language dictionary data file.

finnish.lng

Frx language dictionary data file.

french.lng

Frx language dictionary data file.

german.lng

Frx language dictionary data file.

greek.lng

Frx language dictionary data file.

hungar.lng

Frx language dictionary data file.

italian.lng

Frx language dictionary data file.

norsk.lng

Frx language dictionary data file.

polish.lng

Frx language dictionary data file.

port.lng

Frx language dictionary data file.

russian.lng

Frx language dictionary data file.

spanish.lng

Frx language dictionary data file.

swedish.lng

Frx language dictionary data file.

turkish.lng

Frx language dictionary data file.

This section provides information about the following: Application Areas, Range of Characters, Multi-Lingual Language Support, and Character Attributes.

Application Areas

This module recognizes machine-printed text, that is, text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. The module should also be used for letter-quality or near-letter-quality (LQ, NLQ) output from dot-matrix printers.

Range of Characters

This module supports the recognition of the Latin, Greek, and Cyrillic alphabets, with enough accented letters to recognize the 54 supported languages.

The characters are listed by category and in alphanumeric order, together with their code page values, in Characters and Code Pages.

Multi-Lingual Language Support

The language support of this module is based on the module's internal code pages, each of which contains the characters of a related group of languages. The internal code pages of this module are American/European (Latin 1, 1252), Baltic (1257), Central-European (Latin 2, 1250), Cyrillic (1251), Greek (1253), and Turkish (1254).

The module supports multi-language selection for recognition, but only for language combinations within the same code page; it may not properly recognize languages from different language groups. For example, it properly processes the combination of English, German, and Italian, since all of these languages belong to the Latin 1 (1252) code page. However, if both French and Czech are specified, OMNIFONT_FRX may fail to recognize some accented characters in the Czech alphabet, since these two languages are not in the same code page. The following table lists the languages supported by FRX, grouped by code page.

Latin 2 (1250)

Polish, Czech, Hungarian, Romanian, Albanian, Croatian, Wend (Sorbian), Slovak, Slovenian

Cyrillic (1251)

Russian, Ukrainian, Byelorussian, Bulgarian, Macedonian, Serbian

Latin 1 (1252)

English, German, French, Spanish, Italian, Dutch, Swedish, Norwegian, Finnish, Danish, Portuguese, Portuguese (Brazilian), Catalan, Afrikaans, Aymara, Basque, Breton, Faroese, Friulian, Gaelic, Galician, Eskimo, Icelandic, Indonesian, Latin, Malaysian, Pidgin English, Swahili, Tahitian, Welsh, Frisian, Zulu

Greek (1253)

Greek

Turkish (1254)

Turkish, Kurdish (written in Latin alphabet)

Baltic (1257)

Estonian, Hawaiian, Latvian, Lithuanian
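
The same-code-page rule can be checked in application code before languages are selected for recognition. The following is a minimal sketch in plain C, not part of the ImageGear API: the type LangCodePage and the functions code_page_for_language and shared_code_page are hypothetical names introduced here, and the lookup table holds only a subset of the language and code page pairs from the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: pairs a language name with its FRX code page.
   This is NOT an ImageGear structure. */
typedef struct {
    const char *language;
    int code_page;
} LangCodePage;

/* Subset of the table above; extend with the remaining languages as needed. */
static const LangCodePage kLangTable[] = {
    {"Polish", 1250},   {"Czech", 1250},     {"Hungarian", 1250},
    {"Russian", 1251},  {"Ukrainian", 1251}, {"Bulgarian", 1251},
    {"English", 1252},  {"German", 1252},    {"French", 1252},
    {"Spanish", 1252},  {"Italian", 1252},   {"Dutch", 1252},
    {"Greek", 1253},
    {"Turkish", 1254},
    {"Estonian", 1257}, {"Latvian", 1257},   {"Lithuanian", 1257}
};

/* Returns the code page for a language name, or 0 if the name is not
   in kLangTable. The comparison is case-sensitive. */
int code_page_for_language(const char *language)
{
    size_t i;
    assert(language != NULL);
    for (i = 0; i < sizeof kLangTable / sizeof kLangTable[0]; ++i) {
        if (strcmp(kLangTable[i].language, language) == 0)
            return kLangTable[i].code_page;
    }
    return 0;
}

/* Returns the common code page if every language in the list maps to the
   same code page. Returns 0 if the list is empty, if any language is
   unknown, or if the languages span more than one code page. */
int shared_code_page(const char *const *languages, size_t count)
{
    int page = 0;
    size_t i;
    assert(languages != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        int p = code_page_for_language(languages[i]);
        if (p == 0)
            return 0;          /* unknown language */
        if (page == 0)
            page = p;          /* first language fixes the code page */
        else if (p != page)
            return 0;          /* mixed code pages */
    }
    return page;
}
```

With this table, the combination English, German, and Italian yields 1252, while French and Czech yields 0 because the two languages belong to different code pages. An application could use a zero result to warn the user or to split recognition into one pass per code page.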

Character Attributes

The omnifont recognition module can detect and transmit character attributes: bold, italic, or underlined text (or any combination of them). It can also detect and transmit character size, and can classify font types into three broad categories: serif, sans serif, and monospaced.
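
Because the style attributes can occur in any combination, an application might hold them as bit flags next to the size and font category. The sketch below is one possible representation in plain C and does not show how ImageGear itself reports these values: the names CharAttributes, ATTR_BOLD, ATTR_ITALIC, ATTR_UNDERLINE, FontCategory, font_category_name, and format_attributes are all hypothetical, and the unit of the size field is left unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical style flags; any combination may be set at once. */
enum {
    ATTR_BOLD      = 1 << 0,
    ATTR_ITALIC    = 1 << 1,
    ATTR_UNDERLINE = 1 << 2
};

/* The three broad font categories named in the text above. */
typedef enum {
    FONT_SERIF,
    FONT_SANS_SERIF,
    FONT_MONOSPACED
} FontCategory;

/* Hypothetical container for the attributes of one recognized character.
   This is NOT an ImageGear structure. */
typedef struct {
    unsigned style_flags;  /* bitwise OR of ATTR_* values */
    int size;              /* character size; unit is an assumption */
    FontCategory font;
} CharAttributes;

/* Returns a printable name for a font category. */
const char *font_category_name(FontCategory font)
{
    switch (font) {
    case FONT_SERIF:      return "serif";
    case FONT_SANS_SERIF: return "sans serif";
    case FONT_MONOSPACED: return "monospaced";
    default:              return "unknown";
    }
}

/* Writes a description such as "bold italic serif, size 10" into out.
   Returns the snprintf result: the length the full text needs,
   excluding the terminating null character. */
int format_attributes(const CharAttributes *attrs, char *out, size_t out_size)
{
    assert(attrs != NULL && out != NULL);
    return snprintf(out, out_size, "%s%s%s%s, size %d",
                    (attrs->style_flags & ATTR_BOLD) ? "bold " : "",
                    (attrs->style_flags & ATTR_ITALIC) ? "italic " : "",
                    (attrs->style_flags & ATTR_UNDERLINE) ? "underlined " : "",
                    font_category_name(attrs->font),
                    attrs->size);
}
```

Bit flags keep the combinations cheap to store and test, for example `attrs.style_flags & ATTR_BOLD` to find bold characters, while the enumeration restricts the font type to the three categories the module distinguishes.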