ImageGear for C and C++ on Windows v19.10 - Updated
Support for Asian Languages

This section provides information about application areas, fonts and user zones, deskew and orientation, and confidence data and choices.

Application Areas

The ImageGear Recognition component supports recognition of four Asian languages with horizontal or vertical text direction: Japanese, Korean, Traditional Chinese, and Simplified Chinese. It can also handle short passages of English embedded in the Asian text.

Support for Asian languages requires the IG_REC_FEATURE_ASIAN licensing feature to be enabled.

Asian languages are handled differently from Western languages in several respects. Spell checking, editor display, and verification are not available for Asian languages. Only one Asian language should be set for recognition, and Western languages should not be set alongside an Asian language. However, the Asian OCR Engine can recognize short English texts embedded in Asian text without English being set. Embedded texts in other Latin-alphabet languages likewise do not need to be set, although accented characters may not always be handled correctly.
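The language-selection rules above can be checked before recognition starts. The following is a minimal sketch, not ImageGear code: the demo_language enum and the selection_is_valid function are hypothetical names introduced only for illustration, and the check assumes a selection is acceptable when it contains either no Asian language or exactly one Asian language and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical language identifiers for illustration only;
   these are not ImageGear constants. */
typedef enum {
    DEMO_LANG_ENGLISH,
    DEMO_LANG_GERMAN,
    DEMO_LANG_JAPANESE,
    DEMO_LANG_KOREAN,
    DEMO_LANG_CHINESE_TRADITIONAL,
    DEMO_LANG_CHINESE_SIMPLIFIED
} demo_language;

/* True for the four Asian languages named in this section. */
static bool demo_is_asian(demo_language lang)
{
    return lang == DEMO_LANG_JAPANESE ||
           lang == DEMO_LANG_KOREAN ||
           lang == DEMO_LANG_CHINESE_TRADITIONAL ||
           lang == DEMO_LANG_CHINESE_SIMPLIFIED;
}

/* Returns true when the selection follows the rules in the text:
   a selection without any Asian language is left alone, and a
   selection with an Asian language must contain that language only. */
bool selection_is_valid(const demo_language *langs, size_t count)
{
    size_t asian = 0;

    assert(langs != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        if (demo_is_asian(langs[i])) {
            ++asian;
        }
    }

    if (asian == 0) {
        return true;  /* no Asian language: these rules do not apply */
    }
    return asian == 1 && count == 1;
}
```

With this sketch, a selection of Japanese alone passes, while Japanese plus English or Japanese plus Korean is rejected. English embedded in Japanese text needs no entry of its own, so the single-language selection is the intended one in that case.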

For the Asian recognition module to work properly, the selected Asian language should be set before the preprocess operation.

Asian text can be horizontal, reading left-to-right (IG_REC_WT_FLOW), or vertical, with characters flowing top-to-bottom and lines flowing right-to-left (IG_REC_WT_VERTTEXT).

Text embedded in vertical text can have one of three orientations: vertical (neon), right-rotated, or side-by-side. Side-by-side orientation is usually limited to three characters and is most often used for Arabic numerals. The output converters transform all such embedded text to right rotation.

Fonts and User Zones

The ideal font point size for Asian language body text is 12 points, scanned at 300 dpi, resulting in characters with around 48 x 48 pixels. The minimum pixel count is about 30 x 30, which is 10.5 points at 300 dpi. For characters smaller than this, 400 dpi should be used.
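The relationship between point size and scan resolution can be sketched with simple arithmetic. The example below is an illustration only: em_height_pixels and recommended_scan_dpi are hypothetical helper names, not ImageGear functions. The first computes the nominal em height (1 point = 1/72 inch), which is an upper bound on the glyph size, so actual characters will measure somewhat smaller than this value. The second applies the guidance above, assuming 300 dpi for text of 10.5 points or more and 400 dpi for anything smaller.

```c
#include <assert.h>

/* Nominal em height in pixels for a font size given in points.
   One point is 1/72 inch, so pixels = points * dpi / 72. */
double em_height_pixels(double points, double dpi)
{
    assert(points > 0.0 && dpi > 0.0);
    return points * dpi / 72.0;
}

/* Scan resolution suggested by the guidance in this section:
   300 dpi for text of 10.5 points or more, 400 dpi for smaller text.
   The function name and the exact cut-off handling are assumptions. */
int recommended_scan_dpi(double points)
{
    assert(points > 0.0);
    return (points < 10.5) ? 400 : 300;
}
```

For 12-point text at 300 dpi the nominal em height is 50 pixels, in line with the roughly 48 x 48 pixel characters described above.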

Character attributes, such as bold and italic styling, cannot be retrieved for Asian text, nor for embedded English text.

When you use user zones, make them as homogeneous as possible, because homogeneous zones may give better results. This is especially important for Asian languages. IG_REC_WT_AUTO zones can be inhomogeneous.

Deskew and Orientation

Deskew and orientation detection work differently in the Asian recognition module than in the other recognition modules. Both operations can be adjusted through the functions IG_REC_asian_deskew_enabled_get/IG_REC_asian_deskew_enabled_set and IG_REC_asian_orientation_enabled_get/IG_REC_asian_orientation_enabled_set. If these settings are FALSE, the AUTO methods of these operations (IG_REC_IMG_DESKEW_AUTO, IG_REC_IMG_ROTATE_AUTO) behave for Asian OCR as if the operations were switched off (IG_REC_IMG_DESKEW_NO, IG_REC_IMG_ROTATE_NO). If the settings are TRUE, or if deskew and orientation are not set to AUTO, the operations behave the same way for Asian and Western recognition.
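The interaction between the requested mode and the Asian-specific setting can be summarized as a small decision function. This is a sketch of the rule as described above, not of the library's implementation: the demo_mode enum and effective_asian_mode are hypothetical names, with DEMO_MODE_NO and DEMO_MODE_AUTO standing in for the corresponding IG_REC_IMG_DESKEW_* or IG_REC_IMG_ROTATE_* values and DEMO_MODE_EXPLICIT standing in for any other mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the deskew or rotation modes;
   these are not the ImageGear constants themselves. */
typedef enum {
    DEMO_MODE_NO,       /* operation switched off          */
    DEMO_MODE_AUTO,     /* automatic detection requested   */
    DEMO_MODE_EXPLICIT  /* any other, explicitly set mode  */
} demo_mode;

/* Mode that effectively applies to Asian OCR.
   asian_setting_enabled mirrors the TRUE/FALSE value of the
   Asian deskew or orientation setting described in the text. */
demo_mode effective_asian_mode(demo_mode requested, bool asian_setting_enabled)
{
    assert(requested == DEMO_MODE_NO ||
           requested == DEMO_MODE_AUTO ||
           requested == DEMO_MODE_EXPLICIT);

    /* AUTO with the Asian setting disabled behaves as switched off. */
    if (requested == DEMO_MODE_AUTO && !asian_setting_enabled) {
        return DEMO_MODE_NO;
    }

    /* Otherwise the requested mode applies, as for Western text. */
    return requested;
}
```

Only the combination of AUTO with a FALSE setting changes the outcome; every other combination returns the requested mode unchanged.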

When using Asian language recognition, it is critical to set the selected Asian language prior to beginning the preprocessing.

Confidence Data and Choices

Recognition results can be saved to memory as an AT_REC_LETTER array, making the confidence data and alternate character choices available for Asian languages for the first time.