ImageGear .NET
Asian Recognition Module

Module name: ASN
Module identifier: ASIAN
Filling methods supported: ASIAN
Filters supported: Not used
Trade-off supported: Not used

The Asian Recognition Module requires the ImGearRecLicenseFeature.AsianOcr license feature to be enabled.

Application Areas

This module provides recognition services for four Asian languages, with horizontal or vertical text direction: Japanese, Korean, Traditional Chinese, and Simplified Chinese. It can also recognize short runs of embedded English text without English being explicitly enabled in the Languages collection.

The Asian Recognition Module differs somewhat from the modules used for Western languages. Take the following differences into account when recognizing Asian text:

For the Asian Recognition Module to work correctly, the selected Asian language should be set before performing preprocessing.

Asian text can be horizontal, flowing left-to-right (FLOW), or vertical, with characters flowing top-to-bottom and lines flowing right-to-left (VERTTEXT).

Non-Asian text embedded in vertical text can have three orientations: vertical (neon), right-rotated, and side-by-side. The last is usually limited to three characters and is most often used for Arabic numerals. All embedded text is converted to right rotation when exported to a formatted output document.

The orientation of Asian text is auto-detected on pages where no user zones have been inserted, and in AUTO user zones. Auto-detection runs zone by zone, so pages that mix horizontal and vertical text blocks (such as picture captions) can be handled.

Digital camera images can be used as Asian-language input, but the automatic 3D deskewing is not useful in these cases.

Table zones can be inserted into Asian pages, but if the OCR engine cannot detect a table within such a zone, the zone is likely to produce zero recognition results.

Conditions

The ideal font point size for Asian language body text is 12 points, scanned at 300 dpi, resulting in characters with around 48 x 48 pixels. The minimum pixel count is about 30 x 30, that is 10.5 points at 300 dpi. For characters smaller than this, 400 dpi should be used.
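
The resolution guidance above can be written as a small check. The following Python sketch is illustrative only and is not part of the ImageGear API: the function and constant names are hypothetical, the thresholds are the approximate figures quoted above, and the only added assumption is that character size in pixels scales linearly with scan resolution.

```python
# Approximate thresholds quoted from the guidance above.
# All names here are hypothetical and exist only for illustration.
MIN_CHAR_PX = 30      # approximate minimum usable character size (pixels)
BASE_DPI = 300        # resolution at which the thresholds are stated
FALLBACK_DPI = 400    # resolution recommended for smaller characters


def scaled_char_px(char_px: float, from_dpi: int, to_dpi: int) -> float:
    """Estimate character size after rescanning at a different resolution.

    Assumes pixel size scales linearly with dpi.
    """
    return char_px * to_dpi / from_dpi


def recommended_dpi(char_px_at_300: float) -> int:
    """Return 300 if characters meet the minimum size at 300 dpi, else 400."""
    if char_px_at_300 >= MIN_CHAR_PX:
        return BASE_DPI
    return FALLBACK_DPI
```

For example, a character that measures 24 x 24 pixels at 300 dpi falls below the minimum, so `recommended_dpi(24)` returns 400; rescanned at 400 dpi, the same character measures about 32 x 32 pixels.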

When zones are defined by the user, make each zone as homogeneous as possible, because homogeneous zones may give better results. This is especially important for Asian languages. Automatically located zones can be inhomogeneous.

Automatic Deskew and Orientation

Support for images containing Asian-language text in the automatic deskew and orientation process can be turned on or off. When the ImGearRecAsianSettings.IgnoreAsianTextForDeskew and ImGearRecAsianSettings.IgnoreAsianTextForRotation properties are set to true and the Asian Recognition Module is enabled, calling the ImGearRecImage.PreProcess method with DeskewMode and OrientationMode set to AUTO does not deskew or rotate the image.

Character Attributes

Character attributes, such as bold and italic styling, cannot be retrieved for Asian text or for embedded English text.

Confidence Data and Choices

Recognition results can be saved to memory as a LETTER array, making the confidence data and alternate character choices available for Asian languages.

 

 


©2016. Accusoft Corporation. All Rights Reserved.
