ImageGear PDF v25.2 - Updated
Work with OCR Settings

In addition to recognizing images with the default language settings, ImageGear lets you specify the language to be recognized on a page. If the language of the page is known before recognition, specifying it makes recognition more precise: the appropriate character sets are used during recognition, and language-specific dictionaries are applied to the recognized character constructions.

Use the LanguageEnabled property of ImGearOCRSettings to define a specific language or languages.

The languages are divided into several language groups. Languages from one group may be incompatible with languages from other groups, and recognition may return an error when languages from different groups are enabled together. To avoid combining incompatible languages, enable languages from only one of the following groups at a time:

  1. Greek language.
  2. The Latin and Cyrillic language group, which unites the CentralEurope, Cyrillic, WesternEurope, Turkish, and Baltic languages. This set of languages includes: Afrikaans, Albanian, Andorra, Argentina, Australia, Austria, AzerbaijanCyrillic, AzerbaijanLatin, Baltic, Basque, Belarusian, Belgium, Bosnian,
    Brazil, Bulgarian, Canada, Catalan, CentralAmerica, CentralEurope, Chile, Colombia, Croatian, Cyrillic, Czech, Danish, Dutch, English, Estonian, Faroese,
    Finnish, French, Frisian, German, GreatBritain, Guarani, Hani, Hungarian, Icelandic, Indonesian, Irish, Italian, JapanLatinOnly, KazakhCyrillic, KazakhLatin,
    KirghizCyrillic, Kirundi, Latin, Latvian, Liechtenstein, Lithuanian, Luxembourgish, Macedonian, Malay, Mexico, Netherlands, NewZealand, Norwegian,
    Polish, Portuguese, Quechua, RhaetoRomanic, Romanian, Russian, Rwanda, Scandinavia, SerbianCyrillic, Shona, Slovak, Slovenian, Somali, Sorbian,
    SouthAfrica, SouthAmerica, Spanish, Swahili, Swedish, Switzerland, TajikCyrillic, Turkish, TurkmenCyrillic, TurkmenLatin, Ukrainian, USA, UzbekCyrillic,
    UzbekLatin, Venezuela, WesternEurope, Wolof, Xhosa, Zulu.
  3. ChineseSimplified and ChineseTraditional languages.
  4. ChineseHongKong language.
  5. Japanese language.
  6. Korean language.
  7. Thai language.
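
The grouping rule above can be checked before any languages are enabled. The following is a minimal sketch, not part of the ImageGear API: `LanguageGroupCheck`, `GroupOf`, and `AreCompatible` are hypothetical names, and the languages are plain strings taken from the list above rather than `ImGearOCRLanguage` members, whose exact names may differ. It assumes that any name not listed in groups 1 and 3 through 7 belongs to group 2, so a misspelled name is silently treated as a Latin and Cyrillic language. The check is deliberately conservative and rejects every cross-group combination.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an ImageGear type) that maps language names
// from the list above to their group numbers.
public static class LanguageGroupCheck
{
    // Groups 1 and 3-7 from the list above. Group 2 (Latin and Cyrillic)
    // is the assumed default for every name that is not listed here.
    private static readonly Dictionary<string, int> NonLatinGroups =
        new Dictionary<string, int>
        {
            { "Greek", 1 },
            { "ChineseSimplified", 3 },
            { "ChineseTraditional", 3 },
            { "ChineseHongKong", 4 },
            { "Japanese", 5 },
            { "Korean", 6 },
            { "Thai", 7 },
        };

    // Returns the group number of a language name (assumption: unknown names are group 2).
    public static int GroupOf(string languageName)
    {
        int group;
        return NonLatinGroups.TryGetValue(languageName, out group) ? group : 2;
    }

    // Returns true when all names belong to a single group (or the set is empty).
    public static bool AreCompatible(IEnumerable<string> languageNames)
    {
        return languageNames.Select(name => GroupOf(name)).Distinct().Count() <= 1;
    }
}
```

For example, `LanguageGroupCheck.AreCompatible(new[] { "French", "German" })` returns `true`, while `LanguageGroupCheck.AreCompatible(new[] { "French", "Japanese" })` returns `false` because the two names fall into groups 2 and 5.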

 

Enabling Asian languages requires an ImageGear license that includes support for Asian languages.

The following example illustrates how to recognize a page containing only French text.

C#
using System;
using ImageGear.Core;
using ImageGear.OCR;

public static string RecognizeFrenchText(ImGearRasterPage rasterPage)
{
    string resultString = null;

    // Create the OCR engine with default settings.
    using (ImGearOCR igOcr = ImGearOCR.Create())
    {
        // Turn off all languages.
        foreach (ImGearOCRLanguage language in Enum.GetValues(typeof(ImGearOCRLanguage)))
            igOcr.Settings.LanguageEnabled[language] = false;

        // Turn on only the French language.
        igOcr.Settings.LanguageEnabled[ImGearOCRLanguage.FRE] = true;

        // Import the ImageGear page into the recognition repository.
        using (ImGearOCRPage igOcrPage = igOcr.ImportPage(rasterPage))
        {
            // Recognize the page and collect its text.
            igOcrPage.Recognize();
            resultString = igOcrPage.Text;
        }
    }

    return resultString;
}
VB.NET
Imports System
Imports ImageGear.Core
Imports ImageGear.OCR

Public Shared Function RecognizeFrenchText(ByVal rasterPage As ImGearRasterPage) As String
    Dim resultString As String = Nothing

    ' Create the OCR engine with default settings.
    Using igOcr As ImGearOCR = ImGearOCR.Create()

        ' Turn off all languages.
        For Each language As ImGearOCRLanguage In [Enum].GetValues(GetType(ImGearOCRLanguage))
            igOcr.Settings.LanguageEnabled(language) = False
        Next

        ' Turn on only the French language.
        igOcr.Settings.LanguageEnabled(ImGearOCRLanguage.FRE) = True

        ' Import the ImageGear page into the recognition repository.
        Using igOcrPage As ImGearOCRPage = igOcr.ImportPage(rasterPage)
            igOcrPage.Recognize()
            resultString = igOcrPage.Text
        End Using
    End Using

    Return resultString
End Function

 
