CreateWordFinderUCS Method (IGPDFDoc Object, ImageGear Professional v18.2)
Creates a word finder that is used to extract text in Unicode format from a PDF file.
The word finder also extracts text from Form XObjects that are executed in the page contents. For information about Form XObjects, see Section 4.9 in the PDF Reference. CreateWordFinderUCS is useful for converting non-Roman text, such as Chinese, Japanese, and Korean (CJK) text, to Unicode. It also converts Roman text to Unicode in any document.
The CreateWordFinder Method also works in viewers that use non-Roman character sets, but it extracts words in the host encoding. To obtain Unicode output, use CreateWordFinderUCS.
The type of word finder determines the encoding of the string returned by IGPDFWord.GetString. For example, if the word finder was created with CreateWordFinderUCS, IGPDFWord.GetString returns Unicode text.
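The practical difference between host-encoded and Unicode output can be illustrated outside ImageGear. The sketch below is plain Python, not the ImageGear API: it assumes, for illustration only, a host code page of Windows-932 (Shift_JIS). Host-encoded text can only be interpreted if the consumer knows which code page produced it, while Unicode text can be used directly.

```python
# Sketch only: illustrates host encoding vs Unicode, not ImageGear calls.
# Assumption: the host code page is Windows-932 (Shift_JIS); the real host
# encoding depends on the system where the word finder runs.

HOST_ENCODING = "cp932"  # hypothetical host code page for this example


def host_bytes_to_unicode(raw: bytes, host_encoding: str = HOST_ENCODING) -> str:
    """Decode host-encoded bytes into a Unicode string.

    This extra step is only needed for host-encoded output; text that is
    already Unicode can be used as-is.
    """
    return raw.decode(host_encoding)


word = "日本語"                          # a Japanese word as Unicode text
host_form = word.encode(HOST_ENCODING)   # the same word as host-encoded bytes

# Each of these CJK characters takes two bytes in cp932.
print(len(word), len(host_form))         # 3 6

# Decoding with the correct code page recovers the original word...
print(host_bytes_to_unicode(host_form) == word)   # True

# ...but guessing the wrong code page silently produces different text.
print(host_form.decode("latin-1") == word)        # False
```

The wrong-code-page case is the main reason to prefer Unicode output when the extracted text is passed to other components.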
For CJK viewers, words are stored internally using CID encoding. For more information on CIDFonts and related topics, see Section 5.6 in the PDF Reference. For detailed information on CIDFonts, see Technical Note #5092, CID-Keyed Font Technology Overview, and Technical Note #5014, Adobe CMap and CIDFont Files Specification.
CreateWordFinderUCS (AlgVersion As enumIGPDFWordFinderVersion,
Flags As enumIGPDFWordFlags) As IGPDFWordFinder
Name | Description |
---|---|
AlgVersion | The version of the word-finding algorithm to use. |
Flags | Word-finding options that determine which tables are filled when IGPDFWordFinder.AcquireWordList is called. Must be a bitwise OR of one or more enumIGPDFWordFlags values. |
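The Flags parameter is a bit mask, so several options are combined with a bitwise OR. The Python sketch below only illustrates that combination pattern: the WordFlags class and its members OPTION_A, OPTION_B, and OPTION_C are hypothetical placeholders, not the actual enumIGPDFWordFlags constants, whose names and values are defined by ImageGear.

```python
from enum import IntFlag


class WordFlags(IntFlag):
    """Hypothetical stand-in for enumIGPDFWordFlags.

    Member names and values are placeholders, NOT the real ImageGear constants.
    """
    OPTION_A = 0x01
    OPTION_B = 0x02
    OPTION_C = 0x04


def has_flag(value: WordFlags, flag: WordFlags) -> bool:
    """Return True if every bit of `flag` is set in `value`."""
    return (value & flag) == flag


# Combine two options with bitwise OR, as the Flags parameter requires.
flags = WordFlags.OPTION_A | WordFlags.OPTION_C

print(int(flags))                          # 5
print(has_flag(flags, WordFlags.OPTION_A)) # True
print(has_flag(flags, WordFlags.OPTION_B)) # False
```

Because each option occupies its own bit, OR-ing values never loses information, and a single option can be tested with a bitwise AND.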