ImageGear .NET v25.1 - Updated
ImGearPDFWordFinder Constructor(ImGearPDFDocument,Int16[],String[],String[],ImGearPDFWordFinderVersion,ImGearPDFContextFlags)




Initializes a new instance of the ImGearPDFWordFinder class.
Syntax
'Declaration
 
Public Sub New( _
   ByVal document As ImGearPDFDocument, _
   ByVal encodingInfo() As Short, _
   ByVal encodingVector() As String, _
   ByVal ligatureTable() As String, _
   ByVal algorithmVersion As ImGearPDFWordFinderVersion, _
   ByVal options As ImGearPDFContextFlags _
)
'Usage
 
Dim document As ImGearPDFDocument
Dim encodingInfo() As Short
Dim encodingVector() As String
Dim ligatureTable() As String
Dim algorithmVersion As ImGearPDFWordFinderVersion
Dim options As ImGearPDFContextFlags
 
Dim instance As New ImGearPDFWordFinder(document, encodingInfo, encodingVector, ligatureTable, algorithmVersion, options)

Parameters

document
PDF document to find words in.
encodingInfo
Array of 256 flags specifying the type of character at each position in the encoding. Each flag is a bitwise OR of the Character Type Codes. If encodingInfo is Null, the platform's default encoding info is used. Use encodingInfo and encodingVector together: whenever you supply an encodingInfo array, also supply the corresponding encodingVector, which names the character at each position in the encoding.
encodingVector
Array of 256 null-terminated strings that are the glyph names in encoding order. See the discussion of character names in Section 5.3 of the PostScript Language Reference Manual, Third Edition. If encodingVector is Null, the platform's default encoding vector is used. Use this parameter with encodingInfo.
ligatureTable
A null-terminated array of null-terminated strings. Each string is the glyph name of a ligature in the font. When a word contains a ligature, the glyph name of the ligature is substituted for the ligature (for example, the glyph name ff is substituted for the ﬀ ligature). If ligatureTable is Null, a default ligature table is used, containing the following ligatures: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE.
algorithmVersion
The version of the word-finding algorithm to use.
options
Word-finding options that determine which tables are filled when using AcquireWordList. Must be a bitwise OR of one or more ImGearPDFContextFlags values.
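
The pairing of encodingInfo and encodingVector described above can be illustrated with a minimal, self-contained sketch. It does not call ImageGear. The flag constants HypotheticalLetterFlag and HypotheticalDigitFlag are placeholders invented for this example, because the actual Character Type Codes are not listed on this page. The module and method names are also assumptions. The glyph names follow the standard PostScript naming convention (letters are named by themselves, digits are spelled out, and unused positions are .notdef).

```vbnet
Imports System

Module EncodingTableSketch
    ' Hypothetical stand-ins for the library's Character Type Codes.
    ' The real values are not given on this page.
    Private Const HypotheticalLetterFlag As Short = 1S
    Private Const HypotheticalDigitFlag As Short = 2S

    ' PostScript glyph names for the digits 0 through 9.
    Private ReadOnly DigitGlyphNames As String() =
        {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}

    ' Fills two parallel 256-entry tables. Index i in both arrays
    ' describes the same position in the encoding.
    Sub BuildTables(ByRef encodingInfo() As Short, ByRef encodingVector() As String)
        encodingInfo = New Short(255) {}     ' upper bound 255 gives 256 entries
        encodingVector = New String(255) {}

        ' Start with every position unclassified and mapped to .notdef.
        For code As Integer = 0 To 255
            encodingInfo(code) = 0S
            encodingVector(code) = ".notdef"
        Next

        ' Uppercase Latin letters: the glyph name is the letter itself.
        For code As Integer = Convert.ToInt32("A"c) To Convert.ToInt32("Z"c)
            encodingInfo(code) = HypotheticalLetterFlag
            encodingVector(code) = Convert.ToChar(code).ToString()
        Next

        ' Lowercase Latin letters.
        For code As Integer = Convert.ToInt32("a"c) To Convert.ToInt32("z"c)
            encodingInfo(code) = HypotheticalLetterFlag
            encodingVector(code) = Convert.ToChar(code).ToString()
        Next

        ' Digits use spelled-out glyph names.
        For digit As Integer = 0 To 9
            Dim position As Integer = Convert.ToInt32("0"c) + digit
            encodingInfo(position) = HypotheticalDigitFlag
            encodingVector(position) = DigitGlyphNames(digit)
        Next
    End Sub

    Sub Main()
        Dim encodingInfo() As Short = Nothing
        Dim encodingVector() As String = Nothing
        BuildTables(encodingInfo, encodingVector)

        Console.WriteLine(encodingInfo.Length)                     ' prints 256
        Console.WriteLine(encodingVector(Convert.ToInt32("7"c)))   ' prints seven
    End Sub
End Module
```

Arrays built this way would be passed as the encodingInfo and encodingVector arguments of the constructor, once the placeholder flags are replaced with the library's actual Character Type Codes. Passing Null for both arguments selects the platform defaults instead.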

Return Value

A new instance of the ImGearPDFWordFinder class.
Remarks
Creates a word finder that extracts text in the host encoding from a PDF file. The word finder also extracts text from Form XObjects that are executed in the page contents. For information about Form XObjects, see Section 4.9 in the PDF Reference.

This constructor also works for non-Roman (CJK: Chinese, Japanese, Korean) viewers. In this case, words are extracted to the host encoding.

The type of WordFinder determines the encoding of the string returned by ImGearPDFWord.String.

For CJK viewers, words are stored internally using CID encoding. For more information on CIDFonts and related topics, see Section 5.6 in the PDF Reference. For detailed information on CIDFonts, see Technical Note #5092, CID-Keyed Font Technology Overview, and Technical Note #5014, Adobe CMap and CIDFont Files Specification.

See Also

Reference

ImGearPDFWordFinder Class
ImGearPDFWordFinder Members
Overload List
ImGearPDFDictionary Class
ImGearPDFWordFinderVersion Enumeration
ImGearPDFContextFlags Enumeration