The simplest way to extract text from a PDF is to call the IG_PDF_text_extract method, which reads a PDF and writes its text to a TXT file.
To manipulate the text in memory instead, use a wordfinder to extract it.
- Open the PDF document and load it into an HIG_PDF_DOC:
- Create a wordfinder for that PDF:
- Get the number of words on the page so that we can iterate through them:
- Then we iterate through each word:
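The count-then-iterate pattern in the steps above can be sketched in plain C. This is only an illustration of the loop shape, not ImageGear code: `demo_word_list`, `demo_word_count`, `demo_get_word`, and `demo_join_words` are hypothetical stand-ins invented for this sketch, and the real wordfinder types and calls should be taken from the ImageGear API reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the word list a wordfinder produces.
   This is NOT an ImageGear type; it only models "N words on a page". */
typedef struct {
    const char *const *words;  /* array of NUL-terminated, non-NULL strings */
    size_t count;              /* number of entries in words */
} demo_word_list;

/* Hypothetical: number of words available for iteration. */
static size_t demo_word_count(const demo_word_list *list)
{
    assert(list != NULL);
    return list->count;
}

/* Hypothetical: fetch the word at index, or NULL if index is out of range. */
static const char *demo_get_word(const demo_word_list *list, size_t index)
{
    assert(list != NULL);
    return index < list->count ? list->words[index] : NULL;
}

/* Joins all words into out, separated by single spaces, so the page text
   can be manipulated in memory. Returns the number of characters written
   (excluding the terminating NUL), or -1 if out is too small. */
static long demo_join_words(const demo_word_list *list, char *out, size_t out_size)
{
    size_t used = 0;
    size_t n = demo_word_count(list);  /* get the count first ... */

    if (out == NULL || out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < n; ++i) {   /* ... then visit each word */
        const char *word = demo_get_word(list, i);
        size_t len = strlen(word);
        size_t sep = (i > 0) ? 1 : 0;  /* one space before every word but the first */

        if (used + sep + len + 1 > out_size) {
            return -1;                 /* no room for separator, word, and NUL */
        }
        if (sep) {
            out[used++] = ' ';
        }
        memcpy(out + used, word, len);
        used += len;
        out[used] = '\0';
    }
    return (long)used;
}
```

Calling `demo_join_words` with a three-word list and a sufficiently large buffer leaves the words in that buffer separated by spaces. In real code, the two `demo_` accessors would be replaced by the corresponding wordfinder calls.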
To learn more about these word objects, you may want to use these two methods:
If you would prefer to access the text through each PDE element, you can do that as well. Refer to the AddNewPageWithImage sample for a demonstration of manipulating PDE elements. For each HIG_PDE_TEXT you can use a variety of methods, such as IG_PDE_text_get_text_unicode.