The simplest way to extract text from a PDF is to use the IG_PDF_text_extract method, which reads a PDF and writes its text to a TXT file.
If you want to manipulate the text in memory, then you should use a wordfinder to extract the text.
- Open the PDF document and load it into an HIG_PDF_DOC:
- Create a wordfinder for that PDF:
- Get the number of words on the page so that you can iterate through them:
- Iterate through each word:
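The count-then-iterate pattern in the last two steps can be sketched without the toolkit. This is a minimal illustration only: `demo_word_list`, `demo_word_count`, `demo_nth_word`, and `demo_join_words` are hypothetical stand-ins invented for this example and are not ImageGear types or functions. In real code, the count and each word would come from the wordfinder created in the earlier steps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the words found on one page.
   NOT an ImageGear type; it only models "a count plus indexed access". */
typedef struct {
    const char **words;
    size_t count;
} demo_word_list;

/* Hypothetical: returns how many words the list holds. */
static size_t demo_word_count(const demo_word_list *list)
{
    assert(list != NULL);
    return list->count;
}

/* Hypothetical: returns the text of the word at a zero-based index. */
static const char *demo_nth_word(const demo_word_list *list, size_t index)
{
    assert(list != NULL && index < list->count);
    return list->words[index];
}

/* Collects every word into one space-separated string held in memory.
   Returns NULL if allocation fails. The caller must free() the result. */
char *demo_join_words(const demo_word_list *list)
{
    size_t n = demo_word_count(list);
    size_t total = 1; /* room for the terminating NUL */
    size_t i;
    char *out;
    char *p;

    /* First pass: measure the text plus one separator between words. */
    for (i = 0; i < n; i++) {
        total += strlen(demo_nth_word(list, i));
        if (i > 0) {
            total += 1;
        }
    }

    out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: copy each word in order. */
    p = out;
    for (i = 0; i < n; i++) {
        const char *word = demo_nth_word(list, i);
        size_t len = strlen(word);
        if (i > 0) {
            *p++ = ' ';
        }
        memcpy(p, word, len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

Getting the count first lets the buffer be sized once before any text is copied, which is the main reason to ask for the number of words before iterating.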
To learn more about these word objects, you may want to use these two methods:
If you would prefer to access the text through each PDE element, you can do that as well.