ImageGear v26.0 - Updated
Extract Text from a PDF

ImageGear support for text extraction includes:

The following example shows how to convert an ImGearDocument into a string using ExtractText:

C#

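using System;
using System.IO;
using ImageGear.Core;
using ImageGear.Formats.PDF;

// Returns a string corresponding to the text extracted from the PDF.
public string ExtractTextFromPDF(ImGearDocument igDocument)
{
    // The "as" cast yields null if igDocument is not a PDF document.
    ImGearPDFDocument pdfDocument = igDocument as ImGearPDFDocument;
    if (pdfDocument == null)
        throw new ArgumentException("The document is not a PDF document.", nameof(igDocument));
    using (MemoryStream textFromPDF = new MemoryStream())
    {
        // Extract text from all pages, starting at page 0.
        pdfDocument.ExtractText(0, igDocument.Pages.Count, ImGearPDFContextFlags.PDF_ORDER, textFromPDF);
        // Decode the extracted bytes with the system default encoding (code page 0).
        return System.Text.Encoding.GetEncoding(0).GetString(textFromPDF.ToArray());
    }
}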
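
The last step of the sample turns the bytes in the MemoryStream into a string using Encoding.GetEncoding(0), which selects the platform default encoding. The following is a minimal sketch of that step in isolation, with the encoding passed in explicitly so the choice is visible. The StreamTextDecoder class and its Decode method are hypothetical names used only for illustration; this page does not state which encoding ExtractText writes, so the caller is assumed to know it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of ImageGear.
public static class StreamTextDecoder
{
    // Decodes all bytes written to the stream so far using the given encoding.
    public static string Decode(MemoryStream stream, Encoding encoding)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // ToArray copies the full written contents, regardless of the
        // stream's current Position.
        return encoding.GetString(stream.ToArray());
    }
}
```

Calling StreamTextDecoder.Decode(textFromPDF, System.Text.Encoding.GetEncoding(0)) would reproduce the behavior of the sample's return statement.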

The options parameter controls how the text is extracted. For example, given the following text in a PDF:

[Image: PDF_textextraction]

When options = ImGearPDFContextFlags.XY_ORDER, passing the extracted string to System.Diagnostics.Debug.WriteLine would produce:

The quick brown
fox jumps over
the lazy dog. The green turtle
watched closely
and entertained.

When options = ImGearPDFContextFlags.PDF_ORDER, the same call would produce:

The quick brown fox jumps over the lazy dog. The green turtle watched closely and entertained.
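
The ExtractTextFromPDF sample hard-codes PDF_ORDER, so comparing the two outputs requires passing the flag in. The following is a minimal sketch of such a variant, assuming the same ExtractText overload used in that sample. The class name PdfTextExtractionSketch, the method names, and the using directives for ImageGear.Core and ImageGear.Formats.PDF are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using ImageGear.Core;         // assumed namespace for ImGearDocument
using ImageGear.Formats.PDF;  // assumed namespace for ImGearPDFDocument

// Hypothetical wrapper class, not part of ImageGear.
public static class PdfTextExtractionSketch
{
    // Variant of ExtractTextFromPDF that exposes the options flag to the caller.
    public static string ExtractTextWithOptions(
        ImGearDocument igDocument, ImGearPDFContextFlags options)
    {
        ImGearPDFDocument pdfDocument = igDocument as ImGearPDFDocument;
        if (pdfDocument == null)
            throw new ArgumentException("Expected a PDF document.", nameof(igDocument));

        using (MemoryStream textFromPDF = new MemoryStream())
        {
            // Extract text from all pages using the requested ordering.
            pdfDocument.ExtractText(0, igDocument.Pages.Count, options, textFromPDF);
            return Encoding.GetEncoding(0).GetString(textFromPDF.ToArray());
        }
    }

    // Writes the text in both orderings to the debug output for comparison.
    public static void DumpBothOrders(ImGearDocument igDocument)
    {
        System.Diagnostics.Debug.WriteLine(
            ExtractTextWithOptions(igDocument, ImGearPDFContextFlags.XY_ORDER));
        System.Diagnostics.Debug.WriteLine(
            ExtractTextWithOptions(igDocument, ImGearPDFContextFlags.PDF_ORDER));
    }
}
```

Keeping the flag as a parameter lets the caller choose the ordering per document without duplicating the extraction code.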