Some configuration is required to enable and use the OCR option.

The two parameters necessary to enable OCR are in web.xml:

  • enableOcr: Enables OCR for searching and text extraction. OCR functions correctly only with a valid OCR configuration and license. Defaults to false.
  • tesseractDataPath: Absolute or relative path to the Tesseract OCR engine’s training data. If you deploy packed WARs in Tomcat, change this to an external, unpacked folder. Defaults to “/tessdata”.

For example, to enable OCR with the default training data path:

<init-param>
    <param-name>enableOcr</param-name>
    <param-value>true</param-value>
</init-param>
<init-param>
    <param-name>tesseractDataPath</param-name>
    <param-value>/tessdata</param-value>
</init-param>
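
To confirm the values before deploying, the two parameters can be read back from the configuration. The following is a minimal sketch, not part of PrizmDoc for Java: the class name OcrInitParams and its methods are hypothetical, and it assumes the init-param elements sit inside a single root element (here a bare servlet element). It uses only the JDK's built-in XML parser and falls back to the defaults listed above when a parameter is absent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not a PrizmDoc for Java API.
final class OcrInitParams {

    // Collects every <init-param> name/value pair found in the given XML text.
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> params = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("init-param");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element param = (Element) nodes.item(i);
                String name = childText(param, "param-name");
                String value = childText(param, "param-value");
                if (name != null && value != null) {
                    params.put(name, value);
                }
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    // enableOcr defaults to false when the parameter is missing.
    static boolean ocrEnabled(Map<String, String> params) {
        return Boolean.parseBoolean(params.getOrDefault("enableOcr", "false"));
    }

    // tesseractDataPath defaults to /tessdata when the parameter is missing.
    static String tessdataPath(Map<String, String> params) {
        return params.getOrDefault("tesseractDataPath", "/tessdata");
    }

    public static void main(String[] args) {
        String xml = "<servlet>"
                + "<init-param><param-name>enableOcr</param-name>"
                + "<param-value>true</param-value></init-param>"
                + "<init-param><param-name>tesseractDataPath</param-name>"
                + "<param-value>/tessdata</param-value></init-param>"
                + "</servlet>";
        Map<String, String> params = read(xml);
        System.out.println("enableOcr = " + ocrEnabled(params));
        System.out.println("tesseractDataPath = " + tessdataPath(params));
    }
}
```

A check like this only confirms what the file says; it does not verify licensing or that the training data folder actually exists at the configured path.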

OCR is a new option in PrizmDoc for Java. It lets users search for text in an image document (initially TIFF or PNG) and select text in the PrizmDoc for Java client once the document has been OCRed.

To OCR a document in the PrizmDoc for Java client, search for text in a non-text document; this brings up the OCR prompt. The OCRed result is cached, and while it remains cached, the user can search for and select text without a further OCR prompt.
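
The prompt-then-cache behavior can be pictured with a small conceptual model. This is only a sketch of the flow described here, not PrizmDoc's implementation: the class OcrPromptModel, its methods, and the document IDs are all hypothetical, and how or when the real cache is cleared is not stated in this section.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only; every name here is hypothetical, not a PrizmDoc API.
final class OcrPromptModel {

    // Maps a document ID to the text recognized by OCR.
    private final Map<String, String> ocrCache = new HashMap<>();

    // A prompt is needed only for a non-text document with no cached OCR result.
    boolean needsOcrPrompt(String documentId, boolean hasText) {
        return !hasText && !ocrCache.containsKey(documentId);
    }

    // Stores the OCR result so later searches skip the prompt.
    void cacheResult(String documentId, String recognizedText) {
        ocrCache.put(documentId, recognizedText);
    }

    // Drops the cached result; the next search would prompt again.
    void evict(String documentId) {
        ocrCache.remove(documentId);
    }

    // Searches the cached OCR text; returns false when nothing is cached.
    boolean search(String documentId, String term) {
        String text = ocrCache.get(documentId);
        return text != null && text.contains(term);
    }

    public static void main(String[] args) {
        OcrPromptModel model = new OcrPromptModel();
        String doc = "scan.tif"; // hypothetical image document

        System.out.println("Prompt on first search: " + model.needsOcrPrompt(doc, false));
        model.cacheResult(doc, "invoice total 42");
        System.out.println("Prompt while cached: " + model.needsOcrPrompt(doc, false));
        System.out.println("Found 'total': " + model.search(doc, "total"));
    }
}
```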