OCR configuration

VirtualViewer now includes an optional OCR (optical character recognition) feature. OCR makes the text in image documents (initially TIFF and PNG) searchable and, once a document has been OCRed, selectable in the VirtualViewer (VV) client.

To OCR a document in the VV client, the user searches for text in a non-text (image) document, which brings up the OCR prompt. The OCR result is then cached. While it remains cached, the user can search for and select text in that document without being prompted again.

OCR is controlled by two new parameters in web.xml:

enableOcr: Enables OCR for text search and text extraction. OCR works only with a valid OCR configuration and a valid license. Defaults to false.
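tesseractDataPath: Absolute or relative path to the training data used by the Tesseract OCR engine. When deploying as a packed (unexploded) WAR in Tomcat, change this to an external, unpacked folder, since Tesseract reads its training data from the file system and cannot read it from inside the archive. Defaults to “/tessdata”.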
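The fragment below is a minimal sketch of how the two settings might look in web.xml. It assumes they are declared as servlet init-param entries alongside VirtualViewer's other servlet parameters; check your existing web.xml for where those parameters actually live. The folder /opt/tessdata is a hypothetical external location, of the kind required for packed WAR deployments.

```xml
<!-- Assumed placement: inside the existing VirtualViewer <servlet> element -->
<init-param>
    <!-- Turns on OCR for text search and extraction (default: false) -->
    <param-name>enableOcr</param-name>
    <param-value>true</param-value>
</init-param>
<init-param>
    <!-- Hypothetical external, unpacked folder holding the Tesseract training data -->
    <param-name>tesseractDataPath</param-name>
    <param-value>/opt/tessdata</param-value>
</init-param>
```

If the WAR is deployed unpacked and the training data ships in the default location, the tesseractDataPath entry can likely be left at its default.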