Caching in PrizmDoc for Java
PrizmDoc® for Java uses a memory cache (EhCache v3.3.1) to reduce document retrieval time. When PrizmDoc® for Java retrieves a document from the content handler, the document is inserted into the cache to speed up subsequent retrieval.
What is cached?
PrizmDoc® for Java’s most-used cache is the document cache. It holds two different types of objects. The first is a wrapper for document data. The second is a considerably more complex object that represents the layout of certain document formats, including Office formats. Each PrizmDoc® for Java instance uses its own cache. To stay synchronized among instances, PrizmDoc® for Java removes a document from the cache when it is modified and saved.
PrizmDoc® for Java also uses two other memory caches, though to a lesser degree:
- PrizmDoc® for Java will cache OCR data: once OCR is complete, it returns a PDF defining positional text data, and this PDF is cached to avoid redoing OCR in the same session.
- PrizmDoc® for Java maintains a cache called the validation cache; if the content handler allows or disallows use of the document cache for a certain document, that response will be stored in the validation cache.
Configuration
The memory cache is configured in the file WEB-INF/ehcache.xml. There are three caches configured in ehcache.xml. The first is the main cache, labelled “vvDocumentCache”; if PrizmDoc® for Java documentation mentions a cache, it is referring to this main document cache. The second is the OCR cache, labelled “vvOcrCache.” This caches OCR data, so OCR does not have to be repeated in a session. Finally, “vvValidationCache” caches responses from the content handler method validateCache.
Each cache is described by an alias attribute in the main cache tag, and two tags called key-type and value-type. None of these values may be changed.
The expiry tag and the resources tag may be modified. By default, the document cache will remove entries that haven’t been used in 60 minutes, and the validation cache will remove entries after 5 minutes; the OCR cache will keep entries without an expiry time.
Within the resources tag, the heap tag configures the maximum size of the cache. In each cache, the heap is described in the unit “entries.” This means that the cache will limit how much it can store based on the count of entries rather than their size. While it is possible to set the units attribute to some memory unit like MB or GB, this is not recommended.
Using a unit other than “entries” forces EhCache to calculate the size of each entry by walking its entire object tree when the entry is inserted into the cache. This significantly decreases performance and increases memory usage.
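To make the trade-off concrete, the following is a minimal sketch of a cache that is limited by entry count. It is not how EhCache is implemented; the class name EntryBoundedCache and the least-recently-used eviction order are assumptions made for illustration, built on the standard java.util.LinkedHashMap. The point it shows is that a count-based limit never needs to inspect the cached value, however large it is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a cache capped by entry count, comparable to
// configuring the heap with the unit "entries". This is NOT EhCache's implementation.
class EntryBoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    EntryBoundedCache(int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // The limit check is a plain count comparison; the value is never measured.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        EntryBoundedCache<String, byte[]> cache = new EntryBoundedCache<>(2);
        cache.put("doc-a", new byte[10]);
        cache.put("doc-b", new byte[10_000_000]); // a large value costs nothing extra to admit
        cache.get("doc-a");                       // doc-a becomes the most recently used
        cache.put("doc-c", new byte[10]);         // exceeds 2 entries, so doc-b is evicted
        System.out.println(cache.keySet());       // prints [doc-a, doc-c]
    }
}
```

A size-based limit would instead have to traverse each value at insertion time to estimate its footprint, which is the cost described above.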
Additional configuration can be found in WEB-INF/web.xml, via init-params.
The init-param enableDocumentCache takes a boolean. If this is set to false, the document cache will not be used. It is highly recommended to leave the document cache enabled; disabling the cache will cause significant performance degradation. The document cache should never be disabled if users are viewing document formats that use SnowDoc, like Microsoft Office formats. SnowDoc formats require the document cache for performance optimization. For other format types, however, the document cache could be disabled in favor of another cache solution implemented in the content handler.
The init-param clearCacheOnSave also takes a boolean. If this is set to true, when a user saves a document, the document will be removed from the cache. The document will then be re-requested from the content handler if it needs to be displayed again. This allows the content handler to implement synchronization of user sessions. It is recommended to keep this item set to true.
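The effect of clearCacheOnSave can be sketched as follows. This is a simplified model under stated assumptions, not PrizmDoc® for Java code: the class ClearOnSaveSketch, its methods, and the use of a plain function as the content handler are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the clearCacheOnSave behavior described above.
// None of these names come from the PrizmDoc for Java API.
class ClearOnSaveSketch {
    private final Map<String, String> documentCache = new HashMap<>();
    private final Function<String, String> contentHandler; // stand-in for the real content handler
    private final boolean clearCacheOnSave;

    ClearOnSaveSketch(Function<String, String> contentHandler, boolean clearCacheOnSave) {
        this.contentHandler = contentHandler;
        this.clearCacheOnSave = clearCacheOnSave;
    }

    String getDocument(String documentId) {
        // Serve from the cache when possible; otherwise ask the content handler.
        return documentCache.computeIfAbsent(documentId, contentHandler);
    }

    void saveDocument(String documentId) {
        // Persisting the document is out of scope for this sketch.
        if (clearCacheOnSave) {
            documentCache.remove(documentId); // the next view re-requests the document
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("doc-1", "version 1");

        ClearOnSaveSketch viewer = new ClearOnSaveSketch(store::get, true);
        viewer.getDocument("doc-1");     // fetched once, then cached
        store.put("doc-1", "version 2"); // the save changes what the content handler holds
        viewer.saveDocument("doc-1");    // cache entry is removed
        System.out.println(viewer.getDocument("doc-1")); // prints version 2
    }
}
```

With the flag set to false in this model, the last call would keep returning the stale cached copy, which is why leaving the setting at true is the safer default.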
Server & Client API
Aside from configuration, there are several ways to control cache behavior dynamically.
On the client, the API functions virtualViewer.seedCache(documentId, pages, clientInstanceId) and virtualViewer.removeDocumentFromCache(documentId, clientInstanceId) will respectively add and remove documents from the document cache:
- seedCache will retrieve a document from the content handler and add it to the cache. For SnowDoc documents, this may also initiate page layout operations. This function takes three parameters. The documentId parameter identifies the document to be added to the cache and is mandatory. The pages parameter is optional and only affects Sparse Documents; it holds an array of page numbers to add to the cache. Finally, the clientInstanceId parameter is optional; it directly passes a clientInstanceId, a piece of data that is passed all the way to the content handler.
- removeDocumentFromCache will manually remove a document from the cache. It takes two parameters: the mandatory ID of the document to remove, and the optional clientInstanceId.
On the server, implementing the content handler interface CacheValidator allows fine-grained control over which documents are allowed to enter the cache. The interface defines one function, validateCache.
validateCache is called before each document is stored in or retrieved from the PrizmDoc® for Java document cache. It can confirm the operation or prevent it on a document-by-document basis.
The response for each document and operation is cached for a short time in PrizmDoc® for Java to prevent asking about the same operation multiple times in quick succession. In other words, if a specific document is prevented from being cached, PrizmDoc® for Java will not ask again for a few minutes and the document will remain uncached for that time. To modify how long a response from validateCache will be remembered, configure the expiry time attribute of the validation cache in ehcache.xml.
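The remembering behavior described above can be pictured with a small sketch. It assumes a hypothetical class ValidationMemoSketch, a Predicate standing in for validateCache, and an injectable clock so the timing is visible; the real validation cache is an EhCache cache configured in ehcache.xml, not this class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch: remembers each allow/deny answer per document and operation
// for a fixed time, so the validator is not asked again in quick succession.
class ValidationMemoSketch {
    private static final class Remembered {
        final boolean allowed;
        final long expiresAtMillis;

        Remembered(boolean allowed, long expiresAtMillis) {
            this.allowed = allowed;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Remembered> remembered = new HashMap<>();
    private final Predicate<String> validator; // stand-in for a validateCache implementation
    private final long ttlMillis;
    private final LongSupplier clock;
    int validatorCalls = 0;

    ValidationMemoSketch(Predicate<String> validator, long ttlMillis, LongSupplier clock) {
        this.validator = validator;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    boolean isCacheUseAllowed(String documentId, String action) {
        String key = action + ":" + documentId; // one answer per document and operation
        long now = clock.getAsLong();
        Remembered hit = remembered.get(key);
        if (hit != null && now < hit.expiresAtMillis) {
            return hit.allowed; // still remembered, the validator is not consulted
        }
        validatorCalls++;
        boolean allowed = validator.test(documentId);
        remembered.put(key, new Remembered(allowed, now + ttlMillis));
        return allowed;
    }

    public static void main(String[] args) {
        long[] now = {0};
        ValidationMemoSketch memo =
                new ValidationMemoSketch(id -> false, 5 * 60_000L, () -> now[0]);
        memo.isCacheUseAllowed("doc-1", "PUT"); // validator is asked
        now[0] = 60_000;
        memo.isCacheUseAllowed("doc-1", "PUT"); // one minute later: answer is remembered
        now[0] = 6 * 60_000;
        memo.isCacheUseAllowed("doc-1", "PUT"); // six minutes later: validator is asked again
        System.out.println(memo.validatorCalls); // prints 2
    }
}
```

The five-minute value in the example mirrors the default validation cache expiry mentioned in the Configuration section.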
Like all content handler API functions, validateCache takes a ContentHandlerInput object and returns a ContentHandlerResult object.
The ContentHandlerInput object contains the following values:
- The key KEY_CACHE_ACTION gets a value of either ContentHandlerInput.VALUE_CACHE_GET or ContentHandlerInput.VALUE_CACHE_PUT, the action to be confirmed for the specified document. GET asks whether the document should be retrieved from the cache, while PUT asks if it should be stored.
- The key KEY_DOCUMENT_ID stores the ID value that represents the document. This can be retrieved with the code String documentId = input.getDocumentId();
- The key KEY_CLIENT_INSTANCE_ID stores a custom configurable value used to pass data from the client to the content handler. If it is not set, the session ID is used. This can be retrieved with the code String clientInstanceId = input.getClientInstanceId();
- The key KEY_HTTP_SERVLET_REQUEST stores the request that called this method. This can be retrieved with the code HttpServletRequest request = input.getHttpServletRequest();
The returned ContentHandlerResult must contain one value:
- The key KEY_USE_OF_CACHE_ALLOWED must store a boolean value. True allows the operation to continue, and false prevents it. This response will be remembered for a few minutes.
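The decision logic of a validateCache implementation might look roughly like the sketch below. Because the real ContentHandlerInput and ContentHandlerResult classes are only available inside PrizmDoc® for Java, plain maps stand in for them here. Only the constant names come from this page; their string values, the class ValidateCacheSketch, and the "secure/" policy are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: maps stand in for ContentHandlerInput and ContentHandlerResult.
// The constant VALUES are placeholders; only the constant NAMES come from the documentation.
class ValidateCacheSketch {
    static final String KEY_CACHE_ACTION = "cacheAction";               // placeholder value
    static final String KEY_DOCUMENT_ID = "documentId";                 // placeholder value
    static final String KEY_USE_OF_CACHE_ALLOWED = "useOfCacheAllowed"; // placeholder value
    static final String VALUE_CACHE_GET = "GET";                        // placeholder value
    static final String VALUE_CACHE_PUT = "PUT";                        // placeholder value

    static Map<String, Object> validateCache(Map<String, Object> input) {
        String action = (String) input.get(KEY_CACHE_ACTION);
        String documentId = (String) input.get(KEY_DOCUMENT_ID);

        boolean knownAction = VALUE_CACHE_GET.equals(action) || VALUE_CACHE_PUT.equals(action);
        // Hypothetical policy: documents under "secure/" are never stored in
        // or served from the document cache.
        boolean allowed = knownAction
                && documentId != null
                && !documentId.startsWith("secure/");

        Map<String, Object> result = new HashMap<>();
        result.put(KEY_USE_OF_CACHE_ALLOWED, allowed);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put(KEY_CACHE_ACTION, VALUE_CACHE_PUT);

        input.put(KEY_DOCUMENT_ID, "reports/q1.docx");
        System.out.println(validateCache(input).get(KEY_USE_OF_CACHE_ALLOWED)); // prints true

        input.put(KEY_DOCUMENT_ID, "secure/payroll.pdf");
        System.out.println(validateCache(input).get(KEY_USE_OF_CACHE_ALLOWED)); // prints false
    }
}
```

In a real content handler, the same decision would be read from the ContentHandlerInput object and written into the returned ContentHandlerResult under KEY_USE_OF_CACHE_ALLOWED.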