Caching in PrizmDoc for Java
PrizmDoc® for Java uses a memory cache (EhCache v3.3.1) to reduce document retrieval time. When PrizmDoc® for Java retrieves a document from the content handler, the document is inserted into the cache to speed up subsequent retrieval.
What is cached?
PrizmDoc® for Java’s most-used cache is the document cache. It holds two types of objects: a wrapper for document data, and a considerably more complex object representing the layout of certain document formats, including Office formats. Each PrizmDoc® for Java instance uses its own cache. To stay synchronized among instances, PrizmDoc® for Java removes a document from the cache when it is modified and saved.
PrizmDoc® for Java also uses two other memory caches, though to a lesser degree:
- PrizmDoc® for Java will cache OCR data: once OCR is complete, it returns a PDF defining positional text data, and this PDF is cached to avoid redoing OCR in the same session.
- PrizmDoc® for Java maintains a cache called the validation cache; if the content handler allows or disallows use of the document cache for a certain document, that response will be stored in the validation cache.
Configuration
The memory cache is configured in the file WEB-INF/ehcache.xml. There are three caches configured in ehcache.xml. The first is the main cache, labelled “vvDocumentCache”; if PrizmDoc® for Java documentation mentions a cache, it is referring to this main document cache. The second is the OCR cache, labelled “vvOcrCache.” This caches OCR data, so OCR does not have to be repeated in a session. Finally, “vvValidationCache” caches responses from the content handler method validateCache.
Each cache is described by an alias attribute in the main cache tag, and two tags called key-type and value-type. None of these values may be changed.
The expiry tag and the resources tag may be modified. By default, the document cache will remove entries that haven’t been used in 60 minutes, and the validation cache will remove entries after 5 minutes; the OCR cache will keep entries without an expiry time.
Within the resources tag, the heap tag configures the maximum size of the cache. In each cache, the heap is described in the unit “entries.” This means that the cache will limit how much it can store based on the count of entries rather than their size. While it is possible to set the units attribute to some memory unit like MB or GB, this is not recommended.
Using a unit other than “entries” will cause EhCache to try to figure out how large an entry is by walking the entire tree of that entry when it is inserted into the cache. This will significantly decrease performance, and will increase memory usage.
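The difference between counting entries and measuring them can be sketched in plain Java. The following is a minimal illustration, not EhCache code: the class EntryBoundedCache, its constructor parameters, and the explicit clock argument are all hypothetical, chosen so that the example runs with the JDK alone. It bounds the cache by entry count and drops entries that have been idle too long, similar in spirit to the default document cache settings described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of EhCache or PrizmDoc for Java.
class EntryBoundedCache<K, V> {
    // A cached value plus the time it was last used.
    private static final class Timed<T> {
        final T value;
        long lastUsedMillis;
        Timed(T value, long now) { this.value = value; this.lastUsedMillis = now; }
    }

    private final long idleMillis;
    private final LinkedHashMap<K, Timed<V>> map;

    EntryBoundedCache(final int maxEntries, long idleMillis) {
        this.idleMillis = idleMillis;
        // accessOrder = true keeps the least-recently-used entry first.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                // Only the entry count is checked; no object graph is walked.
                return size() > maxEntries;
            }
        };
    }

    void put(K key, V value, long now) {
        map.put(key, new Timed<>(value, now));
    }

    // Returns the value, or null if it is absent or has been idle too long.
    V get(K key, long now) {
        Timed<V> timed = map.get(key);
        if (timed == null) return null;
        if (now - timed.lastUsedMillis >= idleMillis) {
            map.remove(key);
            return null;
        }
        timed.lastUsedMillis = now;
        return timed.value;
    }

    int entryCount() { return map.size(); }

    public static void main(String[] args) {
        EntryBoundedCache<String, String> cache = new EntryBoundedCache<>(2, 60 * 60 * 1000L);
        cache.put("doc-1", "first", 0L);
        cache.put("doc-2", "second", 0L);
        cache.put("doc-3", "third", 0L); // over the limit, so doc-1 is evicted
        System.out.println(cache.get("doc-1", 1L)); // prints "null"
        System.out.println(cache.entryCount());     // prints "2"
    }
}
```

A size-based limit would instead have to estimate the memory footprint of every value on insertion, which is the extra work the paragraph above warns about.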
Additional configuration can be found in WEB-INF/web.xml, via init-params.
The init-param enableDocumentCache takes a boolean. If this is set to false, the document cache will not be used. It is highly recommended to leave the document cache enabled; disabling the cache will cause significant performance degradation. The document cache should never be disabled if users are viewing document formats that use SnowDoc, like Microsoft Office formats. SnowDoc formats require the document cache for performance optimization. For other format types, however, the document cache could be disabled in favor of another cache solution implemented in the content handler.
The init-param clearCacheOnSave also takes a boolean. If this is set to true, when a user saves a document, the document will be removed from the cache. The document will then be re-requested from the content handler if it needs to be displayed again. This allows the content handler to implement synchronization of user sessions. It is recommended to keep this item set to true.
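The evict-on-save behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not PrizmDoc® for Java source code: the class EvictOnSaveDemo, its methods, and the map standing in for the storage behind the content handler are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of evict-on-save; not PrizmDoc for Java code.
class EvictOnSaveDemo {
    // Stand-in for whatever storage the content handler reads and writes.
    private final Map<String, String> repository;
    private final Map<String, String> documentCache = new HashMap<>();
    int contentHandlerReads = 0;

    EvictOnSaveDemo(Map<String, String> repository) {
        this.repository = repository;
    }

    // Serve from the cache when possible; otherwise ask the content handler.
    String view(String documentId) {
        String cached = documentCache.get(documentId);
        if (cached != null) return cached;
        contentHandlerReads++;
        String fresh = repository.get(documentId);
        if (fresh != null) documentCache.put(documentId, fresh);
        return fresh;
    }

    // Save through the content handler, then drop the cached copy so the
    // next view re-requests the document.
    void saveAndEvict(String documentId, String content) {
        repository.put(documentId, content);
        documentCache.remove(documentId);
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>();
        repo.put("doc-1", "v1");
        EvictOnSaveDemo viewer = new EvictOnSaveDemo(repo);
        viewer.view("doc-1");               // miss: read from the content handler
        viewer.view("doc-1");               // hit: served from the cache
        viewer.saveAndEvict("doc-1", "v2");
        System.out.println(viewer.view("doc-1"));       // prints "v2"
        System.out.println(viewer.contentHandlerReads); // prints "2"
    }
}
```

Because the cached copy is dropped rather than updated in place, the content handler stays the single source of truth for what the saved document looks like.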
Server & Client API
Aside from configuration, there are several ways to control cache behavior dynamically.
On the client, the API functions virtualViewer.seedCache(documentId, pages, clientInstanceId) and virtualViewer.removeDocumentFromCache(documentId, clientInstanceId) will respectively add and remove documents from the document cache:
- seedCache will retrieve a document from the content handler and add it to the cache. For SnowDoc documents, this may also initiate page layout operations. This function takes three parameters. The documentId parameter is the document to be added to the cache and is mandatory. The pages parameter is optional and only affects Sparse Documents; it would hold an array of page numbers to add to the cache. Finally, the clientInstanceId parameter is optional, and is a way to directly pass a clientInstanceId, which is a piece of data that will be passed all the way to the content handler.
- removeDocumentFromCache will manually remove a document from the cache. It takes two parameters, the mandatory ID of the document to remove, and the optional clientInstanceId.
On the server, implementing the content handler interface CacheValidator allows fine-grained control over which documents are allowed to enter the cache. The interface defines one function, validateCache.
validateCache is called before each document is stored in or retrieved from the PrizmDoc® for Java document cache. It can confirm the operation or prevent it on a document-by-document basis.
The response for each document and operation is cached for a short time in PrizmDoc® for Java to prevent asking about the same operation multiple times in quick succession. In other words, if a specific document is prevented from being cached, PrizmDoc® for Java will not ask again for a few minutes and the document will remain uncached for that time. To modify how long a response from validateCache will be remembered, configure the expiry time attribute of the validation cache in ehcache.xml.
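The effect of remembering a validation response for a limited time can be sketched like this. It is a minimal illustration, not the actual validation cache: the class DecisionMemo, the Predicate standing in for validateCache, and the explicit clock argument are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of a short-lived decision cache.
class DecisionMemo {
    private static final class Entry {
        final boolean allowed;
        final long expiresAtMillis;
        Entry(boolean allowed, long expiresAtMillis) {
            this.allowed = allowed;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> memo = new HashMap<>();
    private final Predicate<String> validator; // stand-in for validateCache
    private final long ttlMillis;
    int validatorCalls = 0;

    DecisionMemo(Predicate<String> validator, long ttlMillis) {
        this.validator = validator;
        this.ttlMillis = ttlMillis;
    }

    // One remembered answer per (action, document) pair.
    boolean isAllowed(String action, String documentId, long now) {
        String key = action + ":" + documentId;
        Entry entry = memo.get(key);
        if (entry != null && now < entry.expiresAtMillis) {
            return entry.allowed; // still fresh: do not ask again
        }
        validatorCalls++;
        boolean allowed = validator.test(documentId);
        memo.put(key, new Entry(allowed, now + ttlMillis));
        return allowed;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        DecisionMemo memo = new DecisionMemo(id -> !id.startsWith("secret"), fiveMinutes);
        memo.isAllowed("PUT", "secret-1", 0L);
        memo.isAllowed("PUT", "secret-1", 1_000L);    // answered from the memo
        memo.isAllowed("PUT", "secret-1", fiveMinutes); // expired: asks again
        System.out.println(memo.validatorCalls);       // prints "2"
    }
}
```

A shorter expiry makes changes in the content handler's policy take effect sooner, at the cost of more calls to validateCache.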
Like all content handler API functions, validateCache takes a ContentHandlerInput object and returns a ContentHandlerResult object.
The ContentHandlerInput object contains the following values:
- The key KEY_CACHE_ACTION gets a value of either ContentHandlerInput.VALUE_CACHE_GET or ContentHandlerInput.VALUE_CACHE_PUT, the action to be confirmed for the specified document. GET asks whether the document should be retrieved from the cache, while PUT asks if it should be stored.
- The key KEY_DOCUMENT_ID stores the ID value that represents the document. This can be retrieved with the code String documentId = input.getDocumentId();
- The key KEY_CLIENT_INSTANCE_ID stores a custom configurable value used to pass data from client to content handler. If it is not set, it will be the session ID. This can be retrieved with the code String clientInstanceId = input.getClientInstanceId();
- The key KEY_HTTP_SERVLET_REQUEST stores the request that called this method. This can be retrieved with the code HttpServletRequest request = input.getHttpServletRequest();
The returned ContentHandlerResult must contain one value:
- The key KEY_USE_OF_CACHE_ALLOWED must store a boolean value. True allows the operation to continue, and false prevents it. This response will be remembered for a few minutes.
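Putting the input and result keys together, a validateCache implementation might look roughly like the sketch below. To keep the example compilable without the PrizmDoc® for Java library, it declares minimal stand-ins for ContentHandlerInput, ContentHandlerResult, and CacheValidator; their map-based design, the string values of the constants, the placement of KEY_USE_OF_CACHE_ALLOWED on the result class, and the exact method signature are assumptions, and the real classes will differ. The policy itself (refusing to store documents whose ID starts with "confidential/") and the class name ConfidentialCacheValidator are also hypothetical.

```java
import java.util.HashMap;

// --- Stand-ins so the sketch compiles on its own (assumptions, not the real API) ---
class ContentHandlerInput extends HashMap<String, Object> {
    static final String KEY_CACHE_ACTION = "KEY_CACHE_ACTION";
    static final String KEY_DOCUMENT_ID = "KEY_DOCUMENT_ID";
    static final String VALUE_CACHE_GET = "GET";
    static final String VALUE_CACHE_PUT = "PUT";

    String getDocumentId() { return (String) get(KEY_DOCUMENT_ID); }
}

class ContentHandlerResult extends HashMap<String, Object> {
    static final String KEY_USE_OF_CACHE_ALLOWED = "KEY_USE_OF_CACHE_ALLOWED";
}

interface CacheValidator {
    ContentHandlerResult validateCache(ContentHandlerInput input);
}

// --- Example policy (hypothetical): confidential documents are never stored ---
class ConfidentialCacheValidator implements CacheValidator {
    @Override
    public ContentHandlerResult validateCache(ContentHandlerInput input) {
        String documentId = input.getDocumentId();
        Object action = input.get(ContentHandlerInput.KEY_CACHE_ACTION);

        boolean confidential = documentId != null && documentId.startsWith("confidential/");
        boolean isPut = ContentHandlerInput.VALUE_CACHE_PUT.equals(action);

        // Deny only the PUT of a confidential document; allow everything else.
        boolean allowed = !(confidential && isPut);

        ContentHandlerResult result = new ContentHandlerResult();
        result.put(ContentHandlerResult.KEY_USE_OF_CACHE_ALLOWED, allowed);
        return result;
    }

    public static void main(String[] args) {
        ContentHandlerInput input = new ContentHandlerInput();
        input.put(ContentHandlerInput.KEY_DOCUMENT_ID, "confidential/contract.docx");
        input.put(ContentHandlerInput.KEY_CACHE_ACTION, ContentHandlerInput.VALUE_CACHE_PUT);
        ContentHandlerResult result = new ConfidentialCacheValidator().validateCache(input);
        System.out.println(result.get(ContentHandlerResult.KEY_USE_OF_CACHE_ALLOWED)); // prints "false"
    }
}
```

In a real content handler, only the body of validateCache would carry over; the stand-in types would be replaced by the classes shipped with the product.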