Sparse Documents
Sparse documents are a special case of compound documents in which every content element represents a single page and each page (content element) is loaded only when it is needed. The ideal use case for sparse documents is when
- the document is already split up into single-page files and
- retrieving each page is time-consuming, which makes retrieving the whole document at once prohibitive.
For example, a physical document might be scanned and converted to a series of large image files, one per page, which are then stored on a relatively slow storage server, or on one that is far enough from the PrizmDoc for Java server that loading each page takes noticeable time.
Sparse documents are not suitable for every case, and in most cases they will not improve performance. For most document formats and use cases, a standard document or a compound document gives the best results.
Loading and displaying Sparse Documents
As with compound documents, sparse document page files are returned as a List in the KEY_DOCUMENT_SPARSE_ELEMENTS parameter of the ContentHandlerResult. Each element of the list can be a byte array, an InputStream, or a File object. Every document request sent to getDocumentContent() includes a KEY_SPARSE_PAGE_INDEX parameter in the ContentHandlerInput object. KEY_SPARSE_PAGE_INDEX is a zero-based page index, and at least that page must be returned. Additional pages around it can also be returned: PrizmDoc for Java caches each page as it retrieves it, so returning extra pages reduces the number of future requests. The KEY_SPARSE_PAGE_COUNT parameter suggests how many pages to return, but following the suggestion is optional.
Note: if the value of KEY_SPARSE_PAGE_INDEX is null, all sparse page elements must be returned. This happens for operations that require the entire document, such as emailing or printing. Return all page elements the same way as for a partial sparse document: in KEY_DOCUMENT_SPARSE_ELEMENTS.
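The request-handling rules above can be sketched as a small helper that decides which pages to return. This is a minimal sketch under stated assumptions, not part of the PrizmDoc for Java API: the class SparsePageWindow, its compute() method, and the defaultCount fallback are hypothetical, and the caller is assumed to have already read KEY_SPARSE_PAGE_INDEX and KEY_SPARSE_PAGE_COUNT (either of which may be null) and to know the document's total page count.

```java
// Hypothetical helper (not part of the PrizmDoc for Java API) that decides
// which pages a sparse getDocumentContent() call should return.
final class SparsePageWindow {
    public final int startIndex; // zero-based index of the first page to return
    public final int count;      // number of pages to return

    private SparsePageWindow(int startIndex, int count) {
        this.startIndex = startIndex;
        this.count = count;
    }

    /**
     * @param requestedIndex value of KEY_SPARSE_PAGE_INDEX; null means "return every page"
     * @param suggestedCount value of KEY_SPARSE_PAGE_COUNT; only a hint, may be null
     * @param totalPages     total number of pages in the document
     * @param defaultCount   window size used when no usable suggestion is given
     */
    static SparsePageWindow compute(Integer requestedIndex, Integer suggestedCount,
                                    int totalPages, int defaultCount) {
        if (totalPages <= 0) {
            throw new IllegalArgumentException("document must have at least one page");
        }
        // A null index means the whole document is needed (for example, printing or emailing).
        if (requestedIndex == null) {
            return new SparsePageWindow(0, totalPages);
        }
        if (requestedIndex < 0 || requestedIndex >= totalPages) {
            throw new IndexOutOfBoundsException("page index " + requestedIndex + " out of range");
        }
        // The suggested count is optional: fall back to a default, return at least one page,
        // and never return more pages than the document has.
        int wanted = (suggestedCount != null && suggestedCount > 0) ? suggestedCount : defaultCount;
        wanted = Math.max(1, Math.min(wanted, totalPages));
        // Start at the requested page; if the window would run past the last page,
        // shift it backward so it still holds `wanted` pages and includes the requested one.
        int start = Math.min(requestedIndex, totalPages - wanted);
        return new SparsePageWindow(start, wanted);
    }
}
```

Unlike the getDocumentContent() example in this section, which shortens the window when the requested page is near the end, this sketch shifts the window backward so it still returns the suggested number of pages; both approaches always include the requested page. For instance, compute(9, 3, 10, 3) selects the pages at indexes 7 through 9.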
When returning a sparse document, four values should be returned:
- KEY_DOCUMENT_SPARSE_ELEMENTS: the list of page content elements.
- KEY_DOCUMENT_SPARSE_PAGE_INDEX: the zero-based index of the first returned page. For example, when returning pages 4, 5, and 6, this is 3 (the zero-based index of page 4).
- KEY_DOCUMENT_SPARSE_RETURN_COUNT: the number of pages being returned.
- KEY_DOCUMENT_SPARSE_TOTAL_PAGE_COUNT: the total number of pages in the document.
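These four values are tied together: the page index and return count describe a contiguous run of pages that must fit inside the total page count, and the return count should match the size of the element list. The following minimal sketch captures those invariants in a plain Java class. SparseResponse and its methods are hypothetical illustrations, not the PrizmDoc for Java API, and deriving the return count from the list size is a design choice made here so the two values cannot drift apart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class (not part of the PrizmDoc for Java API) mirroring the
// four values a sparse response carries, plus the consistency checks they imply.
final class SparseResponse {
    final List<byte[]> elements; // page content, one element per returned page
    final int startIndex;        // zero-based index of the first returned page
    final int returnCount;       // number of pages returned
    final int totalPageCount;    // total number of pages in the document

    SparseResponse(List<byte[]> elements, int startIndex, int totalPageCount) {
        if (elements.isEmpty()) {
            throw new IllegalArgumentException("at least one page must be returned");
        }
        // The returned run of pages must lie entirely inside the document.
        if (startIndex < 0 || startIndex + elements.size() > totalPageCount) {
            throw new IllegalArgumentException("returned pages fall outside the document");
        }
        this.elements = new ArrayList<>(elements);
        this.startIndex = startIndex;
        // Deriving the count from the list keeps the two values consistent.
        this.returnCount = elements.size();
        this.totalPageCount = totalPageCount;
    }

    // Converts a one-based page number, as a reader counts pages, to a zero-based index.
    static int toZeroBased(int pageNumber) {
        return pageNumber - 1;
    }

    // One-based page number of the last page in this response.
    int lastPageNumber() {
        return startIndex + returnCount;
    }
}
```

For the example in the list above, returning pages 4, 5, and 6 of a 10-page document corresponds to new SparseResponse(pages, SparseResponse.toZeroBased(4), 10), which yields a start index of 3 and a return count of 3.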
// This example assumes the existence of two helper functions that you supply: getDocumentPageCount(),
// which returns the total number of pages in a document, and getSingleDocumentPage(), which returns a
// single page of the document. To fit the best use case of sparse documents, retrieving a page should
// be relatively expensive, so calling it for every page of the document is not ideal.
// Requires: import java.util.ArrayList; import java.util.List;
public ContentHandlerResult getDocumentContent(ContentHandlerInput input)
        throws VirtualViewerAPIException
{
    String clientInstanceId = input.getClientInstanceId();
    String documentKey = input.getDocumentId();
    Integer requestedPageIndex = (Integer) input.get(ContentHandlerInput.KEY_SPARSE_PAGE_INDEX);
    // List is an interface, so instantiate a concrete implementation.
    List<byte[]> sparseElements = new ArrayList<byte[]>();
    int documentPageCount = getDocumentPageCount(documentKey);
    int startPageIndex = 0;
    int returnPageCount = documentPageCount;
    // If KEY_SPARSE_PAGE_INDEX is null, all pages should be returned. If it is set, return the
    // requested page and the next two pages.
    if (requestedPageIndex != null) {
        startPageIndex = requestedPageIndex;
        returnPageCount = 3;
        // Make sure not to request more pages than the document has.
        if (startPageIndex + returnPageCount > documentPageCount) {
            returnPageCount = documentPageCount - startPageIndex;
        }
    }
    for (int index = startPageIndex; index < startPageIndex + returnPageCount; index++) {
        byte[] documentPage = getSingleDocumentPage(documentKey, index);
        sparseElements.add(documentPage);
    }
    ContentHandlerResult result = new ContentHandlerResult();
    result.put(ContentHandlerResult.KEY_DOCUMENT_SPARSE_ELEMENTS, sparseElements);
    result.put(ContentHandlerResult.KEY_DOCUMENT_SPARSE_PAGE_INDEX, startPageIndex);
    result.put(ContentHandlerResult.KEY_DOCUMENT_SPARSE_RETURN_PAGE_COUNT, returnPageCount);
    // Report the document's real length rather than a hard-coded value.
    result.put(ContentHandlerResult.KEY_DOCUMENT_SPARSE_TOTAL_PAGE_COUNT, documentPageCount);
    return result;
}