PrizmDoc® v14.3 Preview Release - Updated
    Scaling up for Better Performance

    Introduction

    Moving to a multi-server environment requires careful planning, configuration, and testing to ensure the environment is scalable and high-performing. You will need to determine the architecture you want to use for the multi-server environment. This can include load balancers, multiple application servers, content servers, and database servers. You will need to consider the needs of your application and the expected traffic to determine the appropriate architecture.

    Benefits of Moving from a Single to Multi-Server Environment

    Moving from a single server to a multi-server environment can offer many benefits, including:

    • Reliability and availability: If one server goes down, the others can continue to provide access to services and applications.
    • Security: Using multiple systems for different resources can help protect against cybersecurity issues.
    • Scalability: Servers can handle more traffic and better align with desired functions.
    • Cost efficiency: A multi-server environment can be more cost-effective for improving performance issues related to resources like RAM or CPU.
    • Disaster recovery: Virtual machines can be quickly and safely moved from one server to another, allowing for data backup at a moment's notice.
    • Efficiency: Sharing servers across network functions and services can reduce the number of servers needed and allow for better resiliency.
    • Environmental and energy savings: Efficient sharing of servers can lead to significant energy savings.

    Requirements / Considerations

    This topic covers the requirements and server sizing considerations for both PrizmDoc Server and PrizmDoc Application Services (PAS).

    PrizmDoc Server Sizing

    This section provides guidance on how to plan a PrizmDoc Server deployment with sufficient hardware to handle content processing needs. The type of content being processed affects server performance. PDFs, Microsoft Office formats, and scanned images are the most commonly processed file types. Content factors such as the number of elements, image size, file size, and file type all affect how long the system takes to process and render content for display. CAD drawings, and PDFs that use path elements to represent CAD data, require more processing time because of the many elements within the file. Larger images, larger files, and more complex file types also add processing time.

    Recommendations for Hardware

                     Minimum       Moderate      High-End
    Logical Cores    4             8             16
    Memory           16 GB         32 GB         64 GB
    Hard Drive       SSD           SSD           SSD
    AWS              m5.xlarge     m5.2xlarge    m5.4xlarge

    The throughput below is the number of unique documents the system can convert for viewing per minute. The documents used for this recommendation represent a standard mix of Office and PDF files, ranging in size from a few pages to a few hundred pages.

               Minimum          Moderate         High-End
    Windows    5 per minute     10 per minute    15 per minute
    Docker     10 per minute    18 per minute    30 per minute

    NOTE: Every server should have the amount of memory specified above (as opposed to dividing the specified amount of memory across all servers).
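
    As a rough planning aid, the throughput figures above can be turned into a server-count estimate. The sketch below assumes that throughput adds up linearly across servers in a cluster, which the figures above do not state, so treat the result as a starting point for load testing rather than a guarantee. The function name servers_needed is illustrative.

```python
import math

# Documents converted per minute, copied from the throughput table above.
THROUGHPUT_PER_MINUTE = {
    "windows": {"minimum": 5, "moderate": 10, "high-end": 15},
    "docker": {"minimum": 10, "moderate": 18, "high-end": 30},
}


def servers_needed(target_per_minute: float, platform: str, tier: str) -> int:
    """Estimate how many servers of one tier are needed for a target rate.

    ASSUMPTION: cluster throughput scales linearly with server count.
    """
    per_server = THROUGHPUT_PER_MINUTE[platform][tier]
    # Round up, and never return fewer than one server.
    return max(1, math.ceil(target_per_minute / per_server))


if __name__ == "__main__":
    # 40 documents per minute on Moderate Docker servers: ceil(40 / 18) = 3
    print(servers_needed(40, "docker", "moderate"))
```

    Each server in the estimate still needs the full amount of memory listed for its tier, as the note above says.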

    For a complete list of requirements and considerations, review the PrizmDoc Server Sizing topic.

    PAS Server Sizing

    PrizmDoc Application Services (PAS) can run on a range of server configurations. It automatically scales to use the available cores and streams data to reduce RAM usage. When PAS and PrizmDoc Server are hosted on separate servers, good network throughput between them is essential.

                     Minimum                                 Suggested
    Logical Cores    1                                       2
    Memory           0.5 GB (Docker) / 1 GB (Windows)        4 GB (Docker) / 8 GB (Windows)
    Hard Drive       SSD                                     SSD
    AWS              t2.nano (Docker) / t2.micro (Windows)   t2.medium (Docker) / t2.large (Windows)

    For a complete list of requirements and considerations, review the PrizmDoc Application Services (PAS) Server Sizing topic.

    Best Practices

    You'll want to review and familiarize yourself with the following before moving to a multi-server environment:

    • Review the whitepaper, How PrizmDoc Load Balancing Works.
    • If you will be supporting document uploads of 5,000+ pages (and/or at a rate of 40 uploads per minute), we recommend you use a cluster of PrizmDoc Servers to increase the stability of your solution.
    • Expect PrizmDoc to use CPU heavily for extended periods when processing very large documents or a high volume of documents. We recommend using a cluster of PrizmDoc instances that can be scaled up and down as CPU usage increases or decreases. Consider the PrizmDoc Kubernetes solution, which also provides a cluster manager tool for easier maintenance of the cluster.
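
    To make CPU-based scaling concrete, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). It is a general Kubernetes illustration, not PrizmDoc-specific behavior. The target utilization and the replica limits are assumed example values.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return the replica count suggested by the HPA scaling formula.

    The min/max limits are example values; choose limits that fit
    your own cluster.
    """
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp the result to the configured replica range.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Three instances at 90% CPU with a 60% target: ceil(3 * 90 / 60) = 5
    print(desired_replicas(3, 90, 60))
```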

    Steps to Move to a Multi-Server Environment

    We have two options available:

    1. You can set up PrizmDoc Server or PAS out-of-the-box, or
    2. You can use our Kubernetes guidance and GitHub sample to get started.

    Pros and Cons of Each Option

    This section lists things to consider for each option before implementation.

    Out-of-the-Box

    • Pro: Does not require Kubernetes infrastructure and knowledge of Kubernetes.
    • Pro: Supports both PrizmDoc Windows and PrizmDoc Docker solutions.
    • Con: Requires additional scripting to keep cluster instances aware of each other.
    • Con: Requires custom solutions for monitoring the load on servers and scaling servers depending on the load.

    Kubernetes

    • Pro: The Cluster Manager component automatically keeps cluster instances aware of each other.
    • Pro: The cluster can be configured to scale automatically when the load increases or decreases.
    • Con: Requires Kubernetes infrastructure and knowledge of Kubernetes.
    • Con: Only supports the PrizmDoc Docker solution.

    NOTE: We recommend the use of an Out-of-the-Box cluster when you need Windows servers (for MSO rendering mode) or when you are planning to use a static cluster with a fixed number of servers. Otherwise, using Kubernetes is a better choice because of the automated cluster management and the auto-scaling support.
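
    The recommendation in the note above reduces to a two-question decision. A minimal sketch, with illustrative function and parameter names:

```python
def recommended_option(needs_windows_servers: bool, fixed_server_count: bool) -> str:
    """Apply the note above: Out-of-the-Box for Windows servers (MSO
    rendering mode) or a static cluster, otherwise Kubernetes."""
    if needs_windows_servers or fixed_server_count:
        return "out-of-the-box"
    return "kubernetes"


if __name__ == "__main__":
    print(recommended_option(needs_windows_servers=False, fixed_server_count=False))
```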

    Option 1 - Out-of-the-box

    This section provides an overview of how to run clusters on PrizmDoc Server.

    Overview for PrizmDoc Server

    Moving from a single server to multiple servers requires enabling and configuring cluster mode in PrizmDoc Services. After installation, stop the PrizmDoc Server and open the Central Configuration file in a text editor. Set the network.clustering.enabled parameter to true and assign valid port numbers to network.publicPort and network.clustering.clusterPort. Optionally, set network.clustering.servers, an array of address values corresponding to each PrizmDoc Server on the network node. Once cluster mode is enabled and configured, start the PrizmDoc Server on each server in the cluster. Finally, inform the Cloud Entry Point on each PrizmDoc Server of the other available servers in the same network node by sending an HTTP PUT request to each Cloud Entry Point. For detailed instructions, refer to the PrizmDoc Server Clustering topic.
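
    The sketch below shows the two parts of this procedure in Python: collecting the four configuration values named above, and preparing the HTTP PUT request for a Cloud Entry Point. Only the parameter names come from this topic. The port numbers, host names, URL path, and JSON body shape are placeholders and assumptions; take the real values from the PrizmDoc Server Clustering topic. The request is built but not sent.

```python
import json
import urllib.request


def cluster_settings(public_port: int, cluster_port: int, servers=None) -> dict:
    """Return the Central Configuration values named in the text above."""
    for port in (public_port, cluster_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port number: {port}")
    settings = {
        "network.clustering.enabled": True,
        "network.publicPort": public_port,
        "network.clustering.clusterPort": cluster_port,
    }
    if servers:
        # Optional array of address values, one per PrizmDoc Server.
        settings["network.clustering.servers"] = list(servers)
    return settings


def build_server_list_request(entry_point_url: str, servers: list) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request for one Cloud Entry Point.

    ASSUMPTION: the body shape {"servers": [...]} is illustrative only.
    """
    body = json.dumps({"servers": servers}).encode("utf-8")
    return urllib.request.Request(
        entry_point_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Example values only; hosts, ports, and path are hypothetical.
    print(cluster_settings(18681, 18682))
    request = build_server_list_request(
        "http://prizmdoc-1.example:18681/EXAMPLE-PATH",
        [{"address": "prizmdoc-2.example", "port": 18682}],
    )
    print(request.get_method(), request.full_url)
```

    In a real deployment the same request would be sent to the Cloud Entry Point of every server in the cluster, for example with urllib.request.urlopen(request).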

    Optimize Cache Performance for PrizmDoc Server Cluster Environments

    Caching converted content is key to PrizmDoc Server performance. However, in cluster mode, cache data isn't shared across servers, so the same document may be converted more than once on different servers. To reduce this, you can provide a "hint" value in an HTTP header. The hint increases the chances that a request for a new viewing session is sent to a PrizmDoc Server that has already converted the same document. The hint value should uniquely identify the document. For detailed instructions, refer to the Optimize Cache Performance for Cluster Mode topic.
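
    One way to produce a stable, unique hint is to hash an identifier that your application already uses for the document. The sketch below is an assumption-laden illustration: this topic does not name the header, so the header name used here is an assumption to confirm in the Optimize Cache Performance for Cluster Mode topic, and hashing is just one possible way to derive the value.

```python
import hashlib

# ASSUMPTION: header name not given in this topic; confirm before use.
HINT_HEADER_NAME = "Accusoft-Affinity-Hint"


def affinity_hint(document_id: str) -> str:
    """Derive a stable hint: the same document always yields the same value."""
    return hashlib.sha256(document_id.encode("utf-8")).hexdigest()


def viewing_session_headers(document_id: str, header_name: str = HINT_HEADER_NAME) -> dict:
    """Build the extra header for a request that creates a viewing session."""
    return {header_name: affinity_hint(document_id)}


if __name__ == "__main__":
    # "contracts/2024/agreement.pdf" is a hypothetical application-side identifier.
    print(viewing_session_headers("contracts/2024/agreement.pdf"))
```

    Hashing keeps internal paths or database keys out of the header while preserving the one-hint-per-document property.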

    Affinity Tokens & Cluster Mode

    In cluster mode, requests for WorkFile, MarkupBurner, RedactionCreator, and ContentConverter resources require an additional bit of data called an affinity token to ensure they are routed correctly by the Cloud Entry Point. The affinity token is a Base64 encoded string that contains encrypted information necessary to route related requests to the same PrizmDoc Server where the necessary data is cached locally. PrizmDoc Server API automatically generates an affinity token when it receives a POST request for a ViewingSession, WorkFile, MarkupBurner, RedactionCreator, or ContentConverter resource and returns it in the response. Once obtained, the affinity token should be passed in with related requests using the "Accusoft-Affinity-Token" HTTP custom header. For detailed instructions, refer to the Affinity Tokens & Cluster Mode topic.

    NOTE: Affinity tokens are still required; however, they have been deprecated and will be removed in a future release.
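
    The token flow described above can be sketched as a small helper that takes the parsed JSON body of a POST response and builds the header for related follow-up requests. The header name comes from this topic. The response field name affinityToken is an assumption; check the Affinity Tokens & Cluster Mode topic for the actual field.

```python
AFFINITY_TOKEN_HEADER = "Accusoft-Affinity-Token"


def follow_up_headers(post_response: dict, token_field: str = "affinityToken") -> dict:
    """Build headers for requests related to a previously created resource.

    ASSUMPTION: the token is returned under `token_field` in the
    response body. Returns an empty dict when no token is present
    (for example, when the server is not in cluster mode).
    """
    token = post_response.get(token_field)
    if not token:
        return {}
    return {AFFINITY_TOKEN_HEADER: token}


if __name__ == "__main__":
    # Hypothetical response body from a POST that created a WorkFile.
    response_body = {"fileId": "example-id", "affinityToken": "ZXhhbXBsZQ=="}
    print(follow_up_headers(response_body))
```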

    Overview for PAS

    This section provides an overview of how to run clusters on PAS.

    Optimize Cache Performance for PAS Cluster Environments

    PrizmDoc Application Services (PAS) can be configured to share cached data among multiple PAS servers. Viewing sessions are cached by either PAS or PrizmDoc Server depending on how the session was created, with sessions created using a documentId stored in PAS' central cache as a pre-converted viewing package. Sessions created without a documentId are cached by PrizmDoc Server as normal. If PrizmDoc Server is running in Cluster Mode, PAS handles the use of affinity hints internally for optimized cache performance. More information on caching strategies can be found in the Implement Caching Strategies topic. For detailed instructions, refer to the Optimize Cache Performance for Cluster Environments topic.

    Run PAS on Clusters

    PrizmDoc Application Services (PAS) is designed to run on multiple machines and can be installed on multiple servers easily. All filesystem-based storage in the PAS configuration file should point to a shared location, such as a Network Attached Storage (NAS) device, that is accessible to all PAS instances. Each PAS instance should point to the same PrizmDoc Server or PrizmDoc Cloud entry point. Restart PAS after each configuration change for the change to take effect. When several PAS servers are load-balanced, a request can be routed to any instance, so any off-the-shelf load balancer can handle the routing. For detailed instructions, refer to the Run PAS on Clusters topic.
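
    Because any PAS instance can serve any request, even the simplest routing policy works. The sketch below illustrates round-robin selection, one common policy offered by off-the-shelf load balancers. It is an illustration of the idea, not a replacement for a real load balancer, and the host names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out PAS instances in rotation; no session stickiness is used."""

    def __init__(self, instances):
        instances = list(instances)
        if not instances:
            raise ValueError("at least one PAS instance is required")
        self._cycle = itertools.cycle(instances)

    def next_instance(self) -> str:
        """Return the instance that should receive the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical PAS host names.
    balancer = RoundRobinBalancer(["pas-1.example", "pas-2.example"])
    for _ in range(3):
        print(balancer.next_instance())
```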

    Option 2 - Kubernetes

    To help you determine the best approach, this section provides overviews of how to deploy PrizmDoc Server and PAS to Kubernetes. An easy way to get started is with our GitHub sample, which lists prerequisites and steps for deploying to a Kubernetes cluster.

    Overview for PrizmDoc Server

    The Resources Used in PrizmDoc Deployment section of the Deployment to Kubernetes Guidance topic provides guidance on how to set up PrizmDoc, a document processing and conversion service, on a Kubernetes cluster.

    Overview for PAS

    The PrizmDoc Application Services section of the Deployment to Kubernetes Guidance topic provides guidance on how to set up PrizmDoc Application Services (PAS) to work on a Kubernetes cluster.