PrizmDoc® v14.1 Release - Updated
    PII Detectors

    Introduction

    The search context and Personally Identifiable Information (PII) Detectors REST API is used by our Viewer to detect PII in a document which is currently being viewed.

    NOTE: This REST API is designed primarily for our Viewer. If your application needs to perform PII detection without a viewer involved, we recommend you use PrizmDoc Server’s search context and PII detectors REST APIs directly.

    Available URLs

    URL                                                        Description
    POST /v2/viewingSessions/{viewingSessionId}/piiDetectors   Creates a new PII detector for a viewing session’s source document, starting the process of detecting PII.
    GET /v2/piiDetectors/{processId}/entities                  Gets available PII entities.
    DELETE /v2/piiDetectors/{processId}                        Cancels a PII detection process.

    POST /v2/viewingSessions/{viewingSessionId}/piiDetectors

    Creates a new PII detector for a viewing session’s source document, starting the process of detecting PII.

    After a successful POST to create the PII detector, we immediately begin a background process to start populating PII entities for you to GET. You do not need to wait for the full set of PII entities to be available; you can start retrieving partial PII entities as soon as they are available. Once the full text of the document has been searched and no more PII entities will be added, the PII detector state will change from "processing" to "complete".

    Request

    Request Headers

    Name Description
    Content-Type Must be application/json

    Request Body

    • minSecondsAvailable (Integer) The minimum number of seconds this PII detector will remain available. The actual lifetime may be longer. The default lifetime is defined by the processIds.lifetime central configuration parameter.

    Successful Response

    Response Body

    JSON with metadata about the created PII detector.

    • processId (String) Unique id for this PII detector.
    • affinityToken (String) Affinity token for this PII detector. Present when clustering is enabled.
    • state (String) State of detecting PII.
      • "processing" - The detection is still being executed. Additional PII entities may become available.
      • "complete" - The detection is complete. No additional PII entities will become available.
      • "error" - There was a problem performing the detection. No additional PII entities will become available.
    • percentComplete (Integer) Percentage of PII detection which has completed (from 0 to 100).
    • expirationDateTime (String) Currently planned date and time when the PII detector resource will expire and no longer be available for use. Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2023-11-05T08:15:30.494Z".
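
    The fields above can be read with a few lines of Python. This is a minimal sketch rather than an official client: the function name parse_detector and the snake_case keys it returns are illustrative choices, and it assumes the timestamp carries a three- or six-digit fractional second like the example value.

```python
import json
from datetime import datetime, timezone


def parse_detector(response_text):
    """Turn the JSON body of a successful POST into a plain dict."""
    detector = json.loads(response_text)
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it before Python 3.11.
    expires = datetime.fromisoformat(
        detector["expirationDateTime"].replace("Z", "+00:00")
    )
    return {
        "process_id": detector["processId"],
        # affinityToken is only present when clustering is enabled.
        "affinity_token": detector.get("affinityToken"),
        "state": detector["state"],
        "percent_complete": detector["percentComplete"],
        "expires": expires,
    }


example_body = """{
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2023-12-17T20:38:39.796Z"
}"""
parsed = parse_detector(example_body)
seconds_left = (parsed["expires"] - datetime.now(timezone.utc)).total_seconds()
```

    Keeping affinity_token alongside process_id is convenient because later GET and DELETE requests need both when clustering is enabled.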

    Error Responses

    Status Code   JSON errorCode             Description
    404           -                          No viewing session with the provided {viewingSessionId} could be found.
    480           "DocumentNotProvidedYet"   The viewing session does not yet have a source document attached.
    480           "ServerContentDisabled"    The source document satisfies the client-side viewing formats configured for the viewing session and the server content is disabled.
    480           "InvalidInput"             An invalid input value was used. See errorDetails in the response body.
    480           "FeatureDisabled"          The viewing session was created with "serverSideSearch" disabled.
    480           "FeatureNotLicensed"       You are not licensed to use the PII detection feature.
    580           "InternalError"            The server encountered an internal error when handling the request.
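
    Because several distinct failures share status code 480, a client usually has to look at the errorCode rather than the status alone. The helper below is a hedged sketch: extract_error_code is a hypothetical name, and it assumes errorCode is a top-level field of a JSON error body. It tolerates an empty or non-JSON body, since this topic does not say what a 404 response contains.

```python
import json


def extract_error_code(status_code, body_text):
    """Return the JSON errorCode of a failed response, or None if there is none."""
    if status_code < 400:
        return None  # not an error response
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        return None  # empty or non-JSON body
    if isinstance(payload, dict):
        return payload.get("errorCode")
    return None


code = extract_error_code(480, '{"errorCode": "FeatureDisabled"}')
if code == "FeatureDisabled":
    hint = 'Create the viewing session with "serverSideSearch" enabled.'
```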

    Example

    Request

    This POST begins PII detection:

    POST pas_base_url/v2/viewingSessions/XYZ.../piiDetectors
    Content-Type: application/json
    
    {
      "minSecondsAvailable": 600
    }
    
    

    NOTE: See the Base URL for PAS topic for more information.

    Response

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "processId": "pR5X6nPDgMwat6cxlmn0Q3",
      "state": "processing",
      "percentComplete": 0,
      "expirationDateTime": "2023-12-17T20:38:39.796Z"
    }
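
    For callers scripting this request, the following Python sketch builds the same POST with the standard library. It only constructs the request object and does not send it; the host https://pas.example.com stands in for your PAS base URL, and build_create_detector_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_create_detector_request(pas_base_url, viewing_session_id, min_seconds_available):
    """Build (but do not send) the POST that creates a PII detector."""
    url = f"{pas_base_url}/v2/viewingSessions/{viewing_session_id}/piiDetectors"
    body = {"minSecondsAvailable": min_seconds_available}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_detector_request("https://pas.example.com", "XYZ", 600)
# Sending it would be: urllib.request.urlopen(request)
```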
    
    

    GET /v2/piiDetectors/{processId}/entities?limit={limit}&continueToken={continueToken}

    Gets a block of newly-available PII entities up to a limit.

    This URL is designed to give you the PII entities in chunks as they become available. Each GET request will return the currently-known PII entities up to a limit (default is 100). If a response contains a continueToken, it indicates that additional PII entities may be available and that you should issue another GET request using that continueToken as a query string parameter to skip the PII entities you have already received. As long as a response contains a continueToken, use it to issue a subsequent GET for more PII entities. When you encounter a response which does not have a continueToken, you have received all of the PII entities and no more GET requests are necessary.

    In order to optimize the number of network requests you make, any response which contains a continueToken will also contain a continueAfter value with a recommended number of milliseconds you should wait before sending the next GET request.
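
    The polling pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive client: fetch_page is a hypothetical callable that performs one GET for a given continueToken (None for the first request) and returns the parsed JSON body, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time


def collect_entities(fetch_page, sleep=time.sleep):
    """Follow continueToken values until a response arrives without one."""
    entities = []
    token = None  # the first GET is made without a continueToken
    while True:
        page = fetch_page(token)
        entities.extend(page["entities"])
        token = page.get("continueToken")
        if token is None:
            return entities  # no continueToken: all entities received
        # continueAfter is in milliseconds; time.sleep expects seconds.
        sleep(page.get("continueAfter", 0) / 1000)


# Usage with canned responses standing in for real HTTP calls.
canned = {
    None: {"entities": [{"id": 0}, {"id": 1}], "continueToken": "a", "continueAfter": 500},
    "a": {"entities": [{"id": 2}]},
}
delays = []
found = collect_entities(lambda token: canned[token], sleep=delays.append)
```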

    Request

    URL Parameters

    Parameter Description
    {processId} The processId which identifies the PII detector.
    {limit} The maximum number of PII entities to return for this HTTP request. Must be an integer greater than 0. Default is 100.
    {continueToken} Used to continue getting PII entities from the point where a previous GET request left off.
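
    As an illustration of how these parameters combine, the sketch below assembles the request URL in Python. The function name entities_url and the host pas.example.com are placeholders; the limit check mirrors the rule stated above.

```python
from urllib.parse import quote, urlencode


def entities_url(pas_base_url, process_id, limit=None, continue_token=None):
    """Build the GET URL, adding only the query parameters that were supplied."""
    query = {}
    if limit is not None:
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError("limit must be an integer greater than 0")
        query["limit"] = limit
    if continue_token is not None:
        query["continueToken"] = continue_token
    url = f"{pas_base_url}/v2/piiDetectors/{quote(process_id, safe='')}/entities"
    # urlencode percent-escapes the values, so tokens are safe to pass as-is.
    return f"{url}?{urlencode(query)}" if query else url


first_url = entities_url("https://pas.example.com", "pR5X6nPDgMwat6cxlmn0Q3")
next_url = entities_url(
    "https://pas.example.com",
    "pR5X6nPDgMwat6cxlmn0Q3",
    limit=50,
    continue_token="Cx07GHlkmi32gxAQhv49WZ",
)
```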

    Request Headers

    Name Description
    Accusoft-Affinity-Token The affinityToken of the PII detector. Required when server clustering is enabled.

    Successful Response

    Response Body

    JSON with any available PII entities.

    • entities (Array of Objects) Always present. Array of newly-available PII entities. If no new PII entities are available, this array will be empty.
      • id (Integer) Unique number assigned to this PII entity.
      • pageIndex (Integer) Zero-indexed page number where this PII entity occurs in the document.
      • text (String) The text of the PII entity.
      • lineGroups (Array of Objects) An array of text line groups that indicate where the PII entity is located.
        • pageIndex (Integer) Zero-indexed page number where this PII entity occurs in the document.
        • pageData (Object) Information about the dimensions of the page where this PII entity occurs.
          • width (Number) Width of the page.
          • height (Number) Height of the page.
        • boundingRectangle (Object) Bounding rectangle dimensions of the PII entity on the first page where it occurs.
          • x (Number) Distance from the left edge of the page to the left edge of the PII entity bounding box.
          • y (Number) Distance from the top edge of the page to the top edge of the PII entity bounding box.
          • width (Number) Width of the PII entity bounding box.
          • height (Number) Height of the PII entity bounding box.
        • lines (Array of Objects) Array of rectangles for each line of the PII entity on the page where it occurs. If the PII entity is on one line, the result is a single array item with a rectangle equal to boundingRectangle. If the PII entity is on multiple lines, all rectangles in the array will be within the bounds of the boundingRectangle.
          • x (Number) Distance from the left edge of the page to the left edge of the PII entity line rectangle.
          • y (Number) Distance from the top edge of the page to the top edge of the PII entity line rectangle.
          • width (Number) Width of the PII entity line rectangle.
          • height (Number) Height of the PII entity line rectangle.
      • startIndex (Integer) The index of the full-page text where the PII entity begins.
      • type (String) The entity’s type.
      • score (Number) A number ranging from 0 to 1 that represents the level of confidence in the accuracy of the detection.
    • pagesWithoutText (Array of Integers) Always present. Currently known pages in the document which do not contain any text content at all. Values are zero-indexed page numbers. If the PII detector is still processing (a continueToken is present in the response), the data should be considered partial. Note that, unlike PII entities, this value is cumulative (we always deliver the entire set of pages we know to not contain text data).
    • continueToken (String) When present, indicates that more PII entities may be available. An additional GET request should be made for more PII entities using this value as the continueToken query string parameter. When not present, indicates that the detection is complete and no further PII entities will be available.
    • continueAfter (Number) Recommended milliseconds to delay before issuing the next GET request for more PII entities.
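
    One common use of the lineGroups data is drawing highlights over a rendered page. The Python sketch below converts each line rectangle into fractions of the page size so it can be scaled to any rendering. It assumes the rectangle values and pageData use the same units, which this topic does not state explicitly, and normalized_line_rectangles is a hypothetical helper name.

```python
def normalized_line_rectangles(entity):
    """Yield (pageIndex, left, top, right, bottom), each edge as a 0..1 page fraction."""
    for group in entity["lineGroups"]:
        page_width = group["pageData"]["width"]
        page_height = group["pageData"]["height"]
        for line in group["lines"]:
            yield (
                group["pageIndex"],
                line["x"] / page_width,
                line["y"] / page_height,
                (line["x"] + line["width"]) / page_width,
                (line["y"] + line["height"]) / page_height,
            )


# Illustrative entity (values chosen for easy arithmetic, not taken from a real document).
sample_entity = {
    "lineGroups": [
        {
            "pageIndex": 0,
            "pageData": {"width": 612, "height": 792},
            "lines": [{"x": 153, "y": 198, "width": 306, "height": 396}],
        }
    ]
}
rectangles = list(normalized_line_rectangles(sample_entity))
```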

    Error Responses

    Status Code   JSON errorCode        Description
    404           -                     No PII detector with the provided {processId} could be found.
    400           "MissingInput"        Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
    480           "InvalidInput"        An invalid input value was used. See errorDetails in the response body.
    480           "ResourceNotUsable"   Can occur when the PII detector is in a state of "error".
    580           "InternalError"       The server encountered an internal error when handling the request.

    Example

    Here is an example sequence of requests and responses illustrating how you would acquire the full set of PII entities for the PII detector (for brevity, the total number of PII entities in this example is small).

    You would start with an initial GET:

    GET pas_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities
    Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
    
    

    NOTE: See the Base URL for PAS topic for more information.

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "entities": [
        {
          "id": 0,
          "pageIndex": 0,
          "text": "John Doe",
          "lineGroups": [
            {
              "pageIndex": 0,
              "boundingRectangle": { "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 },
              "lines": [{ "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 }],
              "pageData": { "width": 612, "height": 792 }
            }
          ],
          "startIndex": 19,
          "type": "person",
          "score": 1.0
        },
        {
          "id": 1,
          "pageIndex": 0,
          "text": "(555) 555-5555",
          "lineGroups": [
            {
              "pageIndex": 0,
              "boundingRectangle": { "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 },
              "lines": [{ "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 }],
              "pageData": { "width": 612, "height": 792 }
            }
          ],
          "startIndex": 527,
          "type": "phoneNumber",
          "score": 1.0
        }
      ],
      "pagesWithoutText": [],
      "continueToken": "Cx07GHlkmi32gxAQhv49WZ",
      "continueAfter": 500
    }
    
    

    The initial response has given us two PII entities for the first page of the document (page index 0) and a continueToken which we should use to get more PII entities after waiting 500 milliseconds.

    So, half a second later, we issue a follow-up request with the continueToken passed in as a query string parameter (so we skip over the results we already have):

    GET pas_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities?continueToken=Cx07GHlkmi32gxAQhv49WZ
    Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
    
    

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "entities": [
        {
          "id": 2,
          "pageIndex": 1,
          "text": "Jane Doe",
          "lineGroups": [
            {
              "pageIndex": 1,
              "boundingRectangle": { "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 },
              "lines": [{ "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 }],
              "pageData": { "width": 612, "height": 792 }
            }
          ],
          "startIndex": 652,
          "type": "person",
          "score": 1.0
        }
      ],
      "pagesWithoutText": [2,3],
      "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
      "continueAfter": 500
    }
    
    

    This time we get back a new PII entity as well as some new information about pagesWithoutText: we now know that at least page indices 2 and 3 (zero-indexed page numbers) have no text at all.

    The presence of a new continueToken tells us there may be more PII entities, so we submit another request with the new continueToken:

    GET pas_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities?continueToken=B4uGe7m0ZtxR3lkqA07Nmj
    Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
    
    

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "entities": [
        {
          "id": 3,
          "pageIndex": 5,
          "text": "(555) 555-1234",
          "lineGroups": [
            {
              "pageIndex": 5,
              "boundingRectangle": { "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 },
              "lines": [{ "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 }],
              "pageData": { "width": 612, "height": 792 }
            }
          ],
          "startIndex": 113,
          "type": "phoneNumber",
          "score": 1.0
        }
      ],
      "pagesWithoutText": [2,3,4]
    }
    
    

    This time we get a new PII entity for page index 5, and we now know that page indices 2, 3, and 4 contain no text at all. The lack of a continueToken tells us we have received all of the PII entities, so there are no more GET requests to make.
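
    The walkthrough above shows two different accumulation rules: entities arrive incrementally, while pagesWithoutText is cumulative. A client-side merge could look like the Python sketch below. The merge_response function and the shape of the state dict are illustrative assumptions, and the responses are abbreviated to the fields that matter here.

```python
def merge_response(state, response):
    """Fold one GET response into the running client-side state."""
    # entities are incremental: each response only holds newly-available ones.
    state["entities"].extend(response["entities"])
    # pagesWithoutText is cumulative: the latest value replaces the previous one.
    state["pagesWithoutText"] = list(response["pagesWithoutText"])
    # A response without a continueToken means detection has finished.
    state["complete"] = "continueToken" not in response
    return state


state = {"entities": [], "pagesWithoutText": [], "complete": False}
responses = [
    {"entities": [{"id": 0}, {"id": 1}], "pagesWithoutText": [], "continueToken": "Cx07GHlkmi32gxAQhv49WZ"},
    {"entities": [{"id": 2}], "pagesWithoutText": [2, 3], "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj"},
    {"entities": [{"id": 3}], "pagesWithoutText": [2, 3, 4]},
]
for response in responses:
    merge_response(state, response)
```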

    DELETE /v2/piiDetectors/{processId}

    Cancels the PII detection process. Further requests using this processId will return errors.

    Request

    URL Parameters

    Parameter Description
    {processId} The processId which identifies the PII detector.

    Request Headers

    Name Description
    Accusoft-Affinity-Token The affinityToken of the PII detector. Required when server clustering is enabled.

    Successful Response

    HTTP/1.1 204 No Content
    
    

    Error Responses

    Status Code   JSON errorCode    Description
    404           -                 No PII detector with the provided {processId} could be found.
    400           "MissingInput"    Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
    580           "InternalError"   The server encountered an internal error when handling the request.
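
    A cancellation request can be built the same way as the other calls. The Python sketch below only constructs the request and does not send it; build_cancel_request is a hypothetical helper, and https://pas.example.com stands in for your PAS base URL. The affinity token header is added only when a token is supplied, matching the clustering requirement above.

```python
import urllib.request


def build_cancel_request(pas_base_url, process_id, affinity_token=None):
    """Build (but do not send) the DELETE that cancels a PII detector."""
    headers = {}
    if affinity_token is not None:
        headers["Accusoft-Affinity-Token"] = affinity_token
    return urllib.request.Request(
        f"{pas_base_url}/v2/piiDetectors/{process_id}",
        headers=headers,
        method="DELETE",
    )


cancel = build_cancel_request(
    "https://pas.example.com",
    "pR5X6nPDgMwat6cxlmn0Q3",
    affinity_token="ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
)
# Sending it would be: urllib.request.urlopen(cancel), which should yield 204 on success.
```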