Introduction
The search context and Personally Identifiable Information (PII) Detectors REST APIs allow your application to detect PII in a document.
A PII detector resource represents an asynchronous PII detection process and yields PII entities as they become available.
Available URLs
URL | Description |
---|---|
POST /v2/piiDetectors | Creates a new PII detector for a search context, starting the process of detecting PII. |
GET /v2/piiDetectors/{processId} | Gets information about a PII detector. |
GET /v2/piiDetectors/{processId}/entities | Gets available PII entities. |
DELETE /v2/piiDetectors/{processId} | Cancels a PII detection process. |
POST /v2/piiDetectors
Creates a new PII detector for a search context, starting the process of detecting PII.
After a successful POST to create the PII detector, a background process immediately begins populating PII entities for you to GET. You do not need to wait for the full set of PII entities to be available; you can start retrieving partial results as soon as they are available. Once the full text of the document has been searched and no more PII entities will be added, the PII detector state will change from "processing" to "complete".
Request
Request Headers
Name | Description |
---|---|
Content-Type | Must be application/json |
Accusoft-Affinity-Token | The affinityToken of the search context specified by input.contextId. Required when server clustering is enabled. |
Request Body
input
  contextId
  (String) Required. Identifies the search context which holds the full-text data to detect PII.
minSecondsAvailable
(Integer) The minimum number of seconds this PII detector will remain available. The actual lifetime may be longer. The default lifetime is defined by the processIds.lifetime central configuration parameter.
Successful Response
Response Body
JSON with metadata about the created PII detector.
input
(Object) Input we accepted to create the PII detector.
processId
(String) Unique id for this PII detector.
affinityToken
(String) Affinity token for this PII detector. Present when clustering is enabled.
state
(String) State of detecting PII.
  "processing" - The detection is still being executed. Additional PII entities may become available.
  "complete" - The detection is complete. No additional PII entities will become available.
  "error" - There was a problem performing the detection. No additional PII entities will become available.
percentComplete
(Integer) Percentage of PII detection which has completed (from 0 to 100).
expirationDateTime
(String) Currently planned date and time when the PII detector resource will expire and no longer be available for use. Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2023-11-05T08:15:30.494Z".
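As a small illustration of working with expirationDateTime, the sketch below parses the RFC 3339 value and computes how many seconds remain before the detector expires. The function name seconds_until_expiration and the fixed "now" value are hypothetical and chosen only for the example.

```python
from datetime import datetime, timezone


def seconds_until_expiration(expiration_date_time, now=None):
    """Return the seconds left before the detector expires (negative if already past)."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    expires = datetime.fromisoformat(expiration_date_time.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).total_seconds()


# Hypothetical "current time", exactly ten minutes before the documented sample value.
now = datetime(2023, 11, 5, 8, 5, 30, 494000, tzinfo=timezone.utc)
remaining = seconds_until_expiration("2023-11-05T08:15:30.494Z", now)
print(remaining)
```

A client could compare this value against the time it still expects to need and create a new detector if the current one is about to expire.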
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
480 | "MissingInput" | A required input value was not provided. See errorDetails in the response body. |
480 | "InvalidInput" | An invalid input value was used. See errorDetails in the response body. |
480 | "ResourceNotFound" | Can occur when the search context specified by contextId could not be found. See errorDetails in the response body. |
480 | "ResourceNotUsable" | Can occur when the search context specified by contextId is not usable. See errorDetails in the response body. |
480 | "FeatureNotLicensed" | You are not licensed to use the PII detection feature. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
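One way a client might surface these errors is sketched below. It assumes, based only on the table above, that an error response body is JSON carrying an errorCode and optionally errorDetails; the exact shape of errorDetails is not specified here, so the sketch passes it through untouched. PiiDetectorError and check_response are hypothetical names.

```python
import json


class PiiDetectorError(Exception):
    """Hypothetical exception carrying the HTTP status and JSON errorCode."""

    def __init__(self, status, error_code, details):
        super().__init__(f"HTTP {status}: {error_code or 'no errorCode'}")
        self.status = status
        self.error_code = error_code
        self.details = details


def check_response(status, body_text):
    """Return the parsed JSON body on 200; raise PiiDetectorError otherwise."""
    try:
        body = json.loads(body_text) if body_text else {}
    except ValueError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    if status == 200:
        return body
    # Assumption: error bodies expose "errorCode" and, sometimes, "errorDetails".
    raise PiiDetectorError(status, body.get("errorCode"), body.get("errorDetails"))


ok = check_response(200, '{"state": "processing"}')
try:
    check_response(480, '{"errorCode": "InvalidInput"}')
except PiiDetectorError as error:
    caught = error
```

Keeping the status code and errorCode together matters because the same errorCode ("MissingInput") appears under both 400 and 480.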
Example
Request
POST prizmdoc_server_base_url/v2/piiDetectors
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
{
"input": {
"contextId": "ElkNzWtrUJp4rXI5YnLUgw"
},
"minSecondsAvailable": 600
}
NOTE: See the Base URL for PrizmDoc Server topic for more information.
Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"contextId": "ElkNzWtrUJp4rXI5YnLUgw"
},
"processId": "pR5X6nPDgMwat6cxlmn0Q3",
"state": "processing",
"percentComplete": 0,
"expirationDateTime": "2023-12-17T20:38:39.796Z"
}
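The request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library; build_create_request and the base URL http://prizmdoc.example are hypothetical, and minSecondsAvailable is placed at the top level of the body to match the example request.

```python
import json
import urllib.request


def build_create_request(base_url, context_id, affinity_token=None,
                         min_seconds_available=None):
    """Build (but do not send) a POST /v2/piiDetectors request."""
    body = {"input": {"contextId": context_id}}
    if min_seconds_available is not None:
        # Placed beside "input", mirroring the example request above.
        body["minSecondsAvailable"] = min_seconds_available
    headers = {"Content-Type": "application/json"}
    if affinity_token:
        # Only required when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token
    return urllib.request.Request(
        f"{base_url}/v2/piiDetectors",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_create_request(
    "http://prizmdoc.example",  # hypothetical base URL
    "ElkNzWtrUJp4rXI5YnLUgw",
    affinity_token="ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
    min_seconds_available=600,
)
# Sending it would be: urllib.request.urlopen(request)
```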
GET /v2/piiDetectors/{processId}
Gets information about a PII detector.
To get PII entities, use GET /v2/piiDetectors/{processId}/entities.
Request
URL Parameters
Parameter | Description |
---|---|
{processId} | The processId which identifies the PII detector. |
Request Headers
Name | Description |
---|---|
Accusoft-Affinity-Token | The affinityToken of the PII detector. Required when server clustering is enabled. |
Successful Response
Response Body
JSON with metadata about the PII detector.
input
(Object) Input we used to create the PII detector.
processId
(String) Unique id for this PII detector.
affinityToken
(String) Affinity token for this PII detector. Present when clustering is enabled.
state
(String) State of detecting PII.
  "processing" - The detection is still being executed. Additional PII entities may become available.
  "complete" - The detection is complete. No additional PII entities will become available.
  "error" - There was a problem performing the detection. No additional PII entities will become available.
percentComplete
(Integer) Percentage of PII detection which has completed (from 0 to 100).
expirationDateTime
(String) Currently planned date and time when the PII detector resource will expire and no longer be available for use. Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2023-11-05T08:15:30.494Z".
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
404 | - | No PII detector with the provided {processId} could be found. |
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
Example
Request
GET prizmdoc_server_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
NOTE: See the Base URL for PrizmDoc Server topic for more information.
Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"contextId": "ElkNzWtrUJp4rXI5YnLUgw"
},
"processId": "pR5X6nPDgMwat6cxlmn0Q3",
"state": "complete",
"percentComplete": 100,
"expirationDateTime": "2023-12-17T20:38:39.796Z"
}
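If a client only needs to know when detection has finished, it could poll this URL until state is no longer "processing". The sketch below shows that loop with the HTTP call injected as a function, so it runs here against canned responses; wait_until_finished, poll_seconds and max_polls are hypothetical names and defaults, not values recommended by the API.

```python
import time


def wait_until_finished(get_status, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Call get_status() until state leaves "processing", then return that status."""
    for _ in range(max_polls):
        status = get_status()
        if status["state"] != "processing":
            return status  # either "complete" or "error"
        sleep(poll_seconds)
    raise TimeoutError("PII detector still processing after max_polls checks")


# Canned responses standing in for GET /v2/piiDetectors/{processId}.
canned = iter([
    {"state": "processing", "percentComplete": 40},
    {"state": "complete", "percentComplete": 100},
])
final = wait_until_finished(lambda: next(canned), sleep=lambda seconds: None)
print(final["state"])
```

In a real client, get_status would issue the GET request and return the parsed JSON body. Callers should still check whether the final state is "complete" or "error".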
GET /v2/piiDetectors/{processId}/entities?limit={limit}&continueToken={continueToken}
Gets a block of newly-available PII entities up to a limit.
This URL is designed to give you the PII entities in chunks as they become available. Each GET request will return the currently-known PII entities up to a limit (default is 100). If a response contains a continueToken, it indicates that additional PII entities may be available and that you should issue another GET request using that continueToken as a query string parameter to skip the PII entities you have already received. As long as a response contains a continueToken, use it to issue a subsequent GET for more PII entities. When you encounter a response which does not have a continueToken, you have received all of the PII entities and no more GET requests are necessary.
In order to optimize the number of network requests you make, any response which contains a continueToken will also contain a continueAfter value with a recommended number of milliseconds you should wait before sending the next GET request.
Request
URL Parameters
Parameter | Description |
---|---|
{processId} | The processId which identifies the PII detector. |
{limit} | The maximum number of PII entities to return for this HTTP request. Must be an integer greater than 0. Default is 100. |
{continueToken} | Used to continue getting PII entities from the point where a previous GET request left off. |
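The URL parameters above can be combined into a request URL as in the following sketch. build_entities_url and the base URL http://prizmdoc.example are hypothetical; the sketch URL-encodes the query string and omits any parameter that was not supplied so the server default applies.

```python
from urllib.parse import quote, urlencode


def build_entities_url(base_url, process_id, limit=None, continue_token=None):
    """Build the GET URL for /v2/piiDetectors/{processId}/entities."""
    query = {}
    if limit is not None:
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError("limit must be an integer greater than 0")
        query["limit"] = limit
    if continue_token:
        query["continueToken"] = continue_token
    url = f"{base_url}/v2/piiDetectors/{quote(process_id, safe='')}/entities"
    # urlencode() percent-encodes values, which is a safe default for opaque tokens.
    return f"{url}?{urlencode(query)}" if query else url


first_url = build_entities_url("http://prizmdoc.example", "pR5X6nPDgMwat6cxlmn0Q3")
next_url = build_entities_url(
    "http://prizmdoc.example",
    "pR5X6nPDgMwat6cxlmn0Q3",
    limit=50,
    continue_token="Cx07GHlkmi32gxAQhv49WZ",
)
```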
Request Headers
Name | Description |
---|---|
Accusoft-Affinity-Token | The affinityToken of the PII detector. Required when server clustering is enabled. |
Successful Response
Response Body
JSON with any available PII entities.
entities
(Array of Objects) Always present. Array of newly-available PII entities. If no new PII entities are available, this array will be empty.
  id
  (Integer) Unique number assigned to this PII entity.
  pageIndex
  (Integer) Zero-indexed page number where this PII entity occurs in the document.
  text
  (String) The text of the PII entity.
  lineGroups
  (Array of Objects) An array of text line groups that indicate where the PII entity is located.
    pageIndex
    (Integer) Zero-indexed page number where this PII entity occurs in the document.
    pageData
    (Object) Information about the dimensions of the page where this PII entity occurs.
      width
      (Number) Width of the page.
      height
      (Number) Height of the page.
    boundingRectangle
    (Object) Bounding rectangle dimensions of the PII entity on the first page where it occurs.
      x
      (Number) Distance from the left edge of the page to the left edge of the PII entity bounding box.
      y
      (Number) Distance from the top edge of the page to the top edge of the PII entity bounding box.
      width
      (Number) Width of the PII entity bounding box.
      height
      (Number) Height of the PII entity bounding box.
    lines
    (Array of Objects) Array of rectangles for each line of the PII entity on the page where it occurs. If the PII entity is on one line, the result is a single array item with a rectangle equal to boundingRectangle. If the PII entity is on multiple lines, all rectangles in the array will be within the bounds of the boundingRectangle.
      x
      (Number) Distance from the left edge of the page to the left edge of the PII entity line rectangle.
      y
      (Number) Distance from the top edge of the page to the top edge of the PII entity line rectangle.
      width
      (Number) Width of the PII entity line rectangle.
      height
      (Number) Height of the PII entity line rectangle.
  startIndex
  (Integer) The index in the full-page text where the PII entity begins (to get the full-page text string, see GET /v2/searchContexts/{contextId}/records).
  type
  (String) The entity's type.
  score
  (Number) A number ranging from 0 to 1 that represents the level of confidence in the accuracy of the detection.
pagesWithoutText
(Array of Integers) Always present. Currently known pages in the document which do not contain any text content at all. Values are zero-indexed page numbers. If the PII detector is still processing (a continueToken is present in the response), the data should be considered partial. Note that, unlike entities, this value is cumulative (we always deliver the entire set of pages we know do not contain text data).
continueToken
(String) When present, indicates that more PII entities may be available. An additional GET request should be made for more PII entities using this value as the continueToken query string parameter. When not present, indicates that the detection is complete and no further PII entities will be available.
continueAfter
(Number) Recommended milliseconds to delay before issuing the next GET request for more PII entities.
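Because each entity carries a type and a score, a client may want to keep only confident detections and group them by type. The following is a minimal sketch of that idea; the function name, the 0.8 threshold, and the sample scores of 0.95 and 0.3 are illustrative assumptions rather than values defined by the API.

```python
from collections import defaultdict


def group_confident_entities(entities, min_score=0.8):
    """Group entity text by type, dropping entities scored below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Trimmed-down entities; real responses also carry id, pageIndex, lineGroups, etc.
sample = [
    {"text": "John Doe", "type": "person", "score": 1.0},
    {"text": "(555) 555-5555", "type": "phoneNumber", "score": 0.95},
    {"text": "Main", "type": "person", "score": 0.3},
]
grouped = group_confident_entities(sample)
print(grouped)
```

The right threshold depends on how costly a missed detection is compared with a false positive, so it is left as a parameter.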
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
404 | - | No PII detector with the provided {processId} could be found. |
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
480 | "InvalidInput" | An invalid input value was used. See errorDetails in the response body. |
480 | "ResourceNotUsable" | Can occur when the PII detector is in a state of "error". You may be able to get more information from a GET /v2/piiDetectors/{processId}. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
Example
Here is an example sequence of requests and responses illustrating how you would acquire the full set of PII entities for the PII detector (for brevity, the total number of PII entities in this example is small).
You would start with an initial GET:
GET prizmdoc_server_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
NOTE: See the Base URL for PrizmDoc Server topic for more information.
HTTP/1.1 200 OK
Content-Type: application/json
{
"entities": [
{
"id": 0,
"pageIndex": 0,
"text": "John Doe",
"lineGroups": [
{
"pageIndex": 0,
"boundingRectangle": { "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 },
"lines": [{ "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 }],
"pageData": { "width": 612, "height": 792 }
}
],
"startIndex": 19,
"type": "person",
"score": 1.0
},
{
"id": 1,
"pageIndex": 0,
"text": "(555) 555-5555",
"lineGroups": [
{
"pageIndex": 0,
"boundingRectangle": { "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 },
"lines": [{ "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 }],
"pageData": { "width": 612, "height": 792 }
}
],
"startIndex": 527,
"type": "phoneNumber",
"score": 1.0
}
],
"pagesWithoutText": [],
"continueToken": "Cx07GHlkmi32gxAQhv49WZ",
"continueAfter": 500
}
The initial response has given us two PII entities for the first page of the document (page index 0) and a continueToken which we should use to get more PII entities after waiting 500 milliseconds.
So, half a second later, we issue a follow-up request with the continueToken passed in as a query string parameter (so we skip over the results we already have):
GET prizmdoc_server_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities?continueToken=Cx07GHlkmi32gxAQhv49WZ
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
NOTE: See the Base URL for PrizmDoc Server topic for more information.
HTTP/1.1 200 OK
Content-Type: application/json
{
"entities": [
{
"id": 2,
"pageIndex": 1,
"text": "Jane Doe",
"lineGroups": [
{
"pageIndex": 1,
"boundingRectangle": { "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 },
"lines": [{ "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 }],
"pageData": { "width": 612, "height": 792 }
}
],
"startIndex": 652,
"type": "person",
"score": 1.0
}
],
"pagesWithoutText": [2,3],
"continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
"continueAfter": 500
}
This time we get back a new PII entity as well as some new information about pagesWithoutText: we now know that at least page indices 2 and 3 (zero-indexed page numbers) have no text at all.
The presence of a new continueToken tells us there may be more PII entities, so we submit another request with the new continueToken:
GET prizmdoc_server_base_url/v2/piiDetectors/pR5X6nPDgMwat6cxlmn0Q3/entities?continueToken=B4uGe7m0ZtxR3lkqA07Nmj
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
NOTE: See the Base URL for PrizmDoc Server topic for more information.
HTTP/1.1 200 OK
Content-Type: application/json
{
"entities": [
{
"id": 3,
"pageIndex": 5,
"text": "(555) 555-1234",
"lineGroups": [
{
"pageIndex": 5,
"boundingRectangle": { "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 },
"lines": [{ "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 }],
"pageData": { "width": 612, "height": 792 }
}
],
"startIndex": 113,
"type": "phoneNumber",
"score": 1.0
}
],
"pagesWithoutText": [2,3,4]
}
This time we get a new PII entity for page index 5, and we now know that page indices 2, 3, and 4 all contain no text at all. The lack of a continueToken tells us we have received all of the PII entities, so there are no more GET requests to make.
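The request sequence in this example can be expressed as a loop. The sketch below injects the HTTP call as fetch_page, a hypothetical function that takes a continueToken (or None for the first request) and returns the parsed response body. Here it is backed by trimmed-down copies of the three responses above so the loop can run without a server.

```python
import time


def collect_entities(fetch_page, sleep=time.sleep):
    """Follow continueToken values until the server stops returning one."""
    entities, pages_without_text, token = [], [], None
    while True:
        page = fetch_page(token)
        entities.extend(page["entities"])  # each response holds only new entities
        pages_without_text = page["pagesWithoutText"]  # cumulative, so overwrite
        token = page.get("continueToken")
        if token is None:
            return entities, pages_without_text
        # continueAfter is in milliseconds; time.sleep() expects seconds.
        sleep(page.get("continueAfter", 0) / 1000.0)


# Canned responses keyed by the continueToken that requests them.
pages = {
    None: {"entities": [{"id": 0}, {"id": 1}], "pagesWithoutText": [],
           "continueToken": "Cx07GHlkmi32gxAQhv49WZ", "continueAfter": 500},
    "Cx07GHlkmi32gxAQhv49WZ": {"entities": [{"id": 2}], "pagesWithoutText": [2, 3],
                               "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
                               "continueAfter": 500},
    "B4uGe7m0ZtxR3lkqA07Nmj": {"entities": [{"id": 3}], "pagesWithoutText": [2, 3, 4]},
}
waits = []  # records the delays instead of actually sleeping
entities, pages_without_text = collect_entities(lambda token: pages[token],
                                                sleep=waits.append)
```

Entities are appended because each response contains only newly-available items, while pagesWithoutText is replaced on every response because it is cumulative.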
DELETE /v2/piiDetectors/{processId}
Cancels the PII detection process. Further requests using this processId will return errors.
Request
URL Parameters
Parameter | Description |
---|---|
{processId} | The processId which identifies the PII detector. |
Request Headers
Name | Description |
---|---|
Accusoft-Affinity-Token | The affinityToken of the PII detector. Required when server clustering is enabled. |
Successful Response
HTTP/1.1 204 No Content
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
404 | - | No PII detector with the provided {processId} could be found. |
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
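For completeness, a cancel request could be built as in the sketch below, using only the Python standard library. build_cancel_request and the base URL http://prizmdoc.example are hypothetical; a successful call is expected to return 204 with no body, as described above.

```python
import urllib.request
from urllib.parse import quote


def build_cancel_request(base_url, process_id, affinity_token=None):
    """Build (but do not send) a DELETE /v2/piiDetectors/{processId} request."""
    headers = {}
    if affinity_token:
        # Only required when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token
    url = f"{base_url}/v2/piiDetectors/{quote(process_id, safe='')}"
    return urllib.request.Request(url, headers=headers, method="DELETE")


cancel = build_cancel_request(
    "http://prizmdoc.example",  # hypothetical base URL
    "pR5X6nPDgMwat6cxlmn0Q3",
    affinity_token="ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
)
# Sending it would be: urllib.request.urlopen(cancel)
```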