PrizmDoc Viewer v13.8 - Updated
Search Tasks

Introduction

The search context and search task APIs are designed for a viewer to perform server-side searching and text retrieval of a document.

A search task represents an asynchronous full-text search of a document (via a search context) and yields results as they become available.

Available URLs

URL Description
POST /v2/searchTasks Starts an asynchronous full-text search against a search context.
POST /v2/viewingSessions/{viewingSessionId}/searchTasks Starts an asynchronous full-text search against a viewing session's source document.
GET /v2/searchTasks/{processId} Gets information about a search task.
GET /v2/searchTasks/{processId}/results Gets available search results.
DELETE /v2/searchTasks/{processId} Cancels a search task.

POST /v2/searchTasks

Starts an asynchronous full-text search against a search context.

After a successful POST to create the search task, we immediately begin a background process to start populating search results for you to GET. You do not need to wait for the full set of results to be available; you can start retrieving partial search results as soon as they are available. Once the full text of the document has been searched and no more results will be added, the search task state will change from "processing" to "complete".

Request

Request Headers

Name Description
Content-Type Must be application/json
Accusoft-Affinity-Token The affinityToken of the search context specified by input.contextId. Required when server clustering is enabled.

Request Body

Successful Response

Response Body

JSON with metadata about the created search task.

Error Responses

Status Code JSON errorCode Description
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "MissingInputForSimpleTerm" A required input value was not provided in a "simple" term object. See errorDetails in the response body.
480 "InvalidInputForSimpleTerm" An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480 "MissingInputForProximityTerm" A required input value was not provided in a "proximity" term object. See errorDetails in the response body.
480 "InvalidInputForProximityTerm" An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480 "ResourceNotFound" Can occur when the search context specified by contextId could not be found. See errorDetails in the response body.
480 "ResourceNotUsable" Can occur when the search context specified by contextId is not usable. See errorDetails in the response body.
580 "InternalError" The server encountered an internal error when handling the request.
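
The table above can be turned into a small client-side check. The following is a minimal sketch, not part of the PrizmDoc API: the names SearchTaskError and raise_for_error are hypothetical, and it assumes the error body is a JSON object carrying the errorCode and errorDetails fields the table refers to.

```python
import json
from typing import Any, Optional


class SearchTaskError(Exception):
    """Hypothetical exception carrying the HTTP status and the JSON errorCode."""

    def __init__(self, status: int, error_code: Optional[str], details: Any = None):
        super().__init__(f"HTTP {status}: {error_code or 'no errorCode'}")
        self.status = status
        self.error_code = error_code
        self.details = details


def raise_for_error(status: int, body: bytes) -> None:
    """Raise SearchTaskError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return
    try:
        payload = json.loads(body.decode("utf-8")) if body else {}
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = {}  # the body was not JSON (assumed possible for some errors)
    if not isinstance(payload, dict):
        payload = {}
    # Assumption: errorCode and errorDetails are top-level keys of the error body.
    raise SearchTaskError(status, payload.get("errorCode"), payload.get("errorDetails"))
```

A caller would pass the status code and raw body of each response, then branch on error_code (for example "ResourceNotFound") rather than on the status code alone, since several different errors share status 480.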

Example

Request

POST prizmdoc_server_base_url/v2/searchTasks
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
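
As an illustration, the request above could be assembled in Python with the standard library. This is a sketch under assumptions: the function name build_create_search_task_request is hypothetical, and base_url stands in for prizmdoc_server_base_url. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional


def build_create_search_task_request(
    base_url: str,
    context_id: str,
    pattern: str,
    affinity_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/searchTasks request for one simple term."""
    body = {
        "input": {
            "contextId": context_id,
            "searchTerms": [{"type": "simple", "pattern": pattern}],
        }
    }
    headers = {"Content-Type": "application/json"}
    if affinity_token is not None:
        # Only required when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/searchTasks",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the processId in the JSON response is what the GET and DELETE URLs below expect.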

Additional Examples

For more examples of how to construct different searches, see Example Searches.

POST /v2/viewingSessions/{viewingSessionId}/searchTasks

Starts an asynchronous full-text search against a viewing session's source document.

After a successful POST to create the search task, we immediately begin a background process to start populating search results for you to GET. You do not need to wait for the full set of results to be available; you can start retrieving partial search results as soon as they are available. Once the full text of the document has been searched and no more results will be added, the search task state will change from "processing" to "complete".

Request

Request Headers

Name Description
Content-Type Must be application/json

Request Body

Successful Response

Response Body

JSON with metadata about the created search task.

Error Responses

Status Code JSON errorCode Description
404 - No viewing session with the provided {viewingSessionId} could be found.
480 "DocumentNotProvidedYet" The viewing session does not yet have a source document attached.
480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "MissingInputForSimpleTerm" A required input value was not provided in a "simple" term object. See errorDetails in the response body.
480 "InvalidInputForSimpleTerm" An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480 "MissingInputForProximityTerm" A required input value was not provided in a "proximity" term object. See errorDetails in the response body.
480 "InvalidInputForProximityTerm" An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480 "FeatureDisabled" The viewing session was created with "serverSideSearch" disabled.
580 "InternalError" The server encountered an internal error when handling the request.

Example

Request

POST prizmdoc_server_base_url/v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Additional Examples

For more examples of how to construct different searches, see Example Searches.

GET /v2/searchTasks/{processId}

Gets information about a search task.

To get search results, use [GET /v2/searchTasks/{processId}/results].

Request

URL Parameters

Parameter Description
{processId} The processId which identifies the search task.

Request Headers

Name Description
Accusoft-Affinity-Token The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

Response Body

JSON with metadata about the search task.

Error Responses

Status Code JSON errorCode Description
404 - No search task with the provided {processId} could be found.
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
580 "InternalError" The server encountered an internal error when handling the request.

Example

Request

GET prizmdoc_server_base_url/v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "complete",
  "percentComplete": 100,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
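
If a caller only needs to know when the search has finished, the state field shown above can be polled. The sketch below assumes a hypothetical get_task callable that performs the GET and returns the parsed JSON; the polling interval and the poll limit are illustrative values, not recommendations from the API.

```python
import time
from typing import Any, Callable, Dict


def wait_for_search_task(
    get_task: Callable[[str], Dict[str, Any]],
    process_id: str,
    poll_seconds: float = 0.5,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the task until its state is no longer "processing", then return it."""
    for _ in range(max_polls):
        task = get_task(process_id)
        if task.get("state") != "processing":
            # "complete", or another terminal state such as "error".
            return task
        sleep(poll_seconds)
    raise TimeoutError(f"search task {process_id} still processing after {max_polls} polls")
```

Waiting like this is optional: partial results can be retrieved from the results URL while the task is still in the "processing" state.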

GET /v2/searchTasks/{processId}/results?limit={limit}&continueToken={continueToken}

Gets a block of newly-available search results up to a limit.

This URL is designed to give you the results in chunks as they become available. Each GET request will return the currently-known results up to a limit (default is 100). If a response contains a continueToken, it indicates that additional results may be available and that you should issue another GET request using that continueToken as a query string parameter to skip the results you have already received. As long as a response contains a continueToken, use it to issue a subsequent GET for more results. When you encounter a response which does not have a continueToken, you have received all of the results and no more GET requests are necessary.

In order to optimize the number of network requests you make, any response which contains a continueToken will also contain a continueAfter value with a recommended number of milliseconds you should wait before sending the next GET request.
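
The continueToken and continueAfter protocol described above can be sketched as a loop. Here fetch_page is a hypothetical callable that issues the GET (with the given continueToken, or none for the first request) and returns the parsed JSON body; merging pagesWithoutText as a set is a defensive choice of this sketch, so it works whether the server reports the list cumulatively or incrementally.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


def collect_results(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Follow continueToken until it is absent; return (results, pagesWithoutText)."""
    results: List[Dict[str, Any]] = []
    pages_without_text = set()
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        results.extend(page.get("results", []))
        pages_without_text.update(page.get("pagesWithoutText", []))
        token = page.get("continueToken")
        if token is None:
            # No continueToken: all results have been received.
            return results, sorted(pages_without_text)
        # continueAfter is the recommended wait in milliseconds before the next GET.
        sleep(page.get("continueAfter", 0) / 1000.0)
```

Injecting sleep keeps the loop easy to test; in production the default time.sleep honours the server's recommended delay.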

Request

URL Parameters

Parameter Description
{processId} The processId which identifies the search task.
{limit} The maximum number of results to return for this HTTP request. Must be an integer greater than 0. Default is 100.
{continueToken} Used to continue getting results from the point where a previous GET request left off.
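
For illustration, the URL and its optional query string parameters could be assembled as follows. The helper name results_url is hypothetical, and base_url stands in for prizmdoc_server_base_url.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def results_url(
    base_url: str,
    process_id: str,
    limit: Optional[int] = None,
    continue_token: Optional[str] = None,
) -> str:
    """Build the results URL, adding limit and continueToken only when given."""
    if limit is not None and limit <= 0:
        raise ValueError("limit must be an integer greater than 0")
    params = {}
    if limit is not None:
        params["limit"] = limit
    if continue_token is not None:
        params["continueToken"] = continue_token
    url = f"{base_url.rstrip('/')}/v2/searchTasks/{quote(process_id, safe='')}/results"
    # urlencode percent-encodes any characters that are unsafe in a query string.
    return f"{url}?{urlencode(params)}" if params else url
```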

Request Headers

Name Description
Accusoft-Affinity-Token The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

Response Body

JSON with any available search results.

Error Responses

Status Code JSON errorCode Description
404 - No search task with the provided {processId} could be found.
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "ResourceNotUsable" Can occur when the search task is in a state of "error". You may be able to get more information from a [GET /v2/searchTasks/{processId}].
580 "InternalError" The server encountered an internal error when handling the request.

Example

Say you have a search task which was created to find the regex "manag[a-z]*" in a particular whitepaper. Here is an example sequence of requests and responses illustrating how you would acquire the full set of results for the search task (for brevity, the total number of search results in this example is small).

You would start with an initial GET:

GET prizmdoc_server_base_url/v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 0,
      "pageIndex": 0,
      "text": "Management",
      "context": "Enterprise Content Management Best Practices",
      "boundingRectangle": { "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 },
      "lineRectangles": [{ "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 }],
      "pageData": { "width": 612, "height": 792 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 19,
      "startIndexInContext": 19
    },
    {
      "id": 1,
      "pageIndex": 0,
      "text": "management",
      "context": "ue of enterprise content management software should go way b",
      "boundingRectangle": { "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 },
      "lineRectangles": [{ "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 }],
      "pageData": { "width": 612, "height": 792 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 527,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [],
  "continueToken": "Cx07GHlkmi32gxAQhv49WZ",
  "continueAfter": 500
}

The initial response has given us two results for the first page of the document (page index 0) and a continueToken which we should use to get more results after waiting 500 milliseconds.

So, half a second later, we issue a follow-up request with the continueToken passed in as a query string parameter (so we skip over the results we already have):

GET prizmdoc_server_base_url/v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=Cx07GHlkmi32gxAQhv49WZ
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 2,
      "pageIndex": 1,
      "text": "management",
      "context": "Enterprise content management software helps eliminate",
      "boundingRectangle": { "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 },
      "lineRectangles": [{ "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 }],
      "pageData": { "width": 612, "height": 792 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 652,
      "startIndexInContext": 19
    }
  ],
  "pagesWithoutText": [2,3],
  "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
  "continueAfter": 500
}

This time we get back a new result as well as some new information about pagesWithoutText: we now know that at least page indices 2 and 3 (zero-indexed page numbers) have no text at all.

The presence of a new continueToken tells us there may be more results, so we submit another request with the new continueToken:

GET prizmdoc_server_base_url/v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=B4uGe7m0ZtxR3lkqA07Nmj
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 3,
      "pageIndex": 5,
      "text": "management",
      "context": "upply chains to contract management, or HR processes to gove",
      "boundingRectangle": { "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 },
      "lineRectangles": [{ "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 }],
      "pageData": { "width": 612, "height": 792 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 113,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [2,3,4]
}

This time we get a new result for page index 5, and we now know that page indices 2, 3, and 4 all contain no text at all (apparently this was not much of a whitepaper!). The lack of a continueToken tells us we have received all of the results, so there are no more GET requests to make.
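
Each result in the responses above carries text, context, and startIndexInContext. In the sample data, startIndexInContext is consistent with a zero-based character offset of text within context; that reading is an assumption drawn from the examples, not a documented guarantee. Under that assumption, a viewer could mark the match inside its surrounding snippet like this (the highlight helper is hypothetical):

```python
from typing import Any, Dict


def highlight(result: Dict[str, Any], open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap the matched text inside its context snippet with marker strings."""
    context = result["context"]
    # Assumption: startIndexInContext is a zero-based offset into context.
    start = result["startIndexInContext"]
    end = start + len(result["text"])
    return context[:start] + open_mark + context[start:end] + close_mark + context[end:]


first_result = {
    "text": "Management",
    "context": "Enterprise Content Management Best Practices",
    "startIndexInContext": 19,
}
print(highlight(first_result))  # prints "Enterprise Content [Management] Best Practices"
```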

DELETE /v2/searchTasks/{processId}

Cancels the search task. Further requests using this processId will return errors.

Request

URL Parameters

Parameter Description
{processId} The processId which identifies the search task.

Request Headers

Name Description
Accusoft-Affinity-Token The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

HTTP/1.1 204 No Content

Error Responses

Status Code JSON errorCode Description
404 - No search task with the provided {processId} could be found.
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
580 "InternalError" The server encountered an internal error when handling the request.
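
Because requests against a cancelled processId return errors, a client that may cancel the same task more than once might choose to treat a 404 as "already gone". The sketch below shows that design choice; send_delete is a hypothetical callable that issues the DELETE and returns the HTTP status code.

```python
from typing import Callable


def cancel_search_task(send_delete: Callable[[str], int], process_id: str) -> bool:
    """Return True if the task was cancelled now, False if it no longer exists."""
    status = send_delete(process_id)
    if status == 204:
        return True  # cancelled by this request
    if status == 404:
        return False  # unknown processId, for example already cancelled or expired
    # 400, 580, or anything else is surfaced to the caller.
    raise RuntimeError(f"unexpected status {status} while cancelling {process_id}")
```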

Example Searches

The following examples demonstrate how to use input.searchTerms for both the [POST /v2/searchTasks] and [POST /v2/viewingSessions/{viewingSessionId}/searchTasks] URLs.

Start a search for a single word

This partial input JSON begins a search task which finds all instances of the word "quick":

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Start a case-sensitive search for an exact phrase

This partial input JSON begins a case-sensitive search for the exact phrase "The quick brown fox jumped over the lazy dog.". Notice that we had to escape the period character because it is a special regex character (\.), and because this is a JSON string value, the backslash itself must also be escaped ("\\."):

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "The quick brown fox jumped over the lazy dog\\.",
      "caseSensitive": true
    }]
  }
}
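
The two levels of escaping can be left to code. The sketch below escapes regex metacharacters itself and lets json.dumps add the JSON-level backslash. The helper names are hypothetical, and the set of characters treated as special is an assumption about the server's regex dialect.

```python
import json
from typing import Any, Dict

# Assumption: these are the characters the server's regex engine treats as special.
_REGEX_SPECIAL = set(".^$*+?()[]{}|\\")


def escape_literal(text: str) -> str:
    """Backslash-escape regex metacharacters so text matches literally."""
    return "".join("\\" + ch if ch in _REGEX_SPECIAL else ch for ch in text)


def exact_phrase_term(phrase: str, case_sensitive: bool = False) -> Dict[str, Any]:
    """Build a "simple" search term that matches the phrase exactly."""
    return {
        "type": "simple",
        "pattern": escape_literal(phrase),
        "caseSensitive": case_sensitive,
    }


term = exact_phrase_term("The quick brown fox jumped over the lazy dog.", True)
# json.dumps doubles the backslash, so the serialized pattern ends in dog\\.
print(json.dumps({"input": {"searchTerms": [term]}}, indent=2))
```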

Start a search for every instance of the word "quick" or "fox" or "dog"

This partial input JSON begins a search for the words "quick", "fox", or "dog", locating all instances of each of these words:

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }, {
      "type": "simple",
      "pattern": "fox"
    }, {
      "type": "simple",
      "pattern": "dog"
    }]
  }
}

Start a search for "quick" and "fox" and "dog" where there are no more than 5 words between any two consecutive occurrences of them

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "quick"
      }, {
        "pattern": "fox"
      }, {
        "pattern": "dog"
      }],
      "distance": 5
    }]
  }
}

Start a case-sensitive search for "John Doe" within 30 words of what looks like a social security number

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "John Doe",
        "caseSensitive": true
      }, {
        "pattern": "\\d{3}-\\d{2}-\\d{4}"
      }],
      "distance": 30
    }]
  }
}
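
The proximity inputs above can also be generated programmatically. The following sketch uses a hypothetical proximity_term helper and sanity-checks the social security number pattern locally with Python's re module, which may not behave identically to the server's regex engine.

```python
import json
import re
from typing import Any, Dict, List


def proximity_term(sub_terms: List[Dict[str, Any]], distance: int) -> Dict[str, Any]:
    """Build a "proximity" search term from sub-term objects and a word distance."""
    return {
        "type": "proximity",
        "subTerms": [dict(t) for t in sub_terms],  # copy so callers' dicts stay untouched
        "distance": distance,
    }


ssn_pattern = r"\d{3}-\d{2}-\d{4}"
term = proximity_term(
    [{"pattern": "John Doe", "caseSensitive": True}, {"pattern": ssn_pattern}],
    distance=30,
)

# Local check only: confirms the pattern's intent before sending it to the server.
assert re.fullmatch(ssn_pattern, "123-45-6789")
print(json.dumps({"input": {"searchTerms": [term]}}, indent=2))
```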