PrizmDoc v12.0 - Search Tasks

Search Tasks

The search context and search task APIs are designed for a viewer to perform server-side searching and text retrieval of a document.

A search task represents an asynchronous full-text search of a document (via a search context) and yields results as they become available.

Available URLs

URL                                                      Description
POST /v2/searchTasks                                     Starts an asynchronous full-text search against a search context.
POST /v2/viewingSessions/{viewingSessionId}/searchTasks  Starts an asynchronous full-text search against a viewing session's source document.
GET /v2/searchTasks/{processId}                          Gets information about a search task.
GET /v2/searchTasks/{processId}/results                  Gets available search results.

POST /v2/searchTasks

Starts an asynchronous full-text search against a search context.

After a successful POST creates the search task, the server immediately starts a background process that populates search results, which you retrieve with GET /v2/searchTasks/{processId}/results. You do not need to wait for the full set of results; you can retrieve partial results as soon as they are available. Once the full text of the document has been searched and no more results will be added, the search task state changes from "processing" to "complete".

Request

Request Headers

Name                     Description
Content-Type             Must be application/json
Accusoft-Affinity-Token  The affinityToken of the search context specified by input.contextId. Required when server clustering is enabled.

Request Body

Successful Response

Response Body

JSON with metadata about the created search task.

Error Responses

Status Code  JSON errorCode                  Description
400          "MissingInput"                  Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
480          "MissingInput"                  A required input value was not provided. See errorDetails in the response body.
480          "InvalidInput"                  An invalid input value was used. See errorDetails in the response body.
480          "MissingInputForSimpleTerm"     A required input value was not provided in a "simple" term object. See errorDetails in the response body.
480          "InvalidInputForSimpleTerm"     An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480          "MissingInputForProximityTerm"  A required input value was not provided in a "proximity" term object. See errorDetails in the response body.
480          "InvalidInputForProximityTerm"  An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480          "ResourceNotFound"              Can occur when the search context specified by contextId could not be found. See errorDetails in the response body.
480          "ResourceNotUsable"             Can occur when the search context specified by contextId is not usable. See errorDetails in the response body.
580          "InternalError"                 The server encountered an internal error when handling the request.
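
As a rough illustration of how a client might act on the table above, the following Python sketch turns an error response into a one-line summary. It assumes the error body is a JSON object with "errorCode" and optional "errorDetails" members, as the table suggests; the function name describe_error is made up for this example, and the shape of errorDetails in the usage line is invented, not taken from the API.

```python
import json


def describe_error(status_code, body):
    """Summarize an error response as a single human-readable line.

    Assumes (not confirmed by this page) that the body is a JSON object
    carrying "errorCode" and, optionally, "errorDetails".
    """
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    error_code = payload.get("errorCode", "(no errorCode)")
    summary = f"HTTP {status_code}: {error_code}"

    details = payload.get("errorDetails")
    if details is not None:
        # Keys are sorted only so that the summary is stable between runs.
        summary += f" ({json.dumps(details, sort_keys=True)})"
    return summary


# Hypothetical body; the errorDetails members here are invented for illustration.
print(describe_error(480, '{"errorCode": "InvalidInput", "errorDetails": {"at": "input.contextId"}}'))
```

Logging the status code together with errorCode and errorDetails is usually enough to tell a malformed request (480) apart from a missing affinity token (400) or a server fault (580).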

Example

Request

POST /v2/searchTasks
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
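
The request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library; the helper name build_search_task_request and the base URL https://prizmdoc.example.com are placeholders, not part of the API. The request object is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_search_task_request(base_url, context_id, pattern, affinity_token=None):
    """Build (but do not send) a POST /v2/searchTasks request for one simple term."""
    body = {
        "input": {
            "contextId": context_id,
            "searchTerms": [{"type": "simple", "pattern": pattern}],
        }
    }
    headers = {"Content-Type": "application/json"}
    if affinity_token is not None:
        # Only required when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/searchTasks",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_search_task_request(
    "https://prizmdoc.example.com",  # placeholder base URL
    "ElkNzWtrUJp4rXI5YnLUgw",
    "quick",
    affinity_token="ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the processId in the JSON response is what the GET URLs described later expect.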

Additional Examples

For more examples of how to construct different searches, see Example Searches.

POST /v2/viewingSessions/{viewingSessionId}/searchTasks

Starts an asynchronous full-text search against a viewing session's source document.

After a successful POST creates the search task, the server immediately starts a background process that populates search results, which you retrieve with GET /v2/searchTasks/{processId}/results. You do not need to wait for the full set of results; you can retrieve partial results as soon as they are available. Once the full text of the document has been searched and no more results will be added, the search task state changes from "processing" to "complete".

Request

Request Headers

Name          Description
Content-Type  Must be application/json

Request Body

Successful Response

Response Body

JSON with metadata about the created search task.

Error Responses

Status Code  JSON errorCode                  Description
404                                          No viewing session with the provided {viewingSessionId} could be found.
480          "DocumentNotProvidedYet"        The viewing session does not yet have a source document attached.
480          "MissingInput"                  A required input value was not provided. See errorDetails in the response body.
480          "InvalidInput"                  An invalid input value was used. See errorDetails in the response body.
480          "MissingInputForSimpleTerm"     A required input value was not provided in a "simple" term object. See errorDetails in the response body.
480          "InvalidInputForSimpleTerm"     An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480          "MissingInputForProximityTerm"  A required input value was not provided in a "proximity" term object. See errorDetails in the response body.
480          "InvalidInputForProximityTerm"  An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480          "FeatureDisabled"               The viewing session was created with "serverSideSearch" disabled.
580          "InternalError"                 The server encountered an internal error when handling the request.

Example

Request

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
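
For the viewing-session variant, the differences are that the viewing session ID is part of the URL path, the body carries no contextId, and this section lists no affinity header. A hedged Python sketch (the helper name and base URL are placeholders, and percent-encoding the ID is a defensive choice of this example rather than a documented requirement):

```python
import json
import urllib.parse
import urllib.request


def build_viewing_session_search_request(base_url, viewing_session_id, search_terms):
    """Build (but do not send) a search task request against a viewing session."""
    # Percent-encode the ID so it is always a single, safe path segment.
    session = urllib.parse.quote(viewing_session_id, safe="")
    url = f"{base_url.rstrip('/')}/v2/viewingSessions/{session}/searchTasks"
    body = {"input": {"searchTerms": list(search_terms)}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_viewing_session_search_request(
    "https://prizmdoc.example.com",  # placeholder base URL
    "DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA",
    [{"type": "simple", "pattern": "quick"}],
)
print(request.full_url)
```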

Additional Examples

For more examples of how to construct different searches, see Example Searches.

GET /v2/searchTasks/{processId}

Gets information about a search task.

To get search results, use GET /v2/searchTasks/{processId}/results.

Request

URL Parameters

Parameter    Description
{processId}  The processId which identifies the search task.

Request Headers

Name                     Description
Accusoft-Affinity-Token  The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

Response Body

JSON with metadata about the search task.

Error Responses

Status Code  JSON errorCode   Description
404                           No search task with the provided {processId} could be found.
400          "MissingInput"   Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
580          "InternalError"  The server encountered an internal error when handling the request.

Example

Request

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "complete",
  "percentComplete": 100,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
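
If a client wants to show progress or know when the task has finished, it can poll this URL until the state is no longer "processing". The sketch below is one possible shape for that loop, not a prescribed one: get_task is a caller-supplied function (hypothetical) that performs the GET and returns the parsed JSON, and the poll interval and attempt limit are arbitrary values chosen for this example. Polling is not required before fetching results, since partial results are available while the task is still processing.

```python
import time


def wait_for_search_task(get_task, poll_seconds=0.5, max_polls=120, sleep=time.sleep):
    """Poll a search task until its state is no longer "processing".

    get_task: zero-argument callable returning the parsed JSON of
              GET /v2/searchTasks/{processId} (supplied by the caller).
    Returns the last task JSON, whose state may be "complete" or "error".
    """
    for _ in range(max_polls):
        task = get_task()
        if task.get("state") != "processing":
            return task
        sleep(poll_seconds)
    raise TimeoutError("search task was still processing after max_polls checks")


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([
    {"state": "processing", "percentComplete": 0},
    {"state": "processing", "percentComplete": 40},
    {"state": "complete", "percentComplete": 100},
])
final = wait_for_search_task(lambda: next(canned), sleep=lambda seconds: None)
print(final["state"], final["percentComplete"])
```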

GET /v2/searchTasks/{processId}/results?limit={limit}&continueToken={continueToken}

Gets a block of newly available search results, up to a limit.

This URL is designed to give you the results in chunks as they become available. Each GET request will return the currently-known results up to a limit (default is 100). If a response contains a continueToken, it indicates that additional results may be available and that you should issue another GET request using that continueToken as a query string parameter to skip the results you have already received. As long as a response contains a continueToken, use it to issue a subsequent GET for more results. When you encounter a response which does not have a continueToken, you have received all of the results and no more GET requests are necessary.

In order to optimize the number of network requests you make, any response which contains a continueToken will also contain a continueAfter value with a recommended number of milliseconds you should wait before sending the next GET request.

Request

URL Parameters

Parameter        Description
{processId}      The processId which identifies the search task.
{limit}          The maximum number of results to return for this HTTP request. Must be an integer greater than 0. Default is 100.
{continueToken}  Used to continue getting results from the point where a previous GET request left off.
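
The parameters above map onto a URL in a straightforward way. The following Python sketch builds one, checking limit on the client side before any request is made; the function name build_results_url and the base URL are placeholders for this example.

```python
import urllib.parse


def build_results_url(base_url, process_id, limit=None, continue_token=None):
    """Build the results URL, adding only the query parameters that were given."""
    if limit is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
            raise ValueError("limit must be an integer greater than 0")

    query = {}
    if limit is not None:
        query["limit"] = limit
    if continue_token is not None:
        query["continueToken"] = continue_token

    process = urllib.parse.quote(process_id, safe="")
    url = f"{base_url.rstrip('/')}/v2/searchTasks/{process}/results"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return url


print(build_results_url(
    "https://prizmdoc.example.com",  # placeholder base URL
    "pR5X6nPDgMwat6cxlmn0Q3",
    limit=50,
    continue_token="Cx07GHlkmi32gxAQhv49WZ",
))
```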

Request Headers

Name                     Description
Accusoft-Affinity-Token  The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

Response Body

JSON with any available search results.

Error Responses

Status Code  JSON errorCode       Description
404                               No search task with the provided {processId} could be found.
400          "MissingInput"       Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
480          "InvalidInput"       An invalid input value was used. See errorDetails in the response body.
480          "ResourceNotUsable"  Can occur when the search task is in a state of "error". You may be able to get more information from a GET /v2/searchTasks/{processId}.
580          "InternalError"      The server encountered an internal error when handling the request.

Example

Say you have a search task which was created to find the regex "manag[a-z]*" in a particular whitepaper. Here is an example sequence of requests and responses illustrating how you would acquire the full set of results for the search task (for brevity, the total number of search results in this example is small).

You would start with an initial GET:

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 0,
      "pageIndex": 0,
      "text": "Management",
      "context": "Enterprise Content Management Best Practices",
      "boundingRectangle": { "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 19,
      "startIndexInContext": 19
    },
    {
      "id": 1,
      "pageIndex": 0,
      "text": "management",
      "context": "ue of enterprise content management software should go way b",
      "boundingRectangle": { "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 527,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [],
  "continueToken": "Cx07GHlkmi32gxAQhv49WZ",
  "continueAfter": 500
}

The initial response has given us two results for the first page of the document (page index 0) and a continueToken which we should use to get more results after waiting 500 milliseconds.

So, half a second later, we issue a follow-up request with the continueToken passed in as a query string parameter (so we skip over the results we already have):

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=Cx07GHlkmi32gxAQhv49WZ
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 2,
      "pageIndex": 1,
      "text": "management",
      "context": "Enterprise content management software helps eliminate",
      "boundingRectangle": { "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 652,
      "startIndexInContext": 19
    }
  ],
  "pagesWithoutText": [2,3],
  "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
  "continueAfter": 500
}

This time we get back a new result as well as some new information about pagesWithoutText: we now know that at least page indices 2 and 3 (zero-indexed page numbers) have no text at all.

The presence of a new continueToken tells us there may be more results, so we submit another request with the new continueToken:

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=B4uGe7m0ZtxR3lkqA07Nmj
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 3,
      "pageIndex": 5,
      "text": "management",
      "context": "upply chains to contract management, or HR processes to gove",
      "boundingRectangle": { "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 113,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [2,3,4]
}

This time we get a new result for page index 5, and we now know that page indices 2, 3, and 4 all contain no text at all (apparently this was not much of a whitepaper!). The lack of a continueToken tells us we have received all of the results, so there are no more GET requests to make.
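
The request sequence above can be expressed as a small loop. This Python sketch is one way to write it, under a few assumptions: fetch_page is a caller-supplied function (hypothetical) that performs the GET for a given continueToken (or None for the first request) and returns the parsed JSON, and pagesWithoutText values are merged as a union across responses, which matches the example but is not stated as a rule on this page.

```python
import time


def collect_all_results(fetch_page, sleep=time.sleep):
    """Follow continueToken until it disappears, honoring continueAfter.

    Returns (results, pages_without_text) accumulated over all responses.
    """
    results = []
    pages_without_text = set()
    token = None
    while True:
        page = fetch_page(token)
        results.extend(page.get("results", []))
        pages_without_text.update(page.get("pagesWithoutText", []))
        token = page.get("continueToken")
        if not token:
            # No continueToken: every result has been received.
            return results, sorted(pages_without_text)
        # continueAfter is the recommended wait in milliseconds.
        sleep(page.get("continueAfter", 0) / 1000.0)


# Canned responses shaped like the three responses in the example above
# (result objects shortened to their id and pageIndex).
canned = {
    None: {"results": [{"id": 0, "pageIndex": 0}, {"id": 1, "pageIndex": 0}],
           "pagesWithoutText": [],
           "continueToken": "Cx07GHlkmi32gxAQhv49WZ", "continueAfter": 500},
    "Cx07GHlkmi32gxAQhv49WZ": {"results": [{"id": 2, "pageIndex": 1}],
                               "pagesWithoutText": [2, 3],
                               "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj", "continueAfter": 500},
    "B4uGe7m0ZtxR3lkqA07Nmj": {"results": [{"id": 3, "pageIndex": 5}],
                               "pagesWithoutText": [2, 3, 4]},
}
found, empty_pages = collect_all_results(lambda token: canned[token], sleep=lambda s: None)
print(len(found), empty_pages)
```

Injecting fetch_page and sleep keeps the loop independent of any particular HTTP client and makes it easy to exercise without a server.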

Example Searches

The following examples demonstrate how to use input.searchTerms for both the POST /v2/searchTasks and POST /v2/viewingSessions/{viewingSessionId}/searchTasks URLs.

Start a search for a single word

This partial input JSON begins a search task which finds all instances of the word "quick":

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Start a case-sensitive search for an exact phrase

This partial input JSON begins a case-sensitive search for the exact phrase "The quick brown fox jumped over the lazy dog.". Notice that we had to escape the period character because it is a special regex character (\.), and because this is a JSON string value, the backslash itself must also be escaped ("\\."):

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "The quick brown fox jumped over the lazy dog\\.",
      "caseSensitive": true
    }]
  }
}
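
The two levels of escaping can be seen by letting a JSON library do the second one. In this Python sketch, escape_literal is a hypothetical helper that backslash-escapes a set of characters commonly treated as special in regular expressions; the exact set the server treats as special is an assumption here. json.dumps then doubles each backslash when it writes the JSON string.

```python
import json

# Characters commonly special in regular expressions (assumed, not confirmed
# against the server's regex flavor).
REGEX_SPECIAL_CHARACTERS = set(".^$*+?()[]{}|\\")


def escape_literal(text):
    """Backslash-escape regex-special characters so text is matched literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL_CHARACTERS else ch for ch in text)


term = {
    "type": "simple",
    "pattern": escape_literal("The quick brown fox jumped over the lazy dog."),
    "caseSensitive": True,
}
# The pattern holds a single backslash; the JSON text shows it doubled.
print(json.dumps(term))
```

Building the body as a data structure and serializing it avoids hand-writing "\\." and getting the number of backslashes wrong.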

Start a search for every instance of the word "quick" or "fox" or "dog"

This partial input JSON begins a search for the words "quick" or "fox" or "dog", locating all instances of each of these words:

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }, {
      "type": "simple",
      "pattern": "fox"
    }, {
      "type": "simple",
      "pattern": "dog"
    }]
  }
}
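
When the list of words comes from user input, the same structure can be generated rather than written by hand. A minimal Python sketch, with simple_search_input as a made-up helper name:

```python
import json


def simple_search_input(patterns, case_sensitive=None):
    """Build a partial input object with one "simple" term per pattern."""
    terms = []
    for pattern in patterns:
        term = {"type": "simple", "pattern": pattern}
        if case_sensitive is not None:
            # Omitted by default so the server applies its own default.
            term["caseSensitive"] = case_sensitive
        terms.append(term)
    return {"input": {"searchTerms": terms}}


print(json.dumps(simple_search_input(["quick", "fox", "dog"]), indent=2))
```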

Start a search for "quick" and "fox" and "dog" where there are no more than 5 words between any two consecutive occurrences of them

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "quick"
      }, {
        "pattern": "fox"
      }, {
        "pattern": "dog"
      }],
      "distance": 5
    }]
  }
}

Start a case-sensitive search for "John Doe" within 30 words of what looks like a social security number

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "John Doe",
        "caseSensitive": true
      }, {
        "pattern": "\\d{3}-\\d{2}-\\d{4}"
      }],
      "distance": 30
    }]
  }
}
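
Both proximity examples share the same shape: a "proximity" term with a list of subTerms and a distance. The Python sketch below builds that shape; proximity_term is a hypothetical helper, and it performs no validation because this page does not state the limits on subTerms or distance.

```python
import json


def proximity_term(sub_terms, distance):
    """Build a "proximity" term.

    Each sub-term may be a plain pattern string or a dict that already
    carries options such as "caseSensitive".
    """
    normalized = [
        {"pattern": sub_term} if isinstance(sub_term, str) else dict(sub_term)
        for sub_term in sub_terms
    ]
    return {"type": "proximity", "subTerms": normalized, "distance": distance}


body = {
    "input": {
        "searchTerms": [
            proximity_term(
                [
                    {"pattern": "John Doe", "caseSensitive": True},
                    r"\d{3}-\d{2}-\d{4}",  # raw string: single backslashes in the pattern
                ],
                distance=30,
            )
        ]
    }
}
print(json.dumps(body, indent=2))
```

The raw string keeps the pattern readable in Python source, and json.dumps writes each backslash as "\\" in the request body, matching the JSON shown above.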

 

 


©2016. Accusoft Corporation. All Rights Reserved.
