PrizmDoc v12.1 - January 5, 2017
Search Tasks

Search Tasks

The search task API is designed for a viewer to perform server-side searching and text retrieval against the source document of a viewing session.

A search task represents an asynchronous full-text search of a document and yields results as they become available.

Available URLs

URL Description
POST /v2/viewingSessions/{viewingSessionId}/searchTasks Starts an asynchronous full-text search against a viewing session's source document.
GET /v2/searchTasks/{processId}/results Gets available search results.

POST /v2/viewingSessions/{viewingSessionId}/searchTasks

Starts an asynchronous full-text search against a viewing session's source document.

After a successful POST to create the search task, we immediately begin a background process to start populating search results for you to GET. You do not need to wait for the full set of results to be available; you can start retrieving partial search results as soon as they are available. Once the full text of the document has been searched and no more results will be added, the search task state will change from "processing" to "complete".

Request

Request Headers

Name Description
Content-Type Must be application/json

Request Body

Successful Response

Response Body

JSON with metadata about the created search task.

Error Responses

Status Code JSON errorCode Description
404 No viewing session with the provided {viewingSessionId} could be found.
480 "DocumentNotProvidedYet" The viewing session does not yet have a source document attached.
480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "MissingInputForSimpleTerm" An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480 "InvalidInputForSimpleTerm" An invalid input value was used in a "simple" term object. See errorDetails in the response body.
480 "MissingInputForProximityTerm" An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480 "InvalidInputForProximityTerm" An invalid input value was used in a "proximity" term object. See errorDetails in the response body.
480 "FeatureDisabled" The viewing session was created with "serverSideSearch" disabled.
501 "NotImplemented" Server-side searching is not yet implemented for a viewing session which uses a cached viewing package.
580 "InternalError" The server encountered an internal error when handling the request.

Example

Request

This POST begins a search task which finds all instances of the word "quick":

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }]
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick",
      "caseSensitive": false,
      "contextPadding": 25
    }]
  },
  "processId": "pR5X6nPDgMwat6cxlmn0Q3",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Additional Examples

Start a case-sensitive search for an exact phrase

This POST begins a case-sensitive search for the exact phrase "The quick brown fox jumped over the lazy dog.". Notice that we had to escape the period character because it is a special regex character (\.), and because this is a JSON string value, the backslash itself must also be escaped ("\\."):

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "The quick brown fox jumped over the lazy dog\\.",
      "caseSensitive": true
    }]
  }
}

Start a search for every instance of the word "quick" or "brown" or "fox"

This POST begins a search for the words "quick" or "brown" or "fox", locating all instances of each of these words:

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "simple",
      "pattern": "quick"
    }, {
      "type": "simple",
      "pattern": "fox"
    }, {
      "type": "simple",
      "pattern": "dog"
    }]
  }
}

Start a search for "quick" and "fox" and "dog" where there are no more than 5 words between any two consecutive occurrences of them

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "quick"
      }, {
        "pattern": "fox"
      }, {
        "pattern": "dog"
      }],
      "distance": 5
    }]
  }
}

Start a case-sensitive search for "John Doe" within 30 words of what looks like a social security number

POST /v2/viewingSessions/DLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/searchTasks
Content-Type: application/json

{
  "input": {
    "searchTerms": [{
      "type": "proximity",
      "subTerms": [{
        "pattern": "John Doe",
        "caseSensitive": true
      }, {
        "pattern": "\\d{3}-\\d{2}-\\d{4}"
      }],
      "distance": 30
    }]
  }
}

GET /v2/searchTasks/{processId}/results?limit={limit}&continueToken={continueToken}

Gets a block of newly-available search results up to a limit.

This URL is designed to give you the results in chunks as they become available. Each GET request will return the currently-known results up to a limit (default is 100). If a response contains a continueToken, it indicates that additional results may be available and that you should issue another GET request using that continueToken as a query string parameter to skip the results you have already received. As long as a response contains a continueToken, use it to issue a subsequent GET for more results. When you encounter a response which does not have a continueToken, you have received all of the results and no more GET requests are necessary.

In order to optimize the number of network requests you make, any response which contains a continueToken will also contain a continueAfter value with a recommended number of milliseconds you should wait before sending the next GET request.

Request

URL Parameters

Parameter Description
{processId} The processId which identifies the search task.
{limit} The maximum number of results to return for this HTTP request. Must be an integer greater than 0. Default is 100.
{continueToken} Used to continue getting results from the point where a previous GET request left off.

Request Headers

Name Description
Accusoft-Affinity-Token The affinityToken of the search task. Required when server clustering is enabled.

Successful Response

Response Body

JSON with any available search results.

Error Responses

Status Code JSON errorCode Description
404 No search task with the provided {processId} could be found.
480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "ResourceNotUsable" Can occur when the search task is in a state of "error".
580 "InternalError" The server encountered an internal error when handling the request.

Example

Say you have a search task which was created to find the regex "manag[a-z]*" in a particular whitepaper. Here is an example sequence of requests and responses illustrating how you would acquire the full set of results for the search task (for brevity, the total number of search results in this example is small).

You would start with an initial GET:

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 0,
      "pageIndex": 0,
      "text": "Management",
      "context": "Enterprise Content Management Best Practices",
      "boundingRectangle": { "x": 24.20, "y": 13.74, "width": 234.20, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 19,
      "startIndexInContext": 19
    },
    {
      "id": 1,
      "pageIndex": 0,
      "text": "management",
      "context": "ue of enterprise content management software should go way b",
      "boundingRectangle": { "x": 156.07, "y": 352.19, "width": 105.00, "height": 13.41 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 527,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [],
  "continueToken": "Cx07GHlkmi32gxAQhv49WZ",
  "continueAfter": 500
}

The initial response has given us two results for the first page of the document (page index 0) and a continueToken which we should use to get more results after waiting 500 milliseconds.

So, half a second later, we issue a follow-up request with the continueToken passed in as a query string parameter (so we skip over the results we already have):

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=Cx07GHlkmi32gxAQhv49WZ
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 2,
      "pageIndex": 1,
      "text": "management",
      "context": "Enterprise content management software helps eliminate",
      "boundingRectangle": { "x": 310.21, "y": 562.14, "width": 254.03, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 652,
      "startIndexInContext": 19
    }
  ],
  "pagesWithoutText": [2,3],
  "continueToken": "B4uGe7m0ZtxR3lkqA07Nmj",
  "continueAfter": 500
}

This time we get back a new result as well as some new information about pagesWithoutText: we now know that at least page indices 2 and 3 (zero-indexed page numbers) have no text at all.

The presence of a new continueToken tells us there may be more results, so we submit another request with the new continueToken:

GET /v2/searchTasks/pR5X6nPDgMwat6cxlmn0Q3/results?continueToken=B4uGe7m0ZtxR3lkqA07Nmj
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "id": 3,
      "pageIndex": 5,
      "text": "management",
      "context": "upply chains to contract management, or HR processes to gove",
      "boundingRectangle": { "x": 67.00, "y": 142.53, "width": 254.03, "height": 26.10 },
      "searchTerm": {
        "type": "simple",
        "pattern": "manag[a-z]*",
        "caseSensitive": false,
        "contextPadding": 25
      },
      "startIndex": 113,
      "startIndexInContext": 25
    }
  ],
  "pagesWithoutText": [2,3,4]
}

This time we get a new result for page index 5, and we now know that page indices 2, 3, and 4 all contain no text at all (apparently this was not much of a whitepaper!). The lack of a continueToken tells us we have received all of the results, so there are no more GET requests to make.

 

 


©2017. Accusoft Corporation. All Rights Reserved.

Send Feedback