Search Contexts

The search context and search task APIs let a viewer perform server-side searching and text retrieval for a document.

A search context contains a collection of records of full-page text data, one record per page.

Available URLs

URL	Description
`POST /v2/searchContexts`	Creates a new search context.
`GET /v2/searchContexts/{contextId}`	Gets information about a search context.
`GET /v2/searchContexts/{contextId}/records`	Gets full-page text data (records) for a specified set of pages.

POST /v2/searchContexts

Creates a search context which will eventually hold a set of full-page text records for a source document.

After a successful POST to create the search context, we immediately begin a background process to extract the text records using a work file you specified in the POST (via input.fileId). As we extract pages of text, new records will become available for you to GET. The search context state will change from "processing" to "complete" when there are no more records to extract.

Request

Request Headers

Name	Description
`Content-Type`	Must be `application/json`
`Accusoft-Affinity-Token`	The `affinityToken` of the work file specified by `input.fileId`. Required when server clustering is enabled and `input.source` is `"workFile"`.

Request Body

input
- documentIdentifier (String) Required. Your own unique identifier for the source document. It is crucial that you use a unique value for each unique document; otherwise, the returned text for a document will not be correct.
- source (String) Required. Must be "workFile".
- fileId (String) Required. The id of the work file to extract text records from.
- password (String) Password to open the source document.
minSecondsAvailable (Integer) The minimum number of seconds this search context will remain available. The actual lifetime may be longer.

Successful Response

Response Body

JSON with metadata about the created search context. You can check for changes to this metadata with additional GET requests.

input (Object) Input we accepted to create the search context.
contextId (String) Unique id for this search context.
affinityToken (String) Affinity token for this search context. Present when clustering is enabled.
state (String) State of acquiring text records for the given input.documentIdentifier.
- "processing" - The server is acquiring text records.
- "complete" - All text records have been acquired.
- "error" - There was a problem acquiring text records.
percentComplete (Integer) Percentage of text records which have been acquired (from 0 to 100).
expirationDateTime (String) Currently planned date and time when the search context resource will expire and no longer be available for use. This time may be extended if we need to keep using the data (for example, if there are search tasks executing against this context). Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
errorCode (String) Descriptive error code. Present when state is "error".
errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
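
As an illustration of working with expirationDateTime, the following Python sketch parses the RFC 3339 value and computes how many seconds remain before the planned expiration. The helper names `parse_expiration` and `seconds_until_expiration` are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone


def parse_expiration(value):
    """Parse an RFC 3339 timestamp such as "2016-12-17T20:38:39.796Z".

    A trailing "Z" is rewritten to "+00:00" so that datetime.fromisoformat
    accepts it on Python versions before 3.11. Older Python versions may
    still reject unusual fractional-second lengths.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def seconds_until_expiration(value, now=None):
    """Return the seconds left until the planned expiration (may be negative)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (parse_expiration(value) - now).total_seconds()
```

Because the server may extend the expiration time, a value computed this way is best treated as a lower bound rather than an exact deadline.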

Error Responses

Status Code	JSON errorCode	Description
`400`	`"MissingInput"`	Can occur when clustering is enabled and an `Accusoft-Affinity-Token` request header was not provided.
`480`	`"MissingInput"`	A required input value was not provided. See `errorDetails` in the response body.
`480`	`"InvalidInput"`	An invalid input value was used. See `errorDetails` in the response body.
`580`	`"InternalError"`	The server encountered an internal error when handling the request.

Examples

Request

POST /v2/searchContexts
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
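
For readers calling this endpoint from code, here is a minimal Python sketch that assembles the same request with the standard library. The server address in `BASE_URL` is a placeholder and `build_create_request` is a hypothetical helper name; neither is part of the API.

```python
import json
import urllib.request

# Placeholder server address (an assumption for this sketch).
BASE_URL = "http://your-server.example"


def build_create_request(document_identifier, file_id, affinity_token=None,
                         password=None, min_seconds_available=None):
    """Build (but do not send) a POST /v2/searchContexts request."""
    input_obj = {
        "documentIdentifier": document_identifier,
        "source": "workFile",
        "fileId": file_id,
    }
    if password is not None:
        input_obj["password"] = password

    body = {"input": input_obj}
    if min_seconds_available is not None:
        # minSecondsAvailable sits beside "input", not inside it.
        body["minSecondsAvailable"] = min_seconds_available

    headers = {"Content-Type": "application/json"}
    if affinity_token is not None:
        # Only needed when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token

    return urllib.request.Request(
        BASE_URL + "/v2/searchContexts",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and parse the response body as JSON. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes such as 480, so the error responses listed above arrive as exceptions.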

GET /v2/searchContexts/{contextId}

Gets information about a search context.

Request

URL Parameters

Parameter	Description
`{contextId}`	The `contextId` which identifies the resource.

Request Headers

Name	Description
`Accusoft-Affinity-Token`	The `affinityToken` of the search context. Required when server clustering is enabled.

Successful Response

Response Body

JSON with current metadata about the search context.

input (Object) Input we accepted to create the search context.
contextId (String) Unique id for this search context.
affinityToken (String) Affinity token for this search context. Present when clustering is enabled.
state (String) State of acquiring text records for the given input.documentIdentifier.
- "processing" - The server is acquiring text records.
- "complete" - All text records have been acquired.
- "error" - There was a problem acquiring text records.
percentComplete (Integer) Percentage of text records which have been acquired (from 0 to 100).
expirationDateTime (String) Currently planned date and time when the search context resource will expire and no longer be available for use. This time may be extended if we need to keep using the data (for example, if there are search tasks executing against this context). Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30-05:00".
errorCode (String) Descriptive error code. Present when state is "error".
errorDetails (Object) Additional error details, if any. May be present when errorCode is present.

Error Responses

Status Code	JSON errorCode	Description
`400`	`"MissingInput"`	Can occur when clustering is enabled and an `Accusoft-Affinity-Token` request header was not provided.
`580`	`"InternalError"`	The server encountered an internal error when handling the request.

Examples

Example request

GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

Response when the state is still "processing"

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
  "state": "processing",
  "percentComplete": 47,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Response when the state is "complete"

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
  "state": "complete",
  "percentComplete": 100,
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Response when the state is "error" because the work file could not be found

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
  "state": "error",
  "errorCode": "ResourceNotFound",
  "errorDetails": {
    "in": "searchContext",
    "at": "input.fileId"
  },
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Response when the source document required a password but no password was provided

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
  "state": "error",
  "errorCode": "InvalidPassword",
  "errorDetails": {
    "in": "searchContext",
    "at": "input.password"
  },
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}

Response when the source document required a password but the wrong password was provided

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "documentIdentifier": "your-own-unique-identifier-for-the-source-document",
    "source": "workFile",
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ",
    "password": "wrong-password"
  },
  "contextId": "ElkNzWtrUJp4rXI5YnLUgw",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
  "state": "error",
  "errorCode": "InvalidPassword",
  "errorDetails": {
    "in": "searchContext",
    "at": "input.password"
  },
  "expirationDateTime": "2016-12-17T20:38:39.796Z"
}
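
The responses above suggest a simple polling pattern: repeat the GET until state is no longer "processing", then either proceed or surface the error. The sketch below shows one way to do that in Python. The `fetch_context` callable is hypothetical: it stands for whatever code you use to perform `GET /v2/searchContexts/{contextId}` and return the parsed JSON. The polling interval and retry limit are arbitrary choices, not recommended values.

```python
import time


def wait_for_context(fetch_context, poll_seconds=1.0, max_polls=60,
                     sleep=time.sleep):
    """Poll until the search context leaves the "processing" state.

    fetch_context is a caller-supplied function (hypothetical) that performs
    GET /v2/searchContexts/{contextId} and returns the parsed JSON as a dict.
    """
    for _ in range(max_polls):
        context = fetch_context()
        state = context.get("state")
        if state == "complete":
            return context
        if state == "error":
            # errorCode is present when state is "error"; errorDetails may be.
            raise RuntimeError(
                "search context failed: %s %s"
                % (context.get("errorCode"), context.get("errorDetails"))
            )
        sleep(poll_seconds)
    raise TimeoutError(
        "search context still processing after %d polls" % max_polls
    )
```

Waiting for "complete" is optional. Records become available as pages are extracted, so a client can also request records while the state is still "processing".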

GET /v2/searchContexts/{contextId}/records?pages={pages}

Gets full-page text data (records) for a specified set of pages.

Request

URL Parameters

Parameter	Description
`{contextId}`	The `contextId` which identifies the resource.
`{pages}`	Required. A set of comma-delimited page indices (zero-indexed page numbers) and/or hyphenated page index ranges for which you want the full-page text data (records). See more below.

pages

The pages parameter accepts one or more zero-indexed page numbers (page indices). Between commas, you can specify individual pages (like 0), closed page ranges (like 0-3), and open-ended page ranges (like 3-, which means page index 3 through the end of the document).

Here are some examples:

Example	Description
`pages=0`	Get the text data for page index 0.
`pages=5`	Get the text data for page index 5.
`pages=0-5`	Get the text data for page indices 0-5.
`pages=3-`	Get the text data for page indices 3 through the end of the document.
`pages=0-`	Get the text data for all pages (page index 0 through the end of the document).
`pages=1-`	Get the text data for all but the first page (page index 1 through the end of the document).
`pages=0,2,5,9`	Get the text data for page indices 0, 2, 5, and 9.
`pages=2,4-5,7-`	Get the text data for page indices 2, 4 through 5, and 7 through the end of the document.
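
If you build the pages value programmatically, a small helper can keep the syntax consistent. The Python sketch below is one possible approach. The function name `format_pages` and its input convention (an int for a single page index, a `(start, end)` tuple for a range, with `end=None` for an open-ended range) are assumptions made for this example.

```python
def format_pages(parts):
    """Build a value for the pages query parameter.

    Each part is either an int (a single zero-based page index) or a
    (start, end) tuple, where end may be None for an open-ended range.
    """
    pieces = []
    for part in parts:
        if isinstance(part, tuple):
            start, end = part
            if start < 0 or (end is not None and end < start):
                raise ValueError("invalid page range: %r" % (part,))
            if end is None:
                pieces.append("%d-" % start)
            else:
                pieces.append("%d-%d" % (start, end))
        else:
            if part < 0:
                raise ValueError("invalid page index: %r" % (part,))
            pieces.append(str(part))
    return ",".join(pieces)
```

For example, `format_pages([2, (4, 5), (7, None)])` produces the value used in the last row of the table above.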

Request Headers

Name	Description
`Accusoft-Affinity-Token`	The `affinityToken` of the search context. Required when server clustering is enabled.

Successful Response

JSON containing full-page text records for the requested pages.

pages[] (Array of Objects) Always present. Array of full-page text record objects for the requested pages. Note that the order of the records is not guaranteed; you must use the number property of each returned item to know its page index. Items may contain:
- number (Integer) Always present. Page index (zero-indexed page number). The property is named simply number for backwards compatibility reasons.
- text (String) Page text.
- errorCode (String) A descriptive page-level error code (such as "CouldNotGetPageData") if there was a problem getting data for the page.
- width (Number) Page width.
- height (Number) Page height.
- rectangles[] (Array of Arrays) Bounding boxes for individual glyphs on the page. Each item will contain four numbers:
  - [0] (Number) Distance from the left edge of the page to the left edge of the glyph bounding box.
  - [1] (Number) Distance from the top edge of the page to the top edge of the glyph bounding box.
  - [2] (Number) Width of the glyph bounding box.
  - [3] (Number) Height of the glyph bounding box.
- markup[] (Array of Objects) Objects describing hyperlinks, if any. Each item may contain:
  - changeType (String) Value will always be "Add".
  - markType (String) Value will always be "DocumentHyperlink".
  - properties (Object) Properties of the hyperlink.
    - href (String) Destination URL.
    - rectangle (Object) Dimensions of the hyperlink bounding box on the page.
      - x (Number) Distance from the left edge of the page to the left edge of the hyperlink bounding box.
      - y (Number) Distance from the top edge of the page to the top edge of the hyperlink bounding box.
      - width (Number) Width of the hyperlink bounding box.
      - height (Number) Height of the hyperlink bounding box.
    - borderThickness (Number) Border thickness which should be applied.
    - borderHorizontalRadius (Number) Horizontal border radius which should be applied.
    - borderVerticalRadius (Number) Vertical border radius which should be applied.
    - borderOpacity (Integer) Border opacity which should be applied. Value will be from 0 to 255, where 0 represents fully transparent and 255 represents fully opaque.
errorCode (String) Descriptive error code. Present if there was a general problem getting all of the requested data.
errorDetails (Object) Present if there are additional error details.
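
Since the order of items in pages is not guaranteed, client code usually needs to index the records by their number property before using them. A minimal Python sketch, with the hypothetical helper names `records_by_page_index` and `records_in_page_order`:

```python
def records_by_page_index(response_body):
    """Map each page index to its record from a parsed records response."""
    return {
        record["number"]: record
        for record in response_body.get("pages", [])
    }


def records_in_page_order(response_body):
    """Return the records sorted by page index."""
    by_index = records_by_page_index(response_body)
    return [by_index[index] for index in sorted(by_index)]
```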

Error Responses

Status Code	JSON errorCode	Description
`404`		No search context exists for the `{contextId}` given in the URL. It may have expired, or it may have never existed.
`400`	`"MissingInput"`	Can occur when clustering is enabled and an `Accusoft-Affinity-Token` request header was not provided.
`480`	`"MissingInput"`	A required input was missing. See the `errorDetails` for more information.
`480`	`"InvalidSyntax"`	Can occur when the `pages` query string parameter is set to a value we cannot understand.
`480`	`"ResourceNotUsable"`	Can occur when the search context is in a `state` of `"error"`. You may be able to get more information from a `GET /v2/searchContexts/{contextId}`.
`580`	`"InternalError"`	The server encountered an internal error when handling the request.

Examples

When all data is returned successfully

Request records for pages 0 through 9:

GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw/records?pages=0-9
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

Successful response (where ... indicates that data has been omitted for brevity):

HTTP/1.1 200 OK
Content-Type: application/json

{
  "pages": [
    {
      "number": 0,
      "text": "the page text",
      "width": 648.00,
      "height": 828.00,
      "rectangles": [
        [
          202.25,
          135.05,
          27.00,
          73.26
        ],
        [
          229.25,
          135.05,
          30.00,
          73.26
        ],
        ...
      ],
      "markup": [
        {
          "changeType": "Add",
          "markType": "DocumentHyperlink",
          "properties": {
            "rectangle": {
              "height": 14.71,
              "width": 86.20,
              "y": 73.50,
              "x": 71.31
            },
            "borderHorizontalRadius": 0.0,
            "borderVerticalRadius": 0.0,
            "borderThickness": 0.0,
            "href": "http://www.google.com/",
            "borderOpacity": 255
          }
        },
        ...
      ]
    },
    ...
  ]
}

When the data stream is interrupted

Because this URL may return large amounts of data, we progressively stream data to the HTTP response. As such, it is possible that we encounter a data streaming error after we have sent HTTP 200. When this happens, we will close the JSON with a top-level errorCode of "DataStreamInterruption", like so:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "pages": [...],
  "errorCode": "DataStreamInterruption"
}
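
Because an interrupted response still arrives with HTTP 200, a client has to inspect the top-level errorCode itself. One way to react is to work out which requested pages never arrived and ask for them again. Re-requesting the missing pages is an assumption of this Python sketch rather than documented behavior, and both helper names are hypothetical.

```python
def was_interrupted(response_body):
    """Return True if the server reported a data stream interruption."""
    return response_body.get("errorCode") == "DataStreamInterruption"


def pages_still_needed(requested_indices, response_body):
    """Return the requested page indices that are absent from the response."""
    received = {
        record["number"] for record in response_body.get("pages", [])
    }
    return [index for index in requested_indices if index not in received]
```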

When out-of-range, non-existent pages are requested

If you request a set of pages that includes non-existent pages beyond the length of the document, we will include whatever actual pages we can, but we will also add a top-level errorCode of "RequestedPagesOutOfRange" with the actual documentPageCount within an errorDetails object, like so:

GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw/records?pages=0-9
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

HTTP/1.1 200 OK
Content-Type: application/json

{
  "pages": [...],
  "errorCode": "RequestedPagesOutOfRange",
  "errorDetails": {
    "documentPageCount": 3
  }
}
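
A client can use documentPageCount to learn the real length of the document and trim later requests accordingly. The following Python sketch shows the idea; `clamp_to_document` is a hypothetical helper name.

```python
def clamp_to_document(requested_indices, response_body):
    """Drop requested page indices that lie beyond the end of the document.

    If the response does not report "RequestedPagesOutOfRange", the requested
    indices are returned unchanged.
    """
    if response_body.get("errorCode") != "RequestedPagesOutOfRange":
        return list(requested_indices)
    page_count = response_body["errorDetails"]["documentPageCount"]
    return [index for index in requested_indices if index < page_count]
```

With the response shown above, a request for page indices 0 through 9 would be trimmed to indices 0, 1, and 2.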

When data cannot be extracted from some pages

The pages array will contain one item for each requested page that actually exists. If we are unable to obtain data for a particular page, we will include an item in the pages array that contains the page number and a page-specific errorCode of "CouldNotGetPageData", like so:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "pages": [
    {
      "number": 0,
      "text": "Once upon a time...",
      "width": 612.00,
      "height": 792.00,
      "rectangles": [...]
    },
    {
      "number": 1,
      "errorCode": "CouldNotGetPageData"
    },
    {
      "number": 2,
      "errorCode": "CouldNotGetPageData"
    },
    {
      "number": 3,
      "text": "and then, she said to the dragon...",
      "width": 612.00,
      "height": 792.00,
      "rectangles": [...]
    }
  ]
}
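
When processing a response like this one, it can help to separate usable records from pages that failed. A minimal Python sketch, using the hypothetical helper name `split_records`:

```python
def split_records(response_body):
    """Split records into (usable, failed) lists.

    A record counts as failed when it carries a page-level errorCode,
    such as "CouldNotGetPageData".
    """
    usable, failed = [], []
    for record in response_body.get("pages", []):
        if "errorCode" in record:
            failed.append(record)
        else:
            usable.append(record)
    return usable, failed
```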