Search Contexts
The search context and search task APIs are designed for a viewer to perform server-side searching and text retrieval of a document.
A search context contains a collection of records of full-page text data, one record per page.
Available URLs
URL | Description |
---|---|
POST /v2/searchContexts | Creates a new search context. |
GET /v2/searchContexts/{contextId} | Gets information about a search context. |
GET /v2/searchContexts/{contextId}/records | Gets full-page text data (records) for a specified set of pages. |
POST /v2/searchContexts
Creates a search context which will eventually hold a set of full-page text records for a source document.
After a successful POST to create the search context, we immediately begin a background process to extract the text records using a work file you specified in the POST (via input.fileId). As we extract pages of text, new records will become available for you to GET. The search context state will change from "processing" to "complete" when there are no more records to extract.
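The state transition described above can be observed by polling. The following is a minimal sketch, not a definitive client: fetch_context is a hypothetical caller-supplied callable that performs GET /v2/searchContexts/{contextId} and returns the parsed JSON body, and the polling interval and timeout are arbitrary values chosen for illustration.

```python
import time
from typing import Any, Callable, Dict


def wait_for_context(
    fetch_context: Callable[[], Dict[str, Any]],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the state leaves "processing" and return the final metadata.

    fetch_context is an assumption of this sketch: any callable that returns
    the parsed JSON of GET /v2/searchContexts/{contextId}.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        context = fetch_context()
        state = context.get("state")
        if state == "complete":
            return context
        if state == "error":
            raise RuntimeError(f"search context failed: {context.get('errorCode')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("search context is still processing")
        sleep(poll_seconds)
```

Because records become available as pages are extracted, a viewer does not have to wait for "complete" before requesting early pages; waiting is only needed when the whole document is required.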
Request
Request Headers
Name | Description |
---|---|
Content-Type | Must be application/json |
Accusoft-Affinity-Token | The affinityToken of the work file specified by input.fileId. Required when server clustering is enabled and input.source is "workFile". |
Request Body
- input
  - documentIdentifier (String) Required. Your own unique identifier for the source document. It is crucial that you use a unique value for each unique document, otherwise, the returned text for a document will not be correct.
  - source (String) Required. Must be "workFile".
  - fileId (String) Required. The id of the work file to extract text records from.
  - password (String) Password to open the source document.
- minSecondsAvailable (Integer) The minimum number of seconds this search context will remain available. The actual lifetime may be longer.
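As an illustration of the request described above, here is a small sketch that assembles the headers and JSON body for POST /v2/searchContexts. The function name and its parameters are hypothetical; only the header names and JSON properties come from this page. It builds the request without sending it, so it can be paired with any HTTP client.

```python
import json
from typing import Dict, Optional, Tuple


def build_create_request(
    document_identifier: str,
    file_id: str,
    password: Optional[str] = None,
    min_seconds_available: Optional[int] = None,
    affinity_token: Optional[str] = None,
) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for POST /v2/searchContexts."""
    headers = {"Content-Type": "application/json"}
    # Only needed when server clustering is enabled.
    if affinity_token is not None:
        headers["Accusoft-Affinity-Token"] = affinity_token

    input_object = {
        "documentIdentifier": document_identifier,
        "source": "workFile",  # the only documented value
        "fileId": file_id,
    }
    if password is not None:
        input_object["password"] = password

    body = {"input": input_object}
    # minSecondsAvailable sits beside "input", not inside it.
    if min_seconds_available is not None:
        body["minSecondsAvailable"] = min_seconds_available
    return headers, json.dumps(body)
```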
Successful Response
Response Body
JSON with metadata about the created search context. You can check for changes to this metadata with additional GET requests.
- input (Object) Input we accepted to create the search context.
- contextId (String) Unique id for this search context.
- affinityToken (String) Affinity token for this search context. Present when clustering is enabled.
- state (String) State of acquiring text records for the given input.documentIdentifier.
  - "processing" - The server is acquiring text records.
  - "complete" - All text records have been acquired.
  - "error" - There was a problem acquiring text records.
- percentComplete (Integer) Percentage of text records which have been acquired (from 0 to 100).
- expirationDateTime (String) Currently planned date and time when the search context resource will expire and no longer be available for use. This time may be extended if we have need to keep using the data (for example, if there are search tasks executing against this context). Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
- errorCode (String) Descriptive error code. Present when state is "error".
- errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
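Since expirationDateTime is an RFC 3339 string, a client may want to turn it into a timezone-aware value before comparing it with the current time. The sketch below uses only the Python standard library; the function names are hypothetical. It handles the two example formats shown on this page, and other fractional-second precisions may need a dedicated parser on older Python versions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_expiration(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as "2016-11-05T08:15:30.494Z"."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def seconds_remaining(context: Dict[str, Any], now: Optional[datetime] = None) -> float:
    """Seconds until the currently planned expiration (negative if already past)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (parse_expiration(context["expirationDateTime"]) - now).total_seconds()
```

The result is only a snapshot, because the server may extend the expiration while the context is in use.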
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
480 | "MissingInput" | A required input value was not provided. See errorDetails in the response body. |
480 | "InvalidInput" | An invalid input value was used. See errorDetails in the response body. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
Examples
Request
POST /v2/searchContexts
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
}
}
Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"state": "processing",
"percentComplete": 0,
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
GET /v2/searchContexts/{contextId}
Gets information about a search context.
Request
URL Parameters
Parameter | Description |
---|---|
{contextId} | The contextId which identifies the resource. |
Request Headers
Name | Description |
---|---|
Accusoft-Affinity-Token | The affinityToken of the search context. Required when server clustering is enabled. |
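One way to satisfy the header requirement is to reuse the affinityToken returned in the search context metadata. The sketch below is illustrative only: the function name and the base_url parameter are hypothetical, and it builds the URL and headers without sending a request.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote


def build_context_get(base_url: str, context: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for GET /v2/searchContexts/{contextId}.

    context is the JSON metadata returned when the search context was created.
    """
    context_id = quote(context["contextId"], safe="")
    url = f"{base_url.rstrip('/')}/v2/searchContexts/{context_id}"

    headers: Dict[str, str] = {}
    # affinityToken is only present when clustering is enabled.
    token = context.get("affinityToken")
    if token:
        headers["Accusoft-Affinity-Token"] = token
    return url, headers
```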
Successful Response
Response Body
JSON with current metadata about the search context.
- input (Object) Input we accepted to create the search context.
- contextId (String) Unique id for this search context.
- affinityToken (String) Affinity token for this search context. Present when clustering is enabled.
- state (String) State of acquiring text records for the given input.documentIdentifier.
  - "processing" - The server is acquiring text records.
  - "complete" - All text records have been acquired.
  - "error" - There was a problem acquiring text records.
- percentComplete (Integer) Percentage of text records which have been acquired (from 0 to 100).
- expirationDateTime (String) Currently planned date and time when the search context resource will expire and no longer be available for use. This time may be extended if we have need to keep using the data (for example, if there are search tasks executing against this context). Format is RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30-05:00".
- errorCode (String) Descriptive error code. Present when state is "error".
- errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
Examples
Example request
GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
Response when the state is still "processing"
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
"state": "processing",
"percentComplete": 47,
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
Response when the state is "complete"
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
"state": "complete",
"percentComplete": 100,
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
Response when the state is "error" because the work file could not be found
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
"state": "error",
"errorCode": "ResourceNotFound",
"errorDetails": {
"in": "searchContext",
"at": "input.fileId"
},
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
Response when the source document required a password but no password was provided
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
"state": "error",
"errorCode": "InvalidPassword",
"errorDetails": {
"in": "searchContext",
"at": "input.password"
},
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
Response when the source document required a password but the wrong password was provided
HTTP/1.1 200 OK
Content-Type: application/json
{
"input": {
"documentIdentifier": "your-own-unique-identifier-for-the-source-document",
"source": "workFile",
"fileId": "ek5Zb123oYHSUEVx1bUrVQ",
"password": "wrong-password"
},
"contextId": "ElkNzWtrUJp4rXI5YnLUgw",
"affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=",
"state": "error",
"errorCode": "InvalidPassword",
"errorDetails": {
"in": "searchContext",
"at": "input.password"
},
"expirationDateTime": "2016-12-17T20:38:39.796Z"
}
GET /v2/searchContexts/{contextId}/records?pages={pages}
Gets full-page text data (records) for a specified set of pages.
Request
URL Parameters
Parameter | Description |
---|---|
{contextId} | The contextId which identifies the resource. |
{pages} | Required. A set of comma-delimited page indices (zero-indexed page numbers) and/or hyphenated page index ranges for which you want the full-page text data (records). See more below. |
pages
The pages parameter accepts one or more zero-indexed page numbers (page indices). Between commas, you can specify individual pages (like 0), closed page ranges (like 0-3), and open-ended page ranges (like 3-, which means page index 3 through the end of the document).
Here are some examples:
Example | Description |
---|---|
pages=0 | Get the text data for page index 0. |
pages=5 | Get the text data for page index 5. |
pages=0-5 | Get the text data for page indices 0-5. |
pages=3- | Get the text data for page indices 3 through the end of the document. |
pages=0- | Get the text data for all pages (page index 0 through the end of the document). |
pages=1- | Get the text data for all but the first page (page index 1 through the end of the document). |
pages=0,2,5,9 | Get the text data for page indices 0, 2, 5, and 9. |
pages=2,4-5,7- | Get the text data for page indices 2, 4 through 5, and 7 through the end of the document. |
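To make the pages syntax concrete, the following sketch expands a pages value into the page indices it denotes for a document of known length. It is a client-side illustration only: expand_pages is a hypothetical helper, clipping to the page count is a choice of this sketch, and the server's own parsing rules (for whitespace or duplicates, say) may differ.

```python
from typing import List


def expand_pages(spec: str, page_count: int) -> List[int]:
    """Expand a pages value such as "2,4-5,7-" into sorted page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An empty end means "through the end of the document".
            end = int(end_text) if end_text else page_count - 1
        else:
            start = end = int(part)
        # Clip to the last existing page index.
        indices.update(range(start, min(end, page_count - 1) + 1))
    return sorted(indices)


print(expand_pages("2,4-5,7-", page_count=10))  # [2, 4, 5, 7, 8, 9]
```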
Request Headers
Name | Description |
---|---|
Accusoft-Affinity-Token | The affinityToken of the search context. Required when server clustering is enabled. |
Successful Response
JSON containing full-page text records for the requested pages.
- pages[] (Array of Objects) Always present. Array of full-page text record objects for the requested pages. Note that the order of the records is not guaranteed; you must use the number property of each returned item to know its page index. Items may contain:
  - number (Integer) Always present. Page index (zero-indexed page number). The property is named simply number for backwards compatibility reasons.
  - text (String) Page text.
  - errorCode (String) A descriptive page-level error code (such as "CouldNotGetPageData") if there was a problem getting data for the page.
  - width (Number) Page width.
  - height (Number) Page height.
  - rectangles[] (Array of Arrays) Bounding boxes for individual glyphs on the page. Each item will contain four numbers:
    - [0] (Number) Distance from the left edge of the page to the left edge of the glyph bounding box.
    - [1] (Number) Distance from the top edge of the page to the top edge of the glyph bounding box.
    - [2] (Number) Width of the glyph bounding box.
    - [3] (Number) Height of the glyph bounding box.
  - markup[] (Array of Objects) Objects describing hyperlinks, if any. Each item may contain:
    - changeType (String) Value will always be "Add".
    - markType (String) Value will always be "DocumentHyperlink".
    - properties (Object) Properties of the hyperlink.
      - href (String) Destination URL.
      - rectangle (Object) Dimensions of the hyperlink bounding box on the page.
        - x (Number) Distance from the left edge of the page to the left edge of the hyperlink bounding box.
        - y (Number) Distance from the top edge of the page to the top edge of the hyperlink bounding box.
        - width (Number) Width of the hyperlink bounding box.
        - height (Number) Height of the hyperlink bounding box.
      - borderThickness (Number) Border thickness which should be applied.
      - borderHorizontalRadius (Number) Horizontal border radius which should be applied.
      - borderVerticalRadius (Number) Vertical border radius which should be applied.
      - borderOpacity (Integer) Border opacity which should be applied. Value will be from 0 to 255, where 0 represents fully transparent and 255 represents fully opaque.
- errorCode (String) Descriptive error code. Present if there was a general problem getting all of the requested data.
- errorDetails (Object) Present if there are additional error details.
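Because the order of the returned records is not guaranteed, a client typically needs to key them by the number property before using them. A minimal sketch, with hypothetical helper names, operating on the parsed JSON response body:

```python
from typing import Any, Dict, List


def index_records(response: Dict[str, Any]) -> Dict[int, Dict[str, Any]]:
    """Map each page index (the "number" property) to its record."""
    return {page["number"]: page for page in response.get("pages", [])}


def text_in_page_order(response: Dict[str, Any]) -> List[str]:
    """Return page text sorted by page index; pages without text yield ""."""
    by_number = index_records(response)
    return [by_number[n].get("text", "") for n in sorted(by_number)]
```

A record that carries a page-level errorCode has no text property, which is why the second helper falls back to an empty string.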
Error Responses
Status Code | JSON errorCode | Description |
---|---|---|
404 | | No search context exists for the {contextId} given in the URL. It may have expired, or it may have never existed. |
400 | "MissingInput" | Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided. |
480 | "MissingInput" | A required input was missing. See the errorDetails for more information. |
480 | "InvalidSyntax" | Can occur when the pages query string parameter is set to a value we cannot understand. |
480 | "ResourceNotUsable" | Can occur when the search context is in a state of "error". You may be able to get more information from a GET /v2/searchContexts/{contextId}. |
580 | "InternalError" | The server encountered an internal error when handling the request. |
Examples
When all data is returned successfully
Request records for pages 0 through 9:
GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw/records?pages=0-9
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
Successful response (where ... indicates that data has been omitted for brevity):
HTTP/1.1 200 OK
Content-Type: application/json
{
"pages": [
{
"number": 0,
"text": "the page text",
"width": 648.00,
"height": 828.00,
"rectangles": [
[
202.25,
135.05,
27.00,
73.26
],
[
229.25,
135.05,
30.00,
73.26
],
...
],
"markup": [
{
"changeType": "Add",
"markType": "DocumentHyperlink",
"properties": {
"rectangle": {
"height": 14.71,
"width": 86.20,
"y": 73.50,
"x": 71.31
},
"borderHorizontalRadius": 0.0,
"borderVerticalRadius": 0.0,
"borderThickness": 0.0,
"href": "http://www.google.com/",
"borderOpacity": 255
}
},
...
]
},
...
]
}
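As a sketch of how a viewer might consume the markup array in a record like the one above, the helper below collects each hyperlink's destination and the edges of its bounding box. The function name and the shape of the returned dictionaries are hypothetical; the property names read from the record are the ones documented on this page.

```python
from typing import Any, Dict, List


def hyperlinks(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return href and bounding-box edges for each hyperlink on a page record."""
    links = []
    for mark in page.get("markup", []):
        if mark.get("markType") != "DocumentHyperlink":
            continue
        properties = mark.get("properties", {})
        rect = properties.get("rectangle")
        if rect is None:
            continue  # no position information to report
        links.append({
            "href": properties.get("href"),
            "left": rect["x"],
            "top": rect["y"],
            # x and y are offsets from the page's left and top edges.
            "right": rect["x"] + rect["width"],
            "bottom": rect["y"] + rect["height"],
        })
    return links
```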
When the data stream is interrupted
Because this URL may return large amounts of data, we progressively stream data to the HTTP response. As such, it is possible that we encounter a data streaming error after we have sent HTTP 200. When this happens, we will close the JSON with a top-level errorCode of "DataStreamInterruption", like so:
HTTP/1.1 200 OK
Content-Type: application/json
{
"pages": [...],
"errorCode": "DataStreamInterruption"
}
When out-of-range, non-existent pages are requested
If you request a set of pages that includes non-existent pages beyond the length of the document, we will include whatever actual pages we can, but we will also add a top-level errorCode of "RequestedPagesOutOfRange" with the actual documentPageCount within an errorDetails object, like so:
GET /v2/searchContexts/ElkNzWtrUJp4rXI5YnLUgw/records?pages=0-9
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=
HTTP/1.1 200 OK
Content-Type: application/json
{
"pages": [...],
"errorCode": "RequestedPagesOutOfRange",
"errorDetails": {
"documentPageCount": 3
}
}
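Both of the cases above arrive with HTTP 200, so a client has to inspect the top-level errorCode in the body rather than rely on the status code. The following is a minimal sketch of that check; RecordsResult and interpret_records_response are hypothetical names, and treating any other top-level errorCode as a failure is an assumption of this sketch.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class RecordsResult(NamedTuple):
    pages: List[Dict[str, Any]]          # whatever records were returned
    interrupted: bool                    # True if the stream was cut short
    document_page_count: Optional[int]   # set when pages were out of range


def interpret_records_response(body: Dict[str, Any]) -> RecordsResult:
    """Classify a 200 response from GET .../records by its top-level errorCode."""
    pages = body.get("pages", [])
    code = body.get("errorCode")
    if code is None:
        return RecordsResult(pages, False, None)
    if code == "DataStreamInterruption":
        # Some records may be missing; the caller may want to request again.
        return RecordsResult(pages, True, None)
    if code == "RequestedPagesOutOfRange":
        count = body.get("errorDetails", {}).get("documentPageCount")
        return RecordsResult(pages, False, count)
    raise RuntimeError(f"unexpected errorCode: {code}")
```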
When data cannot be extracted from some pages
The pages array will contain one item for each requested page that actually exists. If we are unable to obtain data for a particular page, we will include an item in the pages array that contains the page number and a page-specific errorCode of "CouldNotGetPageData", like so:
HTTP/1.1 200 OK
Content-Type: application/json
{
"pages": [
{
"number": 0,
"text": "Once upon a time...",
"width": 612.00,
"height": 792.00,
"rectangles": [...]
},
{
"number": 1,
"errorCode": "CouldNotGetPageData"
},
{
"number": 2,
"errorCode": "CouldNotGetPageData"
},
{
"number": 3,
"text": "and then, she said to the dragon...",
"width": 612.00,
"height": 792.00,
"rectangles": [...]
}
]
}
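A response like the one above can be split into usable records and failed pages. The sketch below uses hypothetical helper names and also builds a pages value naming only the failed indices; whether requesting those pages again would succeed is not stated on this page.

```python
from typing import Any, Dict, List, Tuple


def split_pages(response: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], Dict[int, str]]:
    """Separate records with data from pages that carry a page-level errorCode."""
    usable: List[Dict[str, Any]] = []
    failed: Dict[int, str] = {}
    for page in response.get("pages", []):
        if "errorCode" in page:
            failed[page["number"]] = page["errorCode"]
        else:
            usable.append(page)
    return usable, failed


def failed_pages_value(failed: Dict[int, str]) -> str:
    """Build a comma-delimited pages value, such as "1,2", for the failed indices."""
    return ",".join(str(number) for number in sorted(failed))
```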