PrizmDoc Viewer v13.23 - Updated August 4, 2023
Form Extractors

Introduction

The form extractors REST API is used by our e-signature viewer to automatically detect form field elements in a document being viewed.

A form extractor resource represents an asynchronous form extraction process. Each form extractor that is created is assigned a unique processId.

Available URLs

URL Description
GET /PCCIS/V1/ViewingSession/u{viewingSessionId}/FormInfo Returns what kind of form field data, if any, is available in a viewing session's source document.
POST /v2/formExtractors Creates a new form extractor for a work file, starting the process of extracting form field data.
GET /v2/formExtractors/{processId} Gets the status and final output of a form extractor.

Output Schemas

GET /PCCIS/V1/ViewingSession/u{viewingSessionId}/FormInfo

Returns what kind of form field data, if any, is available in a viewing session's source document.

Request

URL Parameters

Parameter Description
{viewingSessionId} The viewingSessionId which identifies the viewing session.

Successful Response

Response Body

JSON with information about what kind of form data, if any, is available in the source document of the viewing session.

  • formType[] (Array of strings) Array of values indicating what types of form data, if any, are available for extraction from this viewing session's source document. Values will be one of the following:
    • "acroform" - The source document is a PDF which contains AcroForm data. The data can be extracted by using an input.formType of "acroform" in a subsequent POST to create a form extractor process.
    • "xfa" - The source document is a PDF which contains XFA form data. We do not yet support extraction of XFA data.
    • "rasterForm" - The source document is a raster file which may or may not contain detectable form fields. You can attempt to extract form data by using an input.formType of "rasterForm" in a subsequent POST to create a form extractor process.

Error Responses

Status Code JSON errorCode Description
404 No viewing session with the provided {viewingSessionId} could be found.
480 "DocumentNotProvidedYet" A source document has not been provided to the viewing session.
480 "FeatureNotLicensed" You are not licensed to use the form extraction feature.
580 "InternalError" The server encountered an internal error when handling the request.

Example

Request

GET /PCCIS/V1/ViewingSession/uDLbVh9sTmXJAmd1GeXbS9Gn3WHxs8oib2xPsW2xEFjnIDdoJcudPtxciodSYFQq6zYGabQ_rJIecdbkImTTkSA/FormInfo

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "formType": ["acroform"]
}
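
As a rough sketch of how a client might act on this response, the helper below picks an input.formType for a subsequent POST. The function name and the preference order (acroform before rasterForm) are assumptions made for illustration; they are not part of the API.

```python
from typing import Optional

# formType values that can be passed to POST /v2/formExtractors.
# "xfa" is left out because extraction of XFA data is not yet supported.
EXTRACTABLE_FORM_TYPES = ("acroform", "rasterForm")


def choose_form_type(form_info: dict) -> Optional[str]:
    """Return an extractable formType from a FormInfo response body, or None.

    Preferring "acroform" over "rasterForm" is an assumption of this sketch.
    """
    available = form_info.get("formType", [])
    for candidate in EXTRACTABLE_FORM_TYPES:
        if candidate in available:
            return candidate
    return None
```

For the example response above, choose_form_type would return "acroform". A response that lists only "xfa" yields None, since there is nothing to extract.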

POST /v2/formExtractors

Creates a new form extractor for a work file, starting the process of extracting form field data.

Request

Request Headers

Name Description
Content-Type Must be application/json
Accusoft-Affinity-Token The affinityToken of the work file specified by input.fileId. Required when server clustering is enabled.

Request Body

  • input
    • fileId (String) Required. The id of the work file to extract form field data from.
    • password (String) Password to open the source document, if required.
    • formType (String) Required. Type of form field data to extract. Must be one of the following:
      • "acroform" - Extract AcroForm field data from a PDF and return results in our "acroform" JSON format.
      • "rasterForm" - Detect visible form fields in a raster document and return results in our "rasterForm" JSON format.
  • minSecondsAvailable (Integer) The minimum number of seconds this process will remain available to GET its status. The actual lifetime may be longer. The default lifetime is defined by the processIds.lifetime central configuration parameter.
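
The request headers and body described above could be assembled as in the following sketch. The function name and its parameters are hypothetical; only the header names and JSON property names come from this page. Sending the request is left to whatever HTTP client you use.

```python
import json
from typing import Optional, Tuple


def build_form_extractor_request(
    file_id: str,
    form_type: str,
    password: Optional[str] = None,
    min_seconds_available: Optional[int] = None,
    affinity_token: Optional[str] = None,
) -> Tuple[dict, str]:
    """Return (headers, JSON body) for POST /v2/formExtractors."""
    if form_type not in ("acroform", "rasterForm"):
        raise ValueError('formType must be "acroform" or "rasterForm"')

    headers = {"Content-Type": "application/json"}
    if affinity_token is not None:
        # Required only when server clustering is enabled.
        headers["Accusoft-Affinity-Token"] = affinity_token

    request_input = {"fileId": file_id, "formType": form_type}
    if password is not None:
        request_input["password"] = password

    body = {"input": request_input}
    if min_seconds_available is not None:
        # minSecondsAvailable sits beside "input", not inside it.
        body["minSecondsAvailable"] = min_seconds_available

    return headers, json.dumps(body)
```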

Successful Response

Response Body

JSON with metadata about the created form extractor process. You can check on the status of the form extraction process with additional GET requests.

  • input (Object) Input we accepted to create the form extractor process.
  • processId (String) Unique id for the newly-created form extractor process.
  • affinityToken (String) Affinity token for this form extractor. Present when clustering is enabled.
  • state (String) State of extracting form field data:
    • "processing" - The server is extracting form field data.
    • "complete" - All form field data has been extracted.
    • "error" - There was a problem extracting form field data.
  • percentComplete (Integer) Percentage of form extraction which has completed (from 0 to 100).
  • expirationDateTime (String) Currently planned date and time when the form extractor resource will expire and no longer be available. This time may be extended if the data is still in use. Format is the RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
  • errorCode (String) Descriptive error code. Present when state is "error".
  • errorDetails (Object) Additional error details, if any. May be present when errorCode is present.

Error Responses

Status Code JSON errorCode Description
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
480 "FeatureNotLicensed" You are not licensed to use the form extraction feature.
580 "InternalError" The server encountered an internal error when handling the request.

Example

Request

POST /v2/formExtractors
Content-Type: application/json
Accusoft-Affinity-Token: ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM=

{
  "input": {
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ",
    "formType": "acroform"
  }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "fileId": "ek5Zb123oYHSUEVx1bUrVQ",
    "formType": "acroform"
  },
  "processId": "ElkNzWtrUJp4rXI5YnLUgw",
  "state": "processing",
  "percentComplete": 0,
  "expirationDateTime": "2016-12-17T20:38:39.796Z",
  "affinityToken": "ejN9/kXEYOuken4Pb9ic9hqJK45XIad9LQNgCgQ+BkM="
}

GET /v2/formExtractors/{processId}

Gets the status and final output of a form extractor.

Request

URL Parameters

Parameter Description
{processId} The processId which identifies the form extractor process.

Request Headers

Name Description
Accusoft-Affinity-Token The affinityToken of the form extraction process. Required when server clustering is enabled.

Successful Response

Response Body

JSON with metadata about the form extractor process and the final output, if available. You can check on the status of the form extraction process with additional GET requests.

  • input (Object) Input we accepted to create the form extraction process.
  • processId (String) Unique id for this form extractor process.
  • affinityToken (String) Affinity token for this form extractor. Present when clustering is enabled.
  • state (String) State of extracting form field data:
    • "processing" - The server is extracting form field data.
    • "complete" - All form field data has been extracted.
    • "error" - There was a problem extracting form field data.
  • percentComplete (Integer) Percentage of form extraction which has completed (from 0 to 100).
  • expirationDateTime (String) Currently planned date and time when the form extractor resource will expire and no longer be available. This time may be extended if the data is still in use. Format is the RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
  • errorCode (String) Descriptive error code. Present when state is "error".
  • errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
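  • output (Object) Present when state is "complete". Contains an acroform or rasterForm object, matching the requested input.formType, in the format described in the "acroform" Output and "rasterForm" Output sections.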

Error Responses

Status Code JSON errorCode Description
400 "MissingInput" Can occur when clustering is enabled and an Accusoft-Affinity-Token request header was not provided.
404 No form extractor could be found for the given {processId}.
580 "InternalError" The server encountered an internal error when handling the request.
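
Because a form extractor is asynchronous, a client typically repeats this GET until state leaves "processing". The loop below is one possible sketch. fetch_status is a hypothetical caller-supplied function that performs the GET (including the Accusoft-Affinity-Token header when needed) and returns the parsed JSON body. The polling interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_form_extractor(
    fetch_status: Callable[[], dict],
    poll_interval_seconds: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until state is "complete" and return the output object."""
    for _ in range(max_attempts):
        status = fetch_status()
        state = status.get("state")
        if state == "complete":
            return status["output"]
        if state == "error":
            raise RuntimeError(
                "Form extraction failed: %s %s"
                % (status.get("errorCode"), status.get("errorDetails"))
            )
        # state is "processing": wait, then ask again.
        sleep(poll_interval_seconds)
    raise TimeoutError("Form extractor did not finish within the polling budget")
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.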

Example

Request

GET /v2/formExtractors/gLoltqCVnRKzXz2QFNptqw
Accusoft-Affinity-Token: D+Rmn9kB4FrLfrHoNL2bag6WpuNn2ox2qhT2GbLdf9A=

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "input": {
    "fileId": "-eo_zmq3qmPS0WKZlP_Lug",
    "formType": "acroform"
  },
  "output": {
    "acroform": {
      "pages": [
        {
          "page": 1,
          "height": 792,
          "width": 612,
          "fields": [
            {
              "fieldType": "Text",
              "name": "email",
              "required": true,
              "readOnly": true,
              "tabOrder": 0,
              "appearance": {
                "textColor": "0 g",
                "font": "Helvetica"
              },
              "boundingBox": {
                "lowerLeftX": 89,
                "lowerLeftY": 646,
                "upperRightX": 239,
                "upperRightY": 668
              },
              "options": {
                "multiline": false,
                "maxLen": -1
              },
              "format": {
                "formatCategory": "None"
              }
            },
            {
              "fieldType": "Text",
              "name": "fullName",
              "required": false,
              "readOnly": false,
              "tabOrder": 1,
              "appearance": {
                "textColor": "0 g",
                "font": "Helvetica"
              },
              "boundingBox": {
                "lowerLeftX": 89,
                "lowerLeftY": 676,
                "upperRightX": 239,
                "upperRightY": 698
              },
              "options": {
                "multiline": false,
                "maxLen": -1
              },
              "format": {
                "formatCategory": "None"
              }
            }
          ]
        }
      ]
    }
  },
  "expirationDateTime": "2016-10-11T03:30:33.166Z",
  "percentComplete": 100,
  "processId": "gLoltqCVnRKzXz2QFNptqw",
  "state": "complete",
  "affinityToken": "D+Rmn9kB4FrLfrHoNL2bag6WpuNn2ox2qhT2GbLdf9A="
}

"acroform" Output

The output.acroform object will conform to the following. All properties are always present unless otherwise noted:

  • pages[] (Array of Objects) Pages in the document which contain acroform fields. Array will be empty if the document does not contain any acroform fields. Each item will contain:
    • page (Integer) One-indexed page number.
    • height (Number) Page height in points.
    • width (Number) Page width in points.
    • fields[] (Array of Objects) Acroform fields in the current page. Items may contain:
      • fieldType (String) Field type. Will be one of the following:
        • "Text" - Text field
        • "Button" - Push button, check box, or radio button:
          • push button when options.pushButton is true
          • check box when options.pushButton and options.radio are both false
          • radio button when options.radio is true
        • "Signature" - Signature field
      • name (String) Unique field or radio button group name.
      • required (Boolean) Indicates whether or not this field is required for the form to be considered complete.
      • readOnly (Boolean) Indicates whether or not this field is read only inside the form.
      • tabOrder (Integer) Tab order of the field within the document.
      • boundingBox (Object) Position and size of this field. Object will contain:
        • lowerLeftX (Number) Distance in points from the left edge of the page to the left side of this field.
        • lowerLeftY (Number) Distance in points from the bottom edge of the page to the bottom edge of this field.
        • upperRightX (Number) Distance in points from the left edge of the page to the right edge of this field.
        • upperRightY (Number) Distance in points from the bottom edge of the page to the top edge of this field.
      • appearance (Object) Field appearance details:
        • textColor (String) Text fill color. Not always present.
        • font (String) Font name to use for this field. Not always present.
      • format (Object) Field formatting details:
        • formatCategory (String) Will be one of the following:
          • "None" - Indicates there are no additional formatOptions for this field.
          • "Date" - For text fields, requires the field value to be a date.
        • formatOptions Additional options for the given formatCategory, if any:
          • When formatCategory is "Date": (String) Date format string to use when formatting the date value for display.
      • options (Object) Additional field options, present for some field types:
        • When fieldType is "Text":
          • multiline (Boolean) Indicates whether or not this is a multi-line text field.
          • maxLen (Integer) Indicates the maximum number of characters this form field accepts, or -1 if there is no limit.
        • When fieldType is "Button":
          • pushButton (Boolean) true if this field is a push button, false otherwise.
          • radio (Boolean) true if this field is a radio button, false otherwise.
          • When both pushButton and radio are false, this field is a check box.
        • When fieldType is "Button" and pushButton is false:
          • buttonOnValue (String) Indicates the form value to use when this radio button or checkbox is selected/checked.
          • buttonOffValue (String) Indicates the form value to use when this radio button or checkbox is not selected/checked. Value will always be "Off".
          • buttonValue (String) Indicates whether or not this radio button or checkbox should be initially selected/checked. When the value matches buttonOnValue, then this radio button or checkbox should be initially selected/checked. Otherwise (when the value is "Off"), this radio button or checkbox should not be initially selected/checked.
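
The rules above for telling button kinds apart, reading the initial checked state, and interpreting boundingBox can be sketched as follows. All function names are hypothetical. The top-left conversion is an assumption about what a renderer with a top-left origin would need; the API itself reports distances from the bottom-left corner of the page.

```python
def field_kind(field: dict) -> str:
    """Return a readable kind for an acroform field."""
    field_type = field["fieldType"]
    if field_type != "Button":
        return {"Text": "text", "Signature": "signature"}.get(field_type, field_type)
    options = field.get("options", {})
    if options.get("pushButton"):
        return "push button"
    if options.get("radio"):
        return "radio button"
    # Both pushButton and radio are false.
    return "check box"


def is_initially_checked(field: dict) -> bool:
    """True when buttonValue matches buttonOnValue (radio buttons and check boxes)."""
    options = field.get("options", {})
    on_value = options.get("buttonOnValue")
    return on_value is not None and options.get("buttonValue") == on_value


def to_top_left_box(bounding_box: dict, page_height: float) -> dict:
    """Convert a bottom-left-origin boundingBox (points) to x/y/width/height
    measured from the top-left corner of the page."""
    return {
        "x": bounding_box["lowerLeftX"],
        "y": page_height - bounding_box["upperRightY"],
        "width": bounding_box["upperRightX"] - bounding_box["lowerLeftX"],
        "height": bounding_box["upperRightY"] - bounding_box["lowerLeftY"],
    }
```

For the "email" field in the example response (page height 792), to_top_left_box gives x 89, y 124, width 150, height 22.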

Fill Color Strings

A string of one or more numbers followed by an operator indicating what the numbers represent:

  • Grayscale value (when string ends in "g"): A single number between 0 and 1 followed by "g" represents the amount of white which forms a grayscale color value. For example:
    • "0 g" - black
    • "0.5 g" - 50% gray
    • "1 g" - white
  • RGB value (when string ends in "rg"): Three numbers between 0 and 1 followed by "rg" represent the amount of red, green, and blue light which are additively mixed to form the final color. For example:
    • "1 0 0 rg" - red
    • "1 1 0 rg" - yellow
    • "0.5 0.25 0.75 rg" - 50% red, 25% green, 75% blue
  • CMYK (when string ends in "k"): Four numbers between 0 and 1 followed by "k" represent the amount of cyan, magenta, yellow, and black which should be subtractively mixed to form the final color. For example:
    • "0 0 0 1 k" - black
    • "1 1 1 0 k" - black
    • "1 1 1 1 k" - black
    • "1 0 0 0 k" - cyan
    • "0.25 0.88 0.2 0.16 k" - 25% cyan, 88% magenta, 20% yellow, 16% black
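
A client that needs a single color representation could normalize these strings to RGB, as in the sketch below. The function name is hypothetical, and the CMYK branch uses a naive formula chosen for illustration; accurate CMYK conversion depends on a color profile, which this page does not describe.

```python
from typing import Tuple


def fill_color_to_rgb(fill_color: str) -> Tuple[float, float, float]:
    """Convert a fill color string to (red, green, blue), each between 0 and 1."""
    parts = fill_color.split()
    if not parts:
        raise ValueError("empty fill color string")
    operator = parts[-1]
    numbers = [float(part) for part in parts[:-1]]

    if operator == "g" and len(numbers) == 1:
        gray = numbers[0]
        return (gray, gray, gray)
    if operator == "rg" and len(numbers) == 3:
        return (numbers[0], numbers[1], numbers[2])
    if operator == "k" and len(numbers) == 4:
        cyan, magenta, yellow, black = numbers
        # Naive approximation (an assumption of this sketch).
        return (
            (1 - cyan) * (1 - black),
            (1 - magenta) * (1 - black),
            (1 - yellow) * (1 - black),
        )
    raise ValueError("unrecognized fill color string: %r" % fill_color)
```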

Date Format Strings

Date format strings use the following special substitution patterns:

  • yy - 2-digit year (e.g. 16 for the year 2016)
  • yyyy - 4-digit year (e.g. 2016)
  • m - Month number with no zero padding (e.g. 7 for July)
  • mm - Month number zero-padded to always be two characters long (e.g. 07 for July)
  • mmm - Abbreviated month name (e.g. Jan)
  • mmmm - Full month name (e.g. January)
  • d - Day of the month with no zero padding (e.g. 4 for the fourth day of the month)
  • dd - Day of the month zero-padded to always be two characters (e.g. 04 for the fourth day of the month)
  • ddd - Abbreviated day of the week (e.g. Sun)
  • dddd - Full name for the day of the week (e.g. Sunday)
  • h - Hour number in 12-hour time with no zero padding (e.g. 2 for 2 o'clock)
  • hh - Hour number in 12-hour time zero-padded to always be two characters (e.g. 02 for 2 o'clock)
  • H - Hour number in 24-hour time with no zero padding (e.g. 13 for the 1:00 pm hour)
  • HH - Hour number in 24-hour time zero-padded to always be two characters (e.g. 02 for the 2:00 am hour)
  • M - Minute without zero padding
  • MM - Minute, zero-padded to always be two digits
  • s - Second without zero-padding
  • ss - Second, zero-padded to always be two digits
  • z - Offset from UTC (e.g. -0400)
  • j - Abbreviated Japanese era and year (e.g. H28 for the year 2016).
  • jj - Full Japanese era and year (e.g. 平成28 for the year 2016).
  • jjj - Japanese era year without specifying the era (e.g. 28 for the year 2016).

All other characters are considered literal punctuation for the format string. The special characters used above may be used literally by escaping them with a backslash.
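
To show how these patterns combine, here is a minimal formatter sketch. The function name is hypothetical, English month and weekday names are assumed, and the Japanese era patterns (j, jj, jjj) are deliberately not handled. Longer patterns are matched before shorter ones so that, for example, "yyyy" is not read as two "yy" patterns.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# datetime.weekday() returns 0 for Monday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]
# Longest patterns first.
TOKENS = ["yyyy", "mmmm", "dddd", "mmm", "ddd", "yy", "mm", "dd", "hh", "HH",
          "MM", "ss", "m", "d", "h", "H", "M", "s", "z"]


def _render(token: str, value: datetime) -> str:
    hour12 = value.hour % 12 or 12
    table = {
        "yyyy": "%04d" % value.year, "yy": "%02d" % (value.year % 100),
        "mmmm": MONTHS[value.month - 1], "mmm": MONTHS[value.month - 1][:3],
        "mm": "%02d" % value.month, "m": str(value.month),
        "dddd": WEEKDAYS[value.weekday()], "ddd": WEEKDAYS[value.weekday()][:3],
        "dd": "%02d" % value.day, "d": str(value.day),
        "hh": "%02d" % hour12, "h": str(hour12),
        "HH": "%02d" % value.hour, "H": str(value.hour),
        "MM": "%02d" % value.minute, "M": str(value.minute),
        "ss": "%02d" % value.second, "s": str(value.second),
        "z": value.strftime("%z"),  # empty string for naive datetimes
    }
    return table[token]


def format_date(value: datetime, pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i + 1])  # backslash-escaped literal
            i += 2
            continue
        if ch == "j":
            raise NotImplementedError("Japanese era patterns are not handled here")
        for token in TOKENS:
            if pattern.startswith(token, i):
                out.append(_render(token, value))
                i += len(token)
                break
        else:
            out.append(ch)  # literal punctuation
            i += 1
    return "".join(out)
```

For example, format_date(datetime(2016, 7, 4, 13, 5, 9), "mmmm d, yyyy") produces "July 4, 2016".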

"rasterForm" Output

The output.rasterForm object will conform to the following. All properties are always present unless otherwise noted:

  • pages[] (Array of Objects) Information about each page in the raster document. Each item will contain:
    • page (Integer) One-indexed page number.
    • height (Number) Page height in pixels.
    • width (Number) Page width in pixels.
    • fields[] (Array of Objects) Fields detected in the current page. Array will be empty if no fields were detected. Items will contain:
      • name (String) Unique name we have automatically assigned to this field in the document (e.g. "field5").
      • fieldType (String) Field type. Will be one of the following:
        • "Text" - Text field
        • "CheckBox" - Check box
      • confidence (Number) Our confidence in the correct detection of this field using a scale of 0 (no confidence) to 100 (complete confidence).
      • boundingBox (Object) Position and size of this field. Object will contain:
        • x (Number) Distance in pixels from the left edge of the page to the left side of this field.
        • y (Number) Distance in pixels from the top edge of the page to the top edge of this field.
        • width (Number) Distance in pixels from the left edge of this field (x) to the right edge of this field.
        • height (Number) Distance in pixels from the top edge of this field (y) to the bottom edge of this field.
    • tables[] (Array of Objects) Tables detected in the current page. Array will be empty if no tables were detected. Items will contain:
      • numOfColumns (Integer) Number of columns in the detected table.
      • numOfRows (Integer) Number of rows in the detected table.
      • fields[] (Array of Objects) Fields detected in the current table. Items will contain:
        • name (String) Unique name we have automatically assigned to this field in the document (e.g. "field5").
        • fieldType (String) Field type. Will be one of the following:
          • "Text" - Text field
          • "CheckBox" - Check box
        • confidence (Number) Our confidence in the correct detection of this field using a scale of 0 (no confidence) to 100 (complete confidence).
        • boundingBox (Object) Position and size of this field. Object will contain:
          • x (Number) Distance in pixels from the left edge of the page to the left side of this field.
          • y (Number) Distance in pixels from the top edge of the page to the top edge of this field.
          • width (Number) Distance in pixels from the left edge of this field (x) to the right edge of this field.
          • height (Number) Distance in pixels from the top edge of this field (y) to the bottom edge of this field.
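
Since raster detection is probabilistic, a client will often walk this structure and keep only fields above some confidence level. The sketch below does that for both page-level fields and table fields. The function names, the added "page" and "inTable" keys, and the default threshold of 80 are assumptions for illustration. Whether table fields also appear in the page-level fields[] array is not stated above, so each result is tagged with where it was found.

```python
from typing import Iterator, List


def iter_raster_fields(raster_form: dict) -> Iterator[dict]:
    """Yield every detected field, annotated with its page number and origin."""
    for page in raster_form.get("pages", []):
        for field in page.get("fields", []):
            yield {"page": page["page"], "inTable": False, **field}
        for table in page.get("tables", []):
            for field in table.get("fields", []):
                yield {"page": page["page"], "inTable": True, **field}


def confident_fields(raster_form: dict, min_confidence: float = 80) -> List[dict]:
    """Return fields whose confidence (0 to 100) meets the threshold."""
    return [
        field
        for field in iter_raster_fields(raster_form)
        if field["confidence"] >= min_confidence
    ]
```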