PrizmDoc Viewer v13.19 - Updated
    Form Extractors

    Introduction

    The form extractors REST API is used by our e-signature viewer to automatically detect form field elements in a document being viewed.

    A form extractor resource represents an asynchronous form extraction process. Each form extractor that is created is assigned a unique processId.

    Available URLs

    URL Description
    GET /ViewingSession/u{viewingSessionId}/FormInfo Returns what kind of form field data, if any, is available in a viewing session's source document.
    POST /v2/viewingSessions/{viewingSessionId}/formExtractors Creates a new form extractor from the source document of a viewing session, starting the process of extracting form field data.
    GET /v2/viewingSessions/{viewingSessionId}/formExtractors/{processId} Gets the status and final output of a form extractor created for a specified viewing session.

    Output Schemas

    • "acroform" Output
    • "rasterForm" Output

    GET /ViewingSession/u{viewingSessionId}/FormInfo

    Returns what kind of form field data, if any, is available in a viewing session's source document.

    Request

    URL Parameters

    Parameter Description
    {viewingSessionId} The viewingSessionId which identifies the viewing session. Note this particular URL requires a letter 'u' to be provided before the viewingSessionId.
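
The 'u' prefix is easy to forget when building this URL by hand. A minimal sketch of a helper, assuming the PAS base URL is available as a string; the function name form_info_url is hypothetical and not part of the product:

```python
from urllib.parse import quote


def form_info_url(pas_base_url: str, viewing_session_id: str) -> str:
    """Build the FormInfo URL, adding the letter 'u' before the id."""
    base = pas_base_url.rstrip("/")
    # Percent-encode the id so it is safe to use as a single path segment.
    session = quote(viewing_session_id, safe="")
    return f"{base}/ViewingSession/u{session}/FormInfo"
```

For example, form_info_url("https://example.com/pas", "abc123") yields "https://example.com/pas/ViewingSession/uabc123/FormInfo".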

    Successful Response

    Response Body

    JSON with information about what kind of form data, if any, is available in the source document of the viewing session.

    • formType[] (Array of strings) Array of values indicating what types of form data, if any, are available for extraction from this viewing session's source document. Values will be one of the following:
      • "acroform" - The source document is a PDF which contains AcroForm data. The data can be extracted by using an input.formType of "acroform" in a subsequent POST to create a form extractor process.
      • "xfa" - The source document is a PDF which contains XFA form data. We do not yet support extraction of XFA data.
      • "rasterForm" - The source document is a raster file which may or may not contain detectable form fields. You can attempt to extract form data by using an input.formType of "rasterForm" in a subsequent POST to create a form extractor process.
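
A client typically uses this response to decide which input.formType to send in the subsequent POST. The sketch below shows one way to do that, assuming the response body has already been parsed into a dict. The function name and the preference for "acroform" over "rasterForm" are assumptions made for illustration:

```python
from typing import Optional

# Values that can be passed as input.formType when creating a form extractor.
# "xfa" is deliberately absent because XFA extraction is not supported.
EXTRACTABLE_FORM_TYPES = ("acroform", "rasterForm")


def pick_form_type(form_info: dict) -> Optional[str]:
    """Return the first extractable form type reported, or None."""
    available = form_info.get("formType", [])
    for candidate in EXTRACTABLE_FORM_TYPES:
        if candidate in available:
            return candidate
    return None
```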

    Error Responses

    Status Code JSON errorCode Description
    404 - No viewing session with the provided {viewingSessionId} could be found.
    480 "DocumentNotProvidedYet" A source document has not been provided to the viewing session.
    480 "FeatureNotLicensed" You are not licensed to use the form extraction feature.
    501 "NotImplemented" Form extraction is not yet implemented for a viewing session which uses a cached viewing package.
    580 "InternalError" The server encountered an internal error when handling the request.

    Example

    Request

    GET pas_base_url/ViewingSession/uXYZ.../FormInfo
    

    Response

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "formType": ["acroform"]
    }
    
    

    POST /v2/viewingSessions/{viewingSessionId}/formExtractors

    Creates a new form extractor from the source document of a viewing session, starting the process of extracting form field data.

    Request

    Request Headers

    Name Description
    Content-Type Must be application/json

    Request Body

    • input
      • password (String) Password to open the source document, if required.
      • formType (String) Required. Type of form field data to extract. Must be one of the following:
        • "acroform" - Extract AcroForm field data from a PDF and return results in our "acroform" JSON format.
        • "rasterForm" - Detect visible form fields in a raster document and return results in our "rasterForm" JSON format.
    • minSecondsAvailable (Integer) The minimum number of seconds this process will remain available to GET its status. The actual lifetime may be longer.
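
The request body described above can be assembled as follows. This is a sketch only: the function name and its keyword arguments are hypothetical, while the JSON property names come from the list above.

```python
import json
from typing import Optional

ALLOWED_FORM_TYPES = {"acroform", "rasterForm"}


def build_request_body(form_type: str,
                       password: Optional[str] = None,
                       min_seconds_available: Optional[int] = None) -> str:
    """Serialize the JSON body for creating a form extractor."""
    if form_type not in ALLOWED_FORM_TYPES:
        raise ValueError(f"unsupported formType: {form_type!r}")
    body = {"input": {"formType": form_type}}
    if password is not None:
        # Only needed when the source document is password-protected.
        body["input"]["password"] = password
    if min_seconds_available is not None:
        # Note: this property sits beside "input", not inside it.
        body["minSecondsAvailable"] = int(min_seconds_available)
    return json.dumps(body)
```

Send the result with a Content-Type header of application/json.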

    Successful Response

    Response Body

    JSON with metadata about the created form extractor process. You can check on the status of the form extraction process with additional GET requests.

    • input (Object) Input we accepted to create the form extractor process.
    • processId (String) Unique id for the newly-created form extractor process.
    • state (String) State of extracting form field data:
      • "processing" - The server is extracting form field data.
      • "complete" - All form field data has been extracted.
      • "error" - There was a problem extracting form field data.
    • percentComplete (Integer) Percentage of form extraction which has completed (from 0 to 100).
    • expirationDateTime (String) Currently planned date and time when the form extractor resource will expire and no longer be available. This time may be extended if we need to keep using the data. Format is the RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
    • errorCode (String) Descriptive error code. Present when state is "error".
    • errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
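
To work out how long a process will remain available, a client can parse expirationDateTime into a timezone-aware value. A minimal sketch, assuming timestamps shaped like the example above (UTC with a trailing "Z" and millisecond precision); both function names are hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse a timestamp such as "2016-11-05T08:15:30.494Z"."""
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def seconds_until_expiration(value: str, now: Optional[datetime] = None) -> float:
    """Seconds from `now` (default: current UTC time) until expiration."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (parse_expiration(value) - now).total_seconds()
```

Because the server may extend the expiration, treat the result as a lower bound rather than an exact lifetime.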

    Error Responses

    Status Code JSON errorCode Description
    480 "MissingInput" A required input value was not provided. See errorDetails in the response body.
    480 "InvalidInput" An invalid input value was used. See errorDetails in the response body.
    480 "DocumentNotProvidedYet" A source document has not been provided to the viewing session.
    480 "FeatureNotLicensed" You are not licensed to use the form extraction feature.
    480 "LicenseCouldNotBeVerified" The server's license could not be verified. If you are evaluating the product without a license, the product is running in evaluation mode and this particular part of the product is unavailable without a license. If you have a license, make sure you configured your license correctly, that your license has not expired, and that you have not exceeded any license limits (such as, for a Cloud License, the total number of logical CPU cores in use).
    501 "NotImplemented" Form extraction is not yet implemented for a viewing session which uses a cached viewing package.
    580 "InternalError" The server encountered an internal error when handling the request.
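
Because several of the rows above share one status code, the JSON errorCode is what actually identifies the failure. The sketch below turns a non-200 response into an exception carrying both values. The class and function names are hypothetical, and it assumes the caller already has the status code and raw body text from its HTTP client:

```python
import json


class FormExtractorError(Exception):
    """Raised for any non-200 response from the form extractors API."""

    def __init__(self, status, error_code=None, details=None):
        super().__init__(f"HTTP {status}: {error_code or 'no errorCode'}")
        self.status = status
        self.error_code = error_code
        self.details = details


def raise_for_error(status: int, body_text: str) -> None:
    """Do nothing for 200; otherwise raise FormExtractorError."""
    if status == 200:
        return
    error_code = details = None
    try:
        body = json.loads(body_text)
    except ValueError:
        body = None  # Some errors may carry no JSON body at all.
    if isinstance(body, dict):
        error_code = body.get("errorCode")
        details = body.get("errorDetails")
    raise FormExtractorError(status, error_code, details)
```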

    Example

    Request

    POST pas_base_url/v2/viewingSessions/uXYZ.../formExtractors
    Content-Type: application/json
    
    {
      "input": {
        "formType": "acroform"
      }
    }
    

    Response

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "input": {
        "formType": "acroform"
      },
      "processId": "ElkNzWtrUJp4rXI5YnLUgw",
      "state": "processing",
      "percentComplete": 0,
      "expirationDateTime": "2016-12-17T20:38:39.796Z"
    }
    
    

    GET /v2/viewingSessions/{viewingSessionId}/formExtractors/{processId}

    Gets the status and final output of a form extractor created for a specified viewing session.

    Request

    URL Parameters

    Parameter Description
    {viewingSessionId} The viewingSessionId which identifies the viewing session.
    {processId} The processId which identifies the form extractor process.

    Successful Response

    Response Body

    JSON with metadata about the form extractor process and the final output, if available. You can check on the status of the form extraction process with additional GET requests.

    • input (Object) Input we accepted to create the form extraction process.
    • processId (String) Unique id for this form extractor process.
    • state (String) State of extracting form field data:
      • "processing" - The server is extracting form field data.
      • "complete" - All form field data has been extracted.
      • "error" - There was a problem extracting form field data.
    • percentComplete (Integer) Percentage of form extraction which has completed (from 0 to 100).
    • expirationDateTime (String) Currently planned date and time when the form extractor resource will expire and no longer be available. This time may be extended if we need to keep using the data. Format is the RFC 3339 Internet Date/Time profile of ISO 8601, e.g. "2016-11-05T08:15:30.494Z".
    • errorCode (String) Descriptive error code. Present when state is "error".
    • errorDetails (Object) Additional error details, if any. May be present when errorCode is present.
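    • output (Object) Present when state is "complete". Contains an acroform or rasterForm object, matching the input.formType used to create the process. See "acroform" Output and "rasterForm" Output.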
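
Since extraction is asynchronous, a client usually repeats this GET until state leaves "processing". A minimal polling sketch follows. The HTTP call is left to the caller as fetch_status, a hypothetical callable that performs the GET and returns the parsed JSON body; the interval and attempt limit are arbitrary illustrative defaults:

```python
import time
from typing import Callable


def wait_for_form_extractor(fetch_status: Callable[[], dict],
                            interval_seconds: float = 0.5,
                            max_attempts: int = 60,
                            sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the process completes, then return its output object."""
    for attempt in range(max_attempts):
        status = fetch_status()
        state = status.get("state")
        if state == "complete":
            return status["output"]
        if state == "error":
            raise RuntimeError(
                f"form extraction failed: {status.get('errorCode')}")
        # Still "processing": wait before asking again.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("form extractor did not finish in time")
```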

    Error Responses

    Status Code JSON errorCode Description
    404 - No form extractor could be found for the given {viewingSessionId} and {processId}.
    501 "NotImplemented" Form extraction is not yet implemented for a viewing session which uses a cached viewing package.
    580 "InternalError" The server encountered an internal error when handling the request.

    Examples

    Request

    GET pas_base_url/v2/viewingSessions/uXYZ.../formExtractors/x62gH3TYdqlKj94pLqzmtS
    

    Response

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "input": {
        "formType": "acroform"
      },
      "output": {
        "acroform": {
          "pages": [
            {
              "page": 1,
              "height": 792,
              "width": 612,
              "fields": [
                {
                  "fieldType": "Text",
                  "name": "email",
                  "required": true,
                  "tabOrder": 0,
                  "appearance": {
                    "textColor": "0 g",
                    "font": "Helvetica"
                  },
                  "boundingBox": {
                    "lowerLeftX": 89,
                    "lowerLeftY": 646,
                    "upperRightX": 239,
                    "upperRightY": 668
                  },
                  "options": {
                    "multiline": false,
                    "maxLen": -1
                  },
                  "format": {
                    "formatCategory": "None"
                  }
                },
                {
                  "fieldType": "Text",
                  "name": "fullName",
                  "required": false,
                  "tabOrder": 1,
                  "appearance": {
                    "textColor": "0 g",
                    "font": "Helvetica"
                  },
                  "boundingBox": {
                    "lowerLeftX": 89,
                    "lowerLeftY": 676,
                    "upperRightX": 239,
                    "upperRightY": 698
                  },
                  "options": {
                    "multiline": false,
                    "maxLen": -1
                  },
                  "format": {
                    "formatCategory": "None"
                  }
                }
              ]
            }
          ]
        }
      },
      "expirationDateTime": "2016-10-11T03:30:33.166Z",
      "percentComplete": 100,
      "processId": "x62gH3TYdqlKj94pLqzmtS",
      "state": "complete"
    }
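
As an illustration of walking a response like the one above, the following sketch collects the names of required fields. The function name is hypothetical, and it assumes the body has been parsed into a dict:

```python
def required_field_names(status_body: dict) -> list:
    """Return names of required AcroForm fields, in document order."""
    names = []
    pages = status_body.get("output", {}).get("acroform", {}).get("pages", [])
    for page in pages:
        for field in page.get("fields", []):
            # Radio buttons in one group share a name, so skip repeats.
            if field.get("required") and field["name"] not in names:
                names.append(field["name"])
    return names
```

For the example response above, this returns a list containing only "email", since "fullName" has required set to false.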
    
    

    "acroform" Output

    The output.acroform object will conform to the following. All properties are always present unless otherwise noted:

    • pages[] (Array of Objects) Pages in the document which contain AcroForm fields. Array will be empty if the document does not contain any AcroForm fields. Each item will contain:
      • page (Integer) One-indexed page number.
      • height (Number) Page height in points.
      • width (Number) Page width in points.
      • fields[] (Array of Objects) Acroform fields in the current page. Items may contain:
        • fieldType (String) Field type. Will be one of the following:
          • "Text" - Text field
          • "Button" - Push button, check box, or radio button:
            • push button when options.pushButton is true
            • check box when options.pushButton and options.radio are both false
            • radio button when options.radio is true
          • "Signature" - Signature field
        • name (String) Unique field or radio button group name.
        • required (Boolean) Indicates whether or not this field is required for the form to be considered complete.
        • tabOrder (Integer) Tab order of the field within the document.
        • boundingBox (Object) Position and size of this field. Object will contain:
          • lowerLeftX (Number) Distance in points from the left edge of the page to the left side of this field.
          • lowerLeftY (Number) Distance in points from the bottom edge of the page to the bottom edge of this field.
          • upperRightX (Number) Distance in points from the left edge of the page to the right edge of this field.
          • upperRightY (Number) Distance in points from the bottom edge of the page to the top edge of this field.
        • appearance (Object) Field appearance details:
          • textColor (String) Text fill color. Not always present.
          • font (String) Font name to use for this field. Not always present.
        • format (Object) Field formatting details:
          • formatCategory (String) Will be one of the following:
            • "None" - Indicates there are no additional formatOptions for this field.
            • "Date" - For text fields, requires the field value to be a date.
          • formatOptions Additional options for the given formatCategory, if any:
            • When formatCategory is "Date": (String) Date format string to use when formatting the date value for display.
        • options (Object) Additional field options, present for some field types:
          • When fieldType is "Text":
            • multiline (Boolean) Indicates whether or not this is a multi-line text field.
            • maxLen (Integer) Indicates the maximum number of characters this form field accepts, or -1 if there is no limit.
          • When fieldType is "Button":
            • pushButton (Boolean) true if this field is a push button, false otherwise.
            • radio (Boolean) true if this field is a radio button, false otherwise.
            • When both pushButton and radio are false, this field is a check box.
          • When fieldType is "Button" and options.radio is true:
            • buttonOnValue (String) Indicates the form value to use when this radio button is selected.
            • buttonOffValue (String) Indicates the form value to use when this radio button is not selected. Value will always be "Off".
            • buttonValue (String) Indicates whether or not this radio button should be initially selected. When the value matches buttonOnValue, then this radio button should be initially selected. Otherwise (when the value is "Off"), this radio button should not be initially selected.
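
Two details in the list above commonly need translating before a field can be drawn: the three-way meaning of "Button", and the bottom-left origin used by boundingBox. The sketch below handles both. The function names and the "PushButton" and "RadioButton" labels are hypothetical; the rules themselves follow the descriptions above.

```python
def field_kind(field: dict) -> str:
    """Resolve fieldType plus options into one concrete kind."""
    field_type = field["fieldType"]
    if field_type != "Button":
        return field_type  # "Text" or "Signature"
    options = field.get("options", {})
    if options.get("pushButton"):
        return "PushButton"
    if options.get("radio"):
        return "RadioButton"
    return "CheckBox"  # both pushButton and radio are false


def to_top_left_rect(bounding_box: dict, page_height: float) -> dict:
    """Convert a bottom-left-origin box to a top-left-origin rectangle."""
    return {
        "x": bounding_box["lowerLeftX"],
        # Distance from the top of the page to the top edge of the field.
        "y": page_height - bounding_box["upperRightY"],
        "width": bounding_box["upperRightX"] - bounding_box["lowerLeftX"],
        "height": bounding_box["upperRightY"] - bounding_box["lowerLeftY"],
    }
```

For a field with lowerLeftX 89, lowerLeftY 646, upperRightX 239, and upperRightY 668 on a page 792 points high, the rectangle is x 89, y 124, width 150, height 22, all still in points.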

    Fill Color Strings

    A string of one or more numbers followed by an operator indicating what the numbers represent:

    • Grayscale value (when string ends in "g"): A single number between 0 and 1 followed by "g" represents the amount of white which forms a grayscale color value. For example:
      • "0 g" - black
      • "0.5 g" - 50% gray
      • "1 g" - white
    • RGB value (when string ends in "rg"): Three numbers between 0 and 1 followed by "rg" represent the amount of red, green, and blue light which are additively mixed to form the final color. For example:
      • "1 0 0 rg" - red
      • "1 1 0 rg" - yellow
      • "0.5 0.25 0.75 rg" - 50% red, 25% green, 75% blue
    • CMYK (when string ends in "k"): Four numbers between 0 and 1 followed by "k" represent the amount of cyan, magenta, yellow, and black which should be subtractively mixed to form the final color. For example:
      • "0 0 0 1 k" - black
      • "1 1 1 0 k" - black
      • "1 1 1 1 k" - black
      • "1 0 0 0 k" - cyan
      • "0.25 0.88 0.2 0.16 k" - 25% cyan, 88% magenta, 20% yellow, 16% black
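
A viewer that renders in RGB needs to turn these strings into a single color model. The sketch below parses all three forms. The function name is hypothetical, and the CMYK branch uses the common naive formula (1 - ink) * (1 - black), which is an assumption for illustration and not a color-managed conversion:

```python
def fill_color_to_rgb(fill: str) -> tuple:
    """Parse a fill color string into (r, g, b), each between 0 and 1."""
    # The last token is the operator; everything before it is a number.
    *numbers, operator = fill.split()
    values = [float(n) for n in numbers]
    if operator == "g" and len(values) == 1:
        return (values[0], values[0], values[0])
    if operator == "rg" and len(values) == 3:
        return tuple(values)
    if operator == "k" and len(values) == 4:
        c, m, y, k = values
        return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    raise ValueError(f"unrecognized fill color string: {fill!r}")
```

Under this formula all three black examples above ("0 0 0 1 k", "1 1 1 0 k", "1 1 1 1 k") map to (0, 0, 0).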

    Date Format Strings

    Date format strings use the following special substitution patterns:

    • yy - 2-digit year (e.g. 16 for the year 2016)
    • yyyy - 4-digit year (e.g. 2016)
    • m - Month number with no zero padding (e.g. 7 for July)
    • mm - Month number zero-padded to always be two characters long (e.g. 07 for July)
    • mmm - Abbreviated month name (e.g. Jan)
    • mmmm - Full month name (e.g. January)
    • d - Day of the month with no zero padding (e.g. 4 for the fourth day of the month)
    • dd - Day of the month zero-padded to always be two characters (e.g. 04 for the fourth day of the month)
    • ddd - Abbreviated day of the week (e.g. Sun)
    • dddd - Full name for the day of the week (e.g. Sunday)
    • h - Hour number in 12-hour time with no zero padding (e.g. 2 for 2 o'clock)
    • hh - Hour number in 12-hour time zero-padded to always be two characters (e.g. 02 for 2 o'clock)
    • H - Hour number in 24-hour time with no zero padding (e.g. 13 for the 1:00 pm hour)
    • HH - Hour number in 24-hour time zero-padded to always be two characters (e.g. 02 for the 2:00 am hour)
    • M - Minute without zero padding
    • MM - Minute, zero-padded to always be two digits
    • s - Second without zero-padding
    • ss - Second, zero-padded to always be two digits
    • z - Offset from UTC (e.g. -0400)
    • j - Abbreviated Japanese era and year (e.g. H28 for the year 2016).
    • jj - Full Japanese era and year (e.g. 平成28 for the year 2016).
    • jjj - Japanese era year without specifying the era (e.g. 28 for the year 2016).

    All other characters are considered literal punctuation for the format string. The special characters used above may be used literally by escaping them with a backslash.
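
The rules above can be sketched as a small tokenizer plus formatter. This is an illustrative approximation, not the viewer's implementation: the function names are hypothetical, month and day names are hard-coded in English, and the Japanese era patterns (j, jj, jjj) are recognized but not implemented.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]  # datetime.weekday() returns 0 for Monday

# Longer patterns come first so that "mmmm" is matched before "mm".
PATTERNS = ["yyyy", "yy", "mmmm", "mmm", "mm", "m", "dddd", "ddd", "dd", "d",
            "hh", "h", "HH", "H", "MM", "M", "ss", "s", "z", "jjj", "jj", "j"]


def tokenize(fmt: str) -> list:
    """Split a format string into ("pattern", p) and ("literal", c) parts."""
    parts, i = [], 0
    while i < len(fmt):
        if fmt[i] == "\\" and i + 1 < len(fmt):
            parts.append(("literal", fmt[i + 1]))  # escaped character
            i += 2
            continue
        for pattern in PATTERNS:
            if fmt.startswith(pattern, i):
                parts.append(("pattern", pattern))
                i += len(pattern)
                break
        else:
            parts.append(("literal", fmt[i]))
            i += 1
    return parts


def format_date(dt: datetime, fmt: str) -> str:
    """Format dt according to a date format string."""
    hour12 = dt.hour % 12 or 12
    values = {
        "yyyy": f"{dt.year:04d}", "yy": f"{dt.year % 100:02d}",
        "mmmm": MONTHS[dt.month - 1], "mmm": MONTHS[dt.month - 1][:3],
        "mm": f"{dt.month:02d}", "m": str(dt.month),
        "dddd": DAYS[dt.weekday()], "ddd": DAYS[dt.weekday()][:3],
        "dd": f"{dt.day:02d}", "d": str(dt.day),
        "hh": f"{hour12:02d}", "h": str(hour12),
        "HH": f"{dt.hour:02d}", "H": str(dt.hour),
        "MM": f"{dt.minute:02d}", "M": str(dt.minute),
        "ss": f"{dt.second:02d}", "s": str(dt.second),
        "z": dt.strftime("%z"),  # empty for naive datetimes
    }
    out = []
    for kind, text in tokenize(fmt):
        if kind == "literal":
            out.append(text)
        elif text in values:
            out.append(values[text])
        else:
            raise NotImplementedError(f"pattern {text!r} is not supported")
    return "".join(out)
```

With this sketch, formatting 4 July 2016 at 13:05:09 with "mm/dd/yyyy" gives "07/04/2016", and with "d mmm yy" gives "4 Jul 16".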

    "rasterForm" Output

    The output.rasterForm object will conform to the following. All properties are always present unless otherwise noted:

    • pages[] (Array of Objects) Information about each page in the raster document. Each item will contain:
      • page (Integer) One-indexed page number.
      • height (Number) Page height in pixels.
      • width (Number) Page width in pixels.
      • fields[] (Array of Objects) Fields detected in the current page. Array will be empty if no fields were detected. Items will contain:
        • name (String) Unique name we have automatically assigned to this field in the document (e.g. "field5").
        • fieldType (String) Field type. Will be one of the following:
          • "Text" - Text field
          • "CheckBox" - Check box
        • confidence (Number) Our confidence in the correct detection of this field using a scale of 0 (no confidence) to 100 (complete confidence).
        • boundingBox (Object) Position and size of this field. Object will contain:
          • x (Number) Distance in pixels from the left edge of the page to the left side of this field.
          • y (Number) Distance in pixels from the top edge of the page to the top edge of this field.
          • width (Number) Distance in pixels from the left edge of this field (x) to the right edge of this field.
          • height (Number) Distance in pixels from the top edge of this field (y) to the bottom edge of this field.
      • tables[] (Array of Objects) Tables detected in the current page. Array will be empty if no tables were detected. Items will contain:
        • numOfColumns (Integer) Number of columns in the detected table.
        • numOfRows (Integer) Number of rows in the detected table.
        • fields[] (Array of Objects) Fields detected in the current table. Items will contain:
          • name (String) Unique name we have automatically assigned to this field in the document (e.g. "field5").
          • fieldType (String) Field type. Will be one of the following:
            • "Text" - Text field
            • "CheckBox" - Check box
          • confidence (Number) Our confidence in the correct detection of this field using a scale of 0 (no confidence) to 100 (complete confidence).
          • boundingBox (Object) Position and size of this field. Object will contain:
            • x (Number) Distance in pixels from the left edge of the page to the left side of this field.
            • y (Number) Distance in pixels from the top edge of the page to the top edge of this field.
            • width (Number) Distance in pixels from the left edge of this field (x) to the right edge of this field.
            • height (Number) Distance in pixels from the top edge of this field (y) to the bottom edge of this field.
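
Because raster detection is heuristic, callers may want to keep only fields above some confidence level. The sketch below gathers fields from both the page-level fields[] array and each table's fields[] array. The function name and the default threshold of 80 are hypothetical. Whether table fields are also repeated in the page-level array is not stated here, so the sketch removes duplicates by name:

```python
def confident_fields(raster_form: dict, min_confidence: float = 80) -> list:
    """Return detected fields whose confidence meets the threshold."""
    results = []
    seen = set()
    for page in raster_form.get("pages", []):
        candidates = list(page.get("fields", []))
        for table in page.get("tables", []):
            candidates.extend(table.get("fields", []))
        for field in candidates:
            if field["name"] in seen or field["confidence"] < min_confidence:
                continue
            seen.add(field["name"])
            results.append({
                "page": page["page"],
                "name": field["name"],
                "fieldType": field["fieldType"],
                # Already top-left origin, in pixels: x, y, width, height.
                "boundingBox": field["boundingBox"],
            })
    return results
```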