PrizmDoc Viewer v13.19 - Updated May 22, 2023
Perform Auto-Redaction

Introduction

Auto-redaction is a multi-step process that finds every match of a given regular expression, then permanently removes the matched text and blacks out the region where it was displayed. The end result is a new PDF document that contains no trace of the text that matched the regular expression.

For application development in .NET, we recommend using the PrizmDoc Server .NET SDK instead of using the PrizmDoc Server REST API directly. See the How to Create a Redacted PDF topic in the .NET SDK documentation for an example of how to easily perform auto-redaction with the .NET SDK.

The following steps walk you through using the PrizmDoc Server REST API to perform auto-redaction.

Step 1: Upload Your Source Document

  • Upload the source document that you want to redact.
  • This can be a document of any format supported by the PrizmDoc Server RESTful API, except for DICOM and CAD documents, which are not currently supported for redaction.
  • In response to this request, you will receive a file ID that is used to reference the source document in later requests.

    Example

     POST /PCCIS/V1/WorkFile?FileExtension=pdf
     Content-Type: application/octet-stream
     [binary data]
    
      200 OK
      Content-Type: application/json
      {
          "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
      }
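
As a rough illustration of the upload step, the following Python sketch builds the WorkFile request with the standard library and parses the file ID out of the response. The helper names and the `localhost` host are assumptions made for this example and are not part of PrizmDoc; the request is built but not sent, so the block runs without a server.

```python
import json
import urllib.request


def build_upload_request(base_url: str, data: bytes, extension: str) -> urllib.request.Request:
    """Build (but do not send) the WorkFile upload request from Step 1."""
    url = f"{base_url}/PCCIS/V1/WorkFile?FileExtension={extension}"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def parse_file_id(response_body: bytes) -> str:
    """Extract the fileId from the JSON body returned by the upload."""
    return json.loads(response_body)["fileId"]


# Hypothetical host and placeholder document bytes.
req = build_upload_request("http://localhost:18681", b"%PDF-1.7 placeholder", "pdf")
print(req.get_method(), req.full_url)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       file_id = parse_file_id(resp.read())
```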
    
    

Step 2: Compose a Regular Expression

  • Compose the regular expression that will match the text you want to redact in the document.
  • The regular expression should adhere to the POSIX extended RE (ERE) or basic RE (BRE) syntax (see http://laurikari.net/tre/documentation/regex-syntax/ for details). NOTE: Undocumented regex features may work, but we do not provide support for them.
  • For example, the following regular expression will redact all US Social Security Numbers in a document:

    Example

     "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
    
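
Before sending a pattern to the server, it can help to try it locally on sample text. The sketch below uses Python's `re` module, which is not the engine PrizmDoc uses, so it is only a sanity check for simple patterns like this one; the sample sentence is invented for the example.

```python
import re

# The Social Security Number pattern from the example above.
SSN_PATTERN = r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"

# Invented sample text: two SSN-like values and one phone-like value.
sample = "Applicant SSN: 123-45-6789, spouse SSN: 987654321, phone: 555-0100."

matches = re.findall(SSN_PATTERN, sample)
print(matches)
```

Note that the pattern has no boundary anchors, so in Python it also matches the first nine digits of a longer digit run such as `1234567890`. Whether that is acceptable depends on the documents being redacted.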
    

    NOTE: The regular expression is sent to PrizmDoc in JSON format, so you must escape it according to JSON syntax. Specifically, each backslash must be doubled. If you build the regular expression in code as a string literal, you may need to escape it again according to the rules of that programming language.

    Example

     Regular expression (searches whole word "the", case insensitive):
     "(?i:\bthe\b)"
    
     JSON content:
     "pattern": "(?i:\\bthe\\b)"
    
     C# code (string literal holding the JSON-escaped text above):
     string regex = "(?i:\\\\bthe\\\\b)";
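
One way to avoid escaping by hand is to let a JSON serializer do it. The Python sketch below keeps the regular expression in a raw string, exactly as the regex engine should see it, and lets `json.dumps` double the backslashes; the `pattern` key mirrors the field used in the Step 3 request.

```python
import json

# Raw string: the regular expression exactly as the regex engine should see it.
pattern = r"(?i:\bthe\b)"

# The serializer doubles each backslash when it writes the JSON text.
body = json.dumps({"pattern": pattern})
print(body)  # {"pattern": "(?i:\\bthe\\b)"}

# Round trip: decoding the JSON gives back the original expression.
decoded = json.loads(body)["pattern"]
```

With this approach the pattern is escaped only once in source code (or not at all, when the language offers raw strings), which removes the double-escaping shown in the C# line above.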
    
    

Step 3: Create Markup JSON from the Regular Expression

Before the actual redaction process can be started, the regular expression needs to be converted into markup that PrizmDoc can apply to the document. PrizmDoc uses a proprietary XML syntax to define markups used for redaction, which you can generate by sending a POST request that requires two inputs:

  • The file ID of the source document you uploaded in Step 1.
  • One or more rules to match and redact the document text:

    • Each rule includes a regular expression such as the one you created in Step 2, and
    • An object that describes how to redact the matching text.

      Example

      POST /v2/redactionCreators
      Content-Type: application/json
      {
        "input": {
            "source": {
                "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
            },
            "rules": [{
                "find": {
                    "type": "regex",
                    "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                },
                "redactWith": {
                    "type": "RectangleRedaction",
                    "reason": "Redacted"
                }
            }]
        }
      }
      
      200 OK
      Content-Type: application/json
      {
        "processId": "Rr64ma-U_HseoPrs6y0iiw",
        "expirationDateTime": "2014-12-03T18:30:49.460Z",
        "input": {
            "source": {
                "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
            },
            "rules": [
                {
                    "find": {
                        "type": "regex",
                        "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                    },
                    "redactWith": {
                        "type": "RectangleRedaction",
                        "reason": "Redacted"
                    }
                }
            ]
        },
        "state": "processing",
        "percentComplete": 0
      }
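
Because the request accepts one or more rules, it can be convenient to generate the body from a list of patterns. The following Python sketch does that; the function name and its `reason` default are choices made for this example, while the field names are taken from the request shown above.

```python
import json


def build_redaction_creator_body(file_id, patterns, reason="Redacted"):
    """Return the JSON-serializable body for POST /v2/redactionCreators."""
    rules = [
        {
            "find": {"type": "regex", "pattern": pattern},
            "redactWith": {"type": "RectangleRedaction", "reason": reason},
        }
        for pattern in patterns
    ]
    return {"input": {"source": {"fileId": file_id}, "rules": rules}}


body = build_redaction_creator_body(
    "5qTYa3gzN9gYUb5SzqUhqg",
    [r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}", r"(?i:\bthe\b)"],
)
# json.dumps takes care of the backslash escaping described in Step 2.
print(json.dumps(body, indent=2))
```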
      
      

Step 4: Check Status of the RedactionCreator Resource

  • The process to generate markup XML runs asynchronously on the PrizmDoc server. The POST request you sent in Step 3 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
  • The JSON response to this request includes a state property. When its value is "complete", the response also includes an output property, and you can proceed to the next step.
  • See the Redaction Creator API for more details about this request.

    Example

     GET /v2/redactionCreators/Rr64ma-U_HseoPrs6y0iiw
    
     200 OK
     Content-Type: application/json
     {
         "processId": "Rr64ma-U_HseoPrs6y0iiw",
         "expirationDateTime": "2014-12-03T18:30:49.460Z",
         "input": {
             "source": {
                 "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
             },
             "rules": [
                 {
                     "find": {
                         "type": "regex",
                         "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                     },
                     "redactWith": {
                         "type": "RectangleRedaction",
                         "reason": "Redacted"
                     }
                 }
             ]
         },
         "state": "complete",
         "percentComplete": 100,
         "output": {
             "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
         }
     }
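
The status check is usually wrapped in a polling loop. The Python sketch below is one possible shape for it: `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON, and the interval and timeout values are arbitrary. It assumes that any state other than "processing" or "complete" means the process will not finish, and the demonstration uses canned responses instead of a server.

```python
import time


def wait_until_complete(fetch_status, interval=0.5, timeout=60.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until the process reports "complete"."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "complete":
            return status
        if state != "processing":
            # Assumption: anything else means the process stopped without output.
            raise RuntimeError(f"process ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("process did not complete in time")
        sleep(interval)


# Canned responses standing in for successive GET requests.
responses = iter([
    {"state": "processing", "percentComplete": 0},
    {"state": "processing", "percentComplete": 50},
    {"state": "complete", "percentComplete": 100,
     "output": {"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"}},
])

result = wait_until_complete(lambda: next(responses), sleep=lambda seconds: None)
print(result["output"]["markupFileId"])
```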
    
    

Step 5: Start the Markup Burning Process (Redaction)

  • Using the file ID of the source document from Step 1 and the file ID of the XML markup file from Step 4, send a POST request to start redacting the document. The process runs asynchronously on the PrizmDoc server and produces a redacted document.

    Example

     POST /PCCIS/V1/MarkupBurner
     Content-Type: application/json
     {
         "input": {
             "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
             "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
         }
     }
    
     200 OK
     Content-Type: application/json
     {
         "processId": "bQpcuixhvGmNqn5ElskO6Q",
         "expirationDateTime": "2014-12-03T18:30:49.460Z",
         "input": {
             "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
             "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
         },
         "state": "processing",
         "percentComplete": 0
     }
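
The two file IDs can be carried over from the earlier responses rather than copied by hand. This short Python sketch builds the MarkupBurner body from the Step 1 file ID and the completed Step 4 response; the function name is an assumption made for the example.

```python
def build_markup_burner_body(source_file_id, creator_status):
    """Return the body for POST /PCCIS/V1/MarkupBurner."""
    if creator_status.get("state") != "complete":
        # The markupFileId only exists once the redactionCreator has finished.
        raise ValueError("redactionCreator has not completed yet")
    return {
        "input": {
            "documentFileId": source_file_id,
            "markupFileId": creator_status["output"]["markupFileId"],
        }
    }


# Trimmed copy of the Step 4 response shown earlier.
step4_response = {
    "state": "complete",
    "output": {"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"},
}
burner_body = build_markup_burner_body("5qTYa3gzN9gYUb5SzqUhqg", step4_response)
print(burner_body)
```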
    
    

Step 6: Check Status of the MarkupBurner Resource

  • The process to generate a redacted document runs asynchronously on the PrizmDoc server. The POST request you sent in Step 5 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
  • The JSON response to this request includes a state property. When its value is "complete", the response also includes an output property, and you can proceed to the next step.
  • See the Markup Burner API for more details about this request.

    Example

     GET /PCCIS/V1/MarkupBurner/bQpcuixhvGmNqn5ElskO6Q
    
     200 OK
     Content-Type: application/json
     {
         "processId": "bQpcuixhvGmNqn5ElskO6Q",
         "expirationDateTime": "2014-12-03T18:30:49.460Z",
         "input": {
             "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
             "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
         },
         "state": "complete",
         "percentComplete": 100,
         "output": {
             "documentFileId": "5ufb3ytUb1BxxgSUAk_G9Q"
         }
     }
    
    

Step 7: Download the Redacted Document

  • Once the markup burning process completes successfully, the new, redacted PDF document is available for download.

    Example

     GET http://192.168.0.1:18681/PCCIS/V1/WorkFile/5ufb3ytUb1BxxgSUAk_G9Q
    
     200 OK
     Content-Type: application/pdf
     [binary data]
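
The seven steps can be strung together in a single function. The Python sketch below is one way to do that, under stated assumptions: `send` is a hypothetical transport callable (method, path, body, content type) that returns the response bytes, any state other than "processing" or "complete" is treated as a failure, and the demonstration runs against canned responses with made-up IDs instead of a real PrizmDoc server.

```python
import json
import time


def redact(send, document, extension, pattern, interval=1.0):
    """Run Steps 1-7; send(method, path, body, content_type) returns bytes."""

    def send_json(method, path, payload=None):
        body = None if payload is None else json.dumps(payload).encode("utf-8")
        return json.loads(send(method, path, body, "application/json"))

    def poll(path):
        # Steps 4 and 6: repeat the GET until the process leaves "processing".
        while True:
            status = send_json("GET", path)
            if status["state"] == "complete":
                return status["output"]
            if status["state"] != "processing":
                raise RuntimeError(f"{path} ended in state {status['state']!r}")
            time.sleep(interval)

    # Step 1: upload the source document.
    uploaded = send("POST", f"/PCCIS/V1/WorkFile?FileExtension={extension}",
                    document, "application/octet-stream")
    file_id = json.loads(uploaded)["fileId"]

    # Steps 3 and 4: create the redaction markup and wait for it.
    creator = send_json("POST", "/v2/redactionCreators", {
        "input": {
            "source": {"fileId": file_id},
            "rules": [{
                "find": {"type": "regex", "pattern": pattern},
                "redactWith": {"type": "RectangleRedaction", "reason": "Redacted"},
            }],
        }
    })
    markup_id = poll(f"/v2/redactionCreators/{creator['processId']}")["markupFileId"]

    # Steps 5 and 6: burn the markup into the document and wait for it.
    burner = send_json("POST", "/PCCIS/V1/MarkupBurner", {
        "input": {"documentFileId": file_id, "markupFileId": markup_id}
    })
    output_id = poll(f"/PCCIS/V1/MarkupBurner/{burner['processId']}")["documentFileId"]

    # Step 7: download the redacted PDF.
    return send("GET", f"/PCCIS/V1/WorkFile/{output_id}", None, None)


# Canned responses standing in for a PrizmDoc server (all IDs are made up).
CANNED = {
    ("POST", "/PCCIS/V1/WorkFile?FileExtension=pdf"): b'{"fileId": "src"}',
    ("POST", "/v2/redactionCreators"): b'{"processId": "p1", "state": "processing"}',
    ("GET", "/v2/redactionCreators/p1"):
        b'{"state": "complete", "output": {"markupFileId": "mk"}}',
    ("POST", "/PCCIS/V1/MarkupBurner"): b'{"processId": "p2", "state": "processing"}',
    ("GET", "/PCCIS/V1/MarkupBurner/p2"):
        b'{"state": "complete", "output": {"documentFileId": "out"}}',
    ("GET", "/PCCIS/V1/WorkFile/out"): b"%PDF-redacted",
}
calls = []


def fake_send(method, path, body, content_type):
    calls.append((method, path))
    return CANNED[(method, path)]


pdf = redact(fake_send, b"%PDF-original", "pdf",
             "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}", interval=0)
print(len(calls), pdf)
```

Passing the transport in as a parameter keeps the workflow independent of any particular HTTP client and makes it testable offline. A production version would also want a timeout on the polling loops and handling for HTTP errors.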