Introduction
Auto-redaction is a multi-step process that finds every match of a given regular expression, then permanently removes the matched text and blacks out the region where it was displayed. The end result is a new PDF document that contains no trace of the text that matched the regular expression.
The following steps will walk you through the auto-redaction process using the PrizmDoc Server.
Step 1: Upload Your Source Document
- Upload the source document that you want to redact.
- This can be a document of any format supported by the PrizmDoc Server RESTful API, except for DICOM and CAD documents, which are not currently supported for redaction.
- In response to this request, you will receive a file ID that you use to reference the source document in later requests.
Example
POST /PCCIS/V1/WorkFile?FileExtension=pdf
Content-Type: application/octet-stream

[binary data]

200 OK
Content-Type: application/json

{
    "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
}
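The upload request above could be prepared in Python roughly as follows. This is only a sketch: the base URL `http://prizmdoc.example.com` and the helper names `build_upload_request` and `parse_file_id` are assumptions made for illustration. Only the path, the `FileExtension` query parameter, the content type and the `fileId` property come from the example. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical server address; replace with your PrizmDoc Server base URL.
BASE_URL = "http://prizmdoc.example.com"


def build_upload_request(data: bytes, extension: str) -> urllib.request.Request:
    """Build (but do not send) the WorkFile upload request from Step 1."""
    query = urllib.parse.urlencode({"FileExtension": extension})
    return urllib.request.Request(
        url=f"{BASE_URL}/PCCIS/V1/WorkFile?{query}",
        data=data,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def parse_file_id(response_body: str) -> str:
    """Extract the fileId property from the JSON body returned by the upload."""
    return json.loads(response_body)["fileId"]


request = build_upload_request(b"%PDF-1.4 ...", "pdf")
print(request.get_method(), request.full_url)
print(parse_file_id('{"fileId": "5qTYa3gzN9gYUb5SzqUhqg"}'))
```

To actually send the request, pass it to `urllib.request.urlopen` and feed the decoded response body to `parse_file_id`.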
Step 2: Compose a Regular Expression
- Compose the regular expression that will match the text you want to redact in the document.
- The regular expression should adhere to the POSIX extended RE (ERE) or basic RE (BRE) syntax. For details, see http://laurikari.net/tre/documentation/regex-syntax/. NOTE: Undocumented regex features may work, but we don't provide support for them.
- For example, the following regular expression will redact all US Social Security Numbers in a document:
Example
"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
NOTE: The regular expression is sent to PrizmDoc in JSON format, so you must escape it according to JSON syntax. Specifically, each backslash must be doubled. If you create regular expressions programmatically, using string literals, you may need to escape the string again according to the syntax of your programming language.
Example
Regular expression (searches for the whole word "the", case insensitive): "(?i:\bthe\b)"
JSON content: "regex": "(?i:\\bthe\\b)"
C# code: string regex = "(?i:\\\\bthe\\\\b)";
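Before sending a pattern to the server, it can help to try it locally and to let a JSON library do the escaping. The sketch below uses Python's `re` module, which is not the engine PrizmDoc uses, so treat a local match only as a rough preview. The two patterns are copied from the examples above; the sample sentences and variable names are made up for illustration.

```python
import json
import re

# Patterns copied from the examples above. Raw strings keep each backslash as written.
SSN_PATTERN = r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
WORD_PATTERN = r"(?i:\bthe\b)"

# Made-up sample text, used only to preview what the patterns would match.
ssn_matches = re.findall(
    SSN_PATTERN, "SSN 123-45-6789 on file; the alternate 987654321 is unused."
)
word_matches = re.findall(WORD_PATTERN, "The cat saw the other cat.")
print(ssn_matches)   # ['123-45-6789', '987654321']
print(word_matches)  # ['The', 'the']

# json.dumps doubles each backslash, which is the JSON adjustment described above.
json_text = json.dumps({"regex": WORD_PATTERN})
print(json_text)     # {"regex": "(?i:\\bthe\\b)"}
```

Note that the Social Security Number pattern also matches any bare run of nine digits, so it may redact more than intended in documents that contain other long numbers.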
Step 3: Create Markup JSON from the Regular Expression
Before the actual redaction process can be started, the regular expression needs to be converted to a format that PrizmDoc can process. PrizmDoc uses a proprietary XML syntax to define the markups used for redaction. You can generate this markup by sending a POST request that requires two inputs:
- The file ID of the source document you uploaded in Step 1.
- One or more rules to match and redact the document text. Each rule includes:
  - A regular expression such as the one you created in Step 2, and
  - An object that describes how to redact the matching text.
Example
POST /v2/redactionCreators
Content-Type: application/json

{
    "input": {
        "source": {
            "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
        },
        "rules": [
            {
                "find": {
                    "type": "regex",
                    "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                },
                "redactWith": {
                    "type": "RectangleRedaction",
                    "reason": "Redacted"
                }
            }
        ]
    }
}

200 OK
Content-Type: application/json

{
    "processId": "Rr64ma-U_HseoPrs6y0iiw",
    "expirationDateTime": "2014-12-03T18:30:49.460Z",
    "input": {
        "source": {
            "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
        },
        "rules": [
            {
                "find": {
                    "type": "regex",
                    "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                },
                "redactWith": {
                    "type": "RectangleRedaction",
                    "reason": "Redacted"
                }
            }
        ]
    },
    "state": "processing",
    "percentComplete": 0
}
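The request body above can be assembled from the file ID and one or more patterns. The following sketch shows one way to do that in Python. The function name `build_redaction_creator_body` and its parameters are hypothetical; the property names and the `RectangleRedaction` type are taken from the example request.

```python
import json
from typing import Iterable


def build_redaction_creator_body(
    file_id: str, patterns: Iterable[str], reason: str = "Redacted"
) -> dict:
    """Build the JSON body for POST /v2/redactionCreators, one rule per pattern."""
    rules = [
        {
            "find": {"type": "regex", "pattern": pattern},
            "redactWith": {"type": "RectangleRedaction", "reason": reason},
        }
        for pattern in patterns
    ]
    return {"input": {"source": {"fileId": file_id}, "rules": rules}}


body = build_redaction_creator_body(
    "5qTYa3gzN9gYUb5SzqUhqg", [r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
)
# json.dumps applies the JSON escaping rules, so patterns can be kept as raw strings.
print(json.dumps(body, indent=4))
```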
Step 4: Check Status of the RedactionCreator Resource
- The process that generates the markup XML runs asynchronously on the PrizmDoc server. The POST request you sent in Step 3 returns immediately, before the output is ready. This means you need to check the status of the process by sending a GET request to the resource you just created.
- In response to this request, JSON will be returned that includes a state property. When the value of this property is "complete", the JSON response will also include an output property, which means you can proceed to the next step.
- See the Redaction Creator API for more details about this request.
Example
GET /v2/redactionCreators/Rr64ma-U_HseoPrs6y0iiw

200 OK
Content-Type: application/json

{
    "processId": "Rr64ma-U_HseoPrs6y0iiw",
    "expirationDateTime": "2014-12-03T18:30:49.460Z",
    "input": {
        "source": {
            "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
        },
        "rules": [
            {
                "find": {
                    "type": "regex",
                    "pattern": "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
                },
                "redactWith": {
                    "type": "RectangleRedaction",
                    "reason": "Redacted"
                }
            }
        ]
    },
    "state": "complete",
    "percentComplete": 100,
    "output": {
        "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
    }
}
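Because the process is asynchronous, a client typically repeats the GET request until the state changes. The sketch below shows one possible polling loop. The function `wait_for_output`, its parameters and the caller-supplied `fetch_status` callable are hypothetical. The example responses show only the states "processing" and "complete", so treating any other state as a failure is an assumption.

```python
import time
from typing import Callable


def wait_for_output(
    fetch_status: Callable[[], dict], interval: float = 1.0, max_attempts: int = 60
) -> dict:
    """Call fetch_status until the process is complete, then return its output."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status["state"] == "complete":
            return status["output"]
        if status["state"] != "processing":
            # Assumption: any state other than the two shown above means failure.
            raise RuntimeError(f"unexpected state: {status['state']}")
        time.sleep(interval)
    raise TimeoutError("process did not complete in time")


# Canned responses stand in for real GET requests so the sketch runs offline.
canned = iter([
    {"state": "processing", "percentComplete": 0},
    {"state": "complete", "percentComplete": 100,
     "output": {"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"}},
])
output = wait_for_output(lambda: next(canned), interval=0)
print(output["markupFileId"])
```

In a real client, `fetch_status` would send the GET request shown above and return the parsed JSON body.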
Step 5: Start the Markup Burning Process (Redaction)
- Using the file ID of the source document from Step 1 and the file ID of the XML markup file from Step 4, send a POST request to start the redaction. The request starts a process that runs asynchronously on the PrizmDoc server and produces a redacted document.
Example
POST /PCCIS/V1/MarkupBurner
Content-Type: application/json

{
    "input": {
        "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
        "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
    }
}

200 OK
Content-Type: application/json

{
    "processId": "bQpcuixhvGmNqn5ElskO6Q",
    "expirationDateTime": "2014-12-03T18:30:49.460Z",
    "input": {
        "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
        "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
    },
    "state": "processing",
    "percentComplete": 0
}
Step 6: Check Status of the MarkupBurner Resource
- The process that generates the redacted document runs asynchronously on the PrizmDoc server. The POST request you sent in Step 5 returns immediately, before the output is ready. This means you need to check the status of the process by sending a GET request to the resource you just created.
- In response to this request, JSON will be returned that includes a state property. When the value of this property is "complete", the JSON response will also include an output property, which means you can proceed to the next step.
- See the Markup Burner API for more details about this request.
Example
GET /PCCIS/V1/MarkupBurner/bQpcuixhvGmNqn5ElskO6Q

200 OK
Content-Type: application/json

{
    "processId": "bQpcuixhvGmNqn5ElskO6Q",
    "expirationDateTime": "2014-12-03T18:30:49.460Z",
    "input": {
        "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
        "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
    },
    "state": "complete",
    "percentComplete": 100,
    "output": {
        "documentFileId": "5ufb3ytUb1BxxgSUAk_G9Q"
    }
}
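Steps 3 through 6 follow the same start-then-poll shape, so they can be chained in a small driver. The sketch below is one way to express that chain. The names `run_process`, `auto_redact`, `fake_send` and the `Send` callable type are hypothetical, and `fake_send` simply replays the example responses from this page so that the code runs without a server. A real client would replace it with an HTTP call and pause between polls.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def run_process(send: Send, path: str, body: dict, max_polls: int = 60) -> dict:
    """POST to start an asynchronous process, then GET until it is complete."""
    status = send("POST", path, body)
    polls = 0
    while status["state"] != "complete":
        if polls >= max_polls:
            raise TimeoutError(f"{path} did not complete")
        # A real client would sleep here before asking again.
        status = send("GET", f"{path}/{status['processId']}", None)
        polls += 1
    return status["output"]


def auto_redact(send: Send, file_id: str, pattern: str) -> str:
    """Create markup for the pattern, burn it in, and return the new file ID."""
    rule = {"find": {"type": "regex", "pattern": pattern},
            "redactWith": {"type": "RectangleRedaction", "reason": "Redacted"}}
    markup = run_process(send, "/v2/redactionCreators",
                         {"input": {"source": {"fileId": file_id}, "rules": [rule]}})
    burned = run_process(send, "/PCCIS/V1/MarkupBurner",
                         {"input": {"documentFileId": file_id,
                                    "markupFileId": markup["markupFileId"]}})
    return burned["documentFileId"]


def fake_send(method: str, path: str, body: Optional[dict]) -> dict:
    """Stand-in transport that replays the example responses from Steps 3 to 6."""
    replies = {
        ("POST", "/v2/redactionCreators"):
            {"processId": "Rr64ma-U_HseoPrs6y0iiw", "state": "processing"},
        ("GET", "/v2/redactionCreators/Rr64ma-U_HseoPrs6y0iiw"):
            {"processId": "Rr64ma-U_HseoPrs6y0iiw", "state": "complete",
             "output": {"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"}},
        ("POST", "/PCCIS/V1/MarkupBurner"):
            {"processId": "bQpcuixhvGmNqn5ElskO6Q", "state": "processing"},
        ("GET", "/PCCIS/V1/MarkupBurner/bQpcuixhvGmNqn5ElskO6Q"):
            {"processId": "bQpcuixhvGmNqn5ElskO6Q", "state": "complete",
             "output": {"documentFileId": "5ufb3ytUb1BxxgSUAk_G9Q"}},
    }
    return replies[(method, path)]


redacted_id = auto_redact(
    fake_send, "5qTYa3gzN9gYUb5SzqUhqg", r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
)
print(redacted_id)
```

Injecting the transport as a parameter keeps the workflow logic separate from the HTTP library and makes it easy to exercise with canned responses.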