Auto-redaction is a multi-step process that finds every match of a given regular expression, permanently removes the matched text, and blacks out the region where each match was displayed. The end result is a new PDF document that contains no trace of the text that matched the regular expression.
The following steps will walk you through the auto-redaction process using the PrizmDoc Back-end RESTful Services.
Step 1: Upload your Source Document
- Upload the source document that you want to redact.
- This can be a document in any format supported by the PrizmDoc Back-end RESTful Services except DICOM, which is not currently supported for redaction.
- In response to this request you will receive a file ID that is used to reference the source document in later requests.
Example:
POST http://192.168.0.1:18681/PCCIS/V1/WorkFile?FileExtension=pdf
Content-Type: application/octet-stream
[binary data]

200 OK
Content-Type: application/json
{
  "fileId": "5qTYa3gzN9gYUb5SzqUhqg"
}
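The upload request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library; the base URL is copied from the example, and the helper names `build_upload_request` and `parse_file_id` are hypothetical, not part of PrizmDoc.

```python
import json
import urllib.request

# Assumed base URL, copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_upload_request(data: bytes, file_extension: str) -> urllib.request.Request:
    """Build (but do not send) the WorkFile upload request."""
    url = f"{BASE_URL}/WorkFile?FileExtension={file_extension}"
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def parse_file_id(response_body: str) -> str:
    """Extract the file ID from the JSON body of the upload response."""
    return json.loads(response_body)["fileId"]
```

To actually send the request, pass it to `urllib.request.urlopen` and hand the decoded response body to `parse_file_id`.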
Step 2: Compose a Regular Expression
- Compose the regular expression that will match the text you want to redact in the document.
- The regular expression should adhere to the POSIX extended RE (ERE) or basic RE (BRE) syntax. For syntax details, see http://laurikari.net/tre/documentation/regex-syntax/
- For example, the following regular expression matches US Social Security Numbers written with or without hyphens, so that they are redacted from the document:
Example:
"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
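Before submitting a pattern, it can help to try it on sample text locally. The sketch below uses Python's `re` module, which is not the engine PrizmDoc uses; it is only a rough check, and it relies on this pattern using constructs (bracket expressions, `{n}` counts, `?`) that behave the same way in both. The function name `find_ssn_like` is hypothetical.

```python
import re
from typing import List

# The same pattern as in the example above, without the surrounding quotes.
SSN_PATTERN = re.compile(r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}")


def find_ssn_like(text: str) -> List[str]:
    """Return every substring of text that the pattern matches."""
    return SSN_PATTERN.findall(text)


print(find_ssn_like("SSN 123-45-6789 and 987654321."))
# The pattern has no word boundaries, so it also matches
# the first nine digits of a longer run of digits.
print(find_ssn_like("order 1234567890"))
```

The second call illustrates a limitation worth checking for in your own documents: any run of nine or more digits will be partly matched.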
Note that the regular expression is sent to PrizmDoc in JSON format, so you must escape it according to JSON syntax. Specifically, each backslash must be doubled.
If you build regular expressions programmatically from string literals, you may also need to escape the string according to the syntax of your programming language.
Example:
Regular expression (matches the whole word "the", case-insensitive):
"(?i:\bthe\b)"
JSON content:
"regex": "(?i:\\bthe\\b)"
C# code (a string literal that holds the JSON-escaped value):
string regex = "(?i:\\\\bthe\\\\b)";
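One way to avoid escaping by hand is to let a JSON library do it. The sketch below uses Python's standard `json` module with a raw string literal, so the pattern is written exactly once with single backslashes; the `"regex"` key simply mirrors the example above.

```python
import json

# Raw string literal: backslashes are kept exactly as written.
pattern = r"(?i:\bthe\b)"

# json.dumps doubles each backslash as JSON syntax requires.
body = json.dumps({"regex": pattern})
print(body)  # prints {"regex": "(?i:\\bthe\\b)"}
```

Decoding the JSON again yields the original pattern, which is a quick way to confirm that nothing was escaped twice.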
Step 3: Create Markup XML from the Regular Expression
Before the redaction process can start, the regular expression must be converted to a format that PrizmDoc can understand. PrizmDoc uses a proprietary XML syntax to define the markup used for redaction. You can generate this markup XML by sending a POST request with two inputs:
- The file ID of the source document you uploaded in Step 1, and
- The regular expression you created in Step 2.
Example:
POST http://192.168.0.1:18681/PCCIS/V1/RedactionCreator
Content-Type: application/json
{
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
  }
}

200 OK
Content-Type: application/json
{
  "processId": "Rr64ma-U_HseoPrs6y0iiw",
  "expirationDateTime": "2014-12-03T18:30:49.460Z",
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
  },
  "state": "processing",
  "percentComplete": 0
}
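The RedactionCreator request body shown above could be generated in Python along these lines. This is a sketch under assumptions: the base URL is taken from the example, and `build_redaction_creator_request` is a hypothetical helper name. Because the body is produced with `json.dumps`, any backslashes in the expressions are escaped automatically.

```python
import json
import urllib.request
from typing import Iterable

# Assumed base URL, copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_redaction_creator_request(
    document_file_id: str, expressions: Iterable[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates markup XML."""
    payload = {
        "input": {
            "documentFileId": document_file_id,
            "autoRedactionRegularExpressions": list(expressions),
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/RedactionCreator",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The `processId` in the JSON response identifies the resource to check in the next step.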
Step 4: Check Status of the RedactionCreator Resource
- The process that generates the markup XML runs asynchronously on the PrizmDoc server. The POST request you sent in Step 3 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
- The JSON response to this request includes a "state" property. When its value is "complete", the response also includes an "output" property, and you can proceed to the next step.
- See the Redaction Creator API for more details of this request.
Example:
GET http://192.168.0.1:18681/PCCIS/V1/RedactionCreator/Rr64ma-U_HseoPrs6y0iiw

200 OK
Content-Type: application/json
{
  "processId": "Rr64ma-U_HseoPrs6y0iiw",
  "expirationDateTime": "2014-12-03T18:30:49.460Z",
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
  },
  "state": "complete",
  "percentComplete": 100,
  "output": {
    "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
  }
}
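Checking the status repeatedly until the process finishes can be sketched as a small polling loop. The helper below is hypothetical: `fetch_status` stands for any function that performs the GET request and returns the decoded JSON. Only the "processing" and "complete" states appear in the examples here, so treating any other state as an error is an assumption, as are the polling interval and attempt limit.

```python
import time
from typing import Any, Callable, Dict


def wait_for_output(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Poll until "state" is "complete", then return the "output" object."""
    for _ in range(max_attempts):
        status = fetch_status()
        state = status.get("state")
        if state == "complete":
            return status["output"]
        if state != "processing":
            # Assumption: anything other than "processing" means failure.
            raise RuntimeError(f"unexpected state: {state!r}")
        time.sleep(interval_seconds)
    raise TimeoutError("process did not complete within the allowed attempts")
```

Passing the fetch function as a parameter keeps the loop independent of the HTTP client, so the same helper can serve any resource that reports "state" and "output" in this way.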
Step 5: Start the Markup Burning Process (Redaction)
- Using the file ID of the source document from Step 1 and the file ID of the XML markup file from Step 4, you can now start redacting the document. Send a POST request to start a process that runs asynchronously on the PrizmDoc server and produces the redacted document.
Example:
POST http://192.168.0.1:18681/PCCIS/V1/MarkupBurner
Content-Type: application/json
{
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
  }
}

200 OK
Content-Type: application/json
{
  "processId": "bQpcuixhvGmNqn5ElskO6Q",
  "expirationDateTime": "2014-12-03T18:30:49.460Z",
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
  },
  "state": "processing",
  "percentComplete": 0
}
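A Python sketch of the MarkupBurner request follows. The base URL is taken from the example, and `build_markup_burner_request` is a hypothetical helper name. The two IDs are the `fileId` from Step 1 and the `markupFileId` from Step 4; they should be passed exactly as returned, with no surrounding whitespace.

```python
import json
import urllib.request

# Assumed base URL, copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_markup_burner_request(
    document_file_id: str, markup_file_id: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts redaction."""
    payload = {
        "input": {
            "documentFileId": document_file_id,
            "markupFileId": markup_file_id,
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/MarkupBurner",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```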
Step 6: Check Status of the MarkupBurner Resource
- The process that generates the redacted document runs asynchronously on the PrizmDoc server. The POST request you sent in Step 5 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
- The JSON response to this request includes a "state" property. When its value is "complete", the response also includes an "output" property, and you can proceed to the next step.
- See the Markup Burner API for more details of this request.
Example:
GET http://192.168.0.1:18681/PCCIS/V1/MarkupBurner/bQpcuixhvGmNqn5ElskO6Q

200 OK
Content-Type: application/json
{
  "processId": "bQpcuixhvGmNqn5ElskO6Q",
  "expirationDateTime": "2014-12-03T18:30:49.460Z",
  "input": {
    "documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
    "markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
  },
  "state": "complete",
  "percentComplete": 100,
  "output": {
    "documentFileId": "5ufb3ytUb1BxxgSUAk_G9Q"
  }
}
Step 7: Download the Redacted Document
- Once the markup burning process completes successfully, the new, redacted PDF document is available for download. Request it with the file ID returned in the "output" property in Step 6.
Example:
GET http://192.168.0.1:18681/PCCIS/V1/WorkFile/5ufb3ytUb1BxxgSUAk_G9Q

200 OK
Content-Type: application/pdf
[binary data]
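The download request can be derived directly from the completed MarkupBurner status. The sketch below assumes the base URL from the example and a status dictionary shaped like the Step 6 response; `build_download_request` is a hypothetical helper name.

```python
import urllib.request
from typing import Any, Dict

# Assumed base URL, copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_download_request(status: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the GET request for the redacted document."""
    # The redacted document's ID is reported under "output" once complete.
    file_id = status["output"]["documentFileId"]
    return urllib.request.Request(f"{BASE_URL}/WorkFile/{file_id}", method="GET")
```

Sending the request with `urllib.request.urlopen` and calling `.read()` on the response returns the PDF bytes, which can then be written to a file opened in binary mode.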