Redacting OCR – It’s a little simpler than it sounds

Hello Breezers!  I have received a few questions about redaction workflows, usually along the lines of: “I have redacted images, but what do I do with the OCR?”  Here are my suggestions for automating OCR redaction.  First, the person in charge of the production should make sure the documents are reviewed and marked for redaction.  Next, make sure the pages/documents that need redactions have been properly redacted and that the redactions are burned in, or made permanent, on the new images.  Last, run a new OCR process on the newly redacted images to create new OCR files.  The main thing to remember is that the OCR associated with the original images still contains the information you are concealing on the image with the redaction.  If you follow these simple steps, you can have a successful production.

Step 1 – Review

First, you will need to review your documents.  The review can be done in a few different ways, and I have two favorite approaches.  One is to load your imaged documents into a database or litigation support software.  Once the documents are in a database, you can “mark” the pages that need to be redacted.  The other is to image your documents and review the TIF/PDF files natively.  The second approach is much harder to track, because there is no database keeping a record of what your reviewers have marked.  I definitely recommend using a database product.
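To make the tracking advantage concrete, here is a minimal sketch of the page-level record a review database keeps, using Python's built-in `sqlite3` module. The `pages` table, its columns, and the Bates-style page IDs are hypothetical illustrations, not the schema of any particular litigation support product.

```python
import sqlite3

# Hypothetical page-level review schema; real litigation support
# products use their own (much richer) data models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    bates TEXT PRIMARY KEY,                -- page identifier (e.g. a Bates number)
    image_path TEXT NOT NULL,              -- path to the page's TIF/PDF image
    needs_redaction INTEGER NOT NULL DEFAULT 0,
    reviewer TEXT                          -- who marked the page
)
"""


def create_review_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def mark_for_redaction(conn: sqlite3.Connection, bates: str, reviewer: str) -> None:
    """Flag one page for redaction and record which reviewer marked it."""
    cur = conn.execute(
        "UPDATE pages SET needs_redaction = 1, reviewer = ? WHERE bates = ?",
        (reviewer, bates),
    )
    if cur.rowcount == 0:
        raise KeyError(f"unknown page: {bates}")


def pages_to_redact(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (bates, image_path) for every page marked for redaction."""
    rows = conn.execute(
        "SELECT bates, image_path FROM pages WHERE needs_redaction = 1 ORDER BY bates"
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_review_db(conn)
    conn.executemany(
        "INSERT INTO pages (bates, image_path) VALUES (?, ?)",
        [
            ("ABC000001", "IMAGES/ABC000001.tif"),
            ("ABC000002", "IMAGES/ABC000002.tif"),
            ("ABC000003", "IMAGES/ABC000003.tif"),
        ],
    )
    mark_for_redaction(conn, "ABC000002", "reviewer1")
    print(pages_to_redact(conn))  # [('ABC000002', 'IMAGES/ABC000002.tif')]
```

The last query is the payoff: when it is time to redact, the list of pages to work on comes straight from the reviewers' marks rather than from someone's notes.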

Step 2 – Redact

After you have decided which pages/documents need to be redacted, you can use a variety of tools to apply the redactions.  In your litigation support database, you can use the markup tools to apply the redaction.  If you are reviewing the documents as TIF/PDF files outside of a database, you can use a Windows image editor or Adobe to redact the pages.  Once the pages have been redacted, save the images to your production folder.  When saving the redacted images for production, make sure you have used whatever option your tool provides to make the redaction permanent.  Typically, the redaction isn’t made permanent until this option has been selected.

Step 3 – Re-OCR

Now that your pages have been redacted, you will need to run a new OCR process on the edited image files.  This is a critical step in your production.  If the images are not OCR’d in their new redacted state, the information that has been concealed on the images will still be readable in the OCR files.  There are tools that will let you add the new OCR files to your redacted images or production image volumes.
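Because the whole risk is that concealed text survives in the OCR, one simple quality-control pass after re-OCR is to search the production text files for the strings you redacted. This is a minimal sketch that assumes plain-text OCR files (`*.txt`) sitting in one folder and a list of redacted strings kept from review; both are assumptions about your setup. Treat it as a spot check, not proof: an exact search can miss a term that the OCR engine misread.

```python
import tempfile
from pathlib import Path


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace so line breaks don't hide a match."""
    return " ".join(text.lower().split())


def find_leaks(ocr_dir: str, redacted_terms: list[str]) -> dict[str, list[str]]:
    """Map each OCR .txt file name to the redacted terms it still contains."""
    # Keep the original spelling for reporting, match on the normalized form.
    terms = [(t, normalize(t)) for t in redacted_terms if t.strip()]
    leaks: dict[str, list[str]] = {}
    for txt_file in sorted(Path(ocr_dir).glob("*.txt")):
        text = normalize(txt_file.read_text(encoding="utf-8", errors="replace"))
        hits = [original for original, norm in terms if norm in text]
        if hits:
            leaks[txt_file.name] = hits
    return leaks


if __name__ == "__main__":
    # Demo with throwaway files: one stale OCR file, one re-OCR'd file.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "ABC000001.txt").write_text("SSN: 123-45-6789\n", encoding="utf-8")
        Path(folder, "ABC000002.txt").write_text("SSN: REDACTED\n", encoding="utf-8")
        print(find_leaks(folder, ["123-45-6789"]))  # {'ABC000001.txt': ['123-45-6789']}
```

Any file that shows up in the result is a sign that an old OCR file made it into the production folder, or that a redaction was never burned in before re-OCR.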

Once this process has been completed, you should have a good production set.  Of course, there are other things to consider when creating a production set.  Each of the items below takes additional time to process and should be accounted for when you establish your workflow, so that you make your production deadline.

  • What format do I need to produce my documents in?  Do I need to create load files?
  • Should I burn a “Redacted” stamp into the new images?  Do I need to add new electronic Bates stamps to my new images?
  • How many pages am I going to need to OCR again?
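For the last question, a back-of-the-envelope count helps with scheduling. This is a minimal sketch that assumes you can export, per document, a page count and the number of pages marked for redaction (the `Document` record is hypothetical). It also shows one thing to check with your tools: if your OCR text is kept one file per document rather than per page, every page of an affected document may need to go through OCR again. The throughput figure is a parameter you would measure on your own OCR setup; the number in the usage lines is only a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    page_count: int
    redacted_pages: int  # pages marked for redaction during review

    def __post_init__(self) -> None:
        if not 0 <= self.redacted_pages <= self.page_count:
            raise ValueError(f"{self.doc_id}: redacted_pages out of range")


def pages_to_reocr(docs: list[Document], per_document_text: bool = False) -> int:
    """Count pages that need OCR again.

    per_document_text=False: only the redacted pages are re-OCR'd.
    per_document_text=True:  every page of any document with a redaction is
    re-OCR'd (assumed when the OCR text is stored one file per document).
    """
    affected = [d for d in docs if d.redacted_pages > 0]
    if per_document_text:
        return sum(d.page_count for d in affected)
    return sum(d.redacted_pages for d in affected)


def estimate_minutes(pages: int, pages_per_minute: float) -> float:
    """Rough processing time; pages_per_minute must come from your own OCR setup."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


if __name__ == "__main__":
    docs = [
        Document("DOC001", page_count=10, redacted_pages=2),
        Document("DOC002", page_count=5, redacted_pages=0),
        Document("DOC003", page_count=3, redacted_pages=3),
    ]
    print(pages_to_reocr(docs))                          # 5
    print(pages_to_reocr(docs, per_document_text=True))  # 13
    # 20 pages/minute is a placeholder, not a benchmark.
    print(estimate_minutes(pages_to_reocr(docs, per_document_text=True), 20))
```

The gap between the two counts (5 versus 13 pages here) is exactly the kind of difference that can decide whether a production deadline is comfortable or tight.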

