Applying OCR to Existing PDFs on Your Computer


Sometimes you will not scan a PDF document yourself; instead, you download it from an electronic journal archive or another online source, and it resides on your computer.  Such PDFs may have been scanned without OCR, so you will need to apply OCR to the existing files.  If you forgot to apply OCR to your own scanned files, you can follow these directions as well.  To apply OCR, do the following:

  1. Open your PDF document.
  2. Go to the Document menu > OCR Text Recognition > Recognize Text Using OCR.

    Accessing the Recognize Text Using OCR option from the Document menu, for a document that was scanned without the OCR option.
  3. A Recognize Text window appears.  Under Pages, Current page should be selected.

    Recognize Text dialog box.
  4. The Settings box tells you how Acrobat will apply the OCR.  If you want to change these settings, click on the Edit button.  The Recognize Text - Settings window appears.

    Recognize Text - Settings dialog box.
  5. Select English for Primary OCR Language.
  6. Select Searchable Image for PDF Output Style.
  7. Leave Downsample Images at 600 dpi for now, or adjust as necessary.  Click the OK button.
  8. Back on the Recognize Text window, click the OK button to begin the OCR process and recognize text.

  9. Acrobat runs the OCR process.  You can follow its progress in the progress bar at the lower right of the screen.

    OCR progress bar: Adobe Acrobat indicates when the OCR process is complete.
  10. Although Acrobat has recognized the text in the document, the document may not have tags.  Apply tags by going to the Advanced menu > Accessibility > Add Tags to Document (screenshot below).  Acrobat will do its best to add tags to the different components of your document.

    Accessing the Add Tags to Document from the Advanced menu.
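
If you have many downloaded PDFs, it may help to guess which ones still need the steps above before opening each one in Acrobat.  The following is a minimal Python sketch and not part of Acrobat: it searches a file's raw bytes for the marker /Font (usually present once a PDF contains real text) and /StructTreeRoot (usually present once a PDF is tagged).  The function names are hypothetical, and the check is only a rough heuristic: some PDFs store these markers in compressed form, so a file flagged here may turn out to be fine when you open it.

```python
from pathlib import Path


def scan_pdf_markers(pdf_path):
    """Return rough hints about a PDF by searching its raw bytes.

    Heuristic only: compressed object streams can hide these markers.
    """
    data = Path(pdf_path).read_bytes()
    return {
        # Every PDF file starts with this header.
        "is_pdf": data.startswith(b"%PDF-"),
        # Image-only scans usually contain no font resources.
        "has_font_marker": b"/Font" in data,
        # Tagged PDFs reference a structure tree from the catalog.
        "has_tag_marker": b"/StructTreeRoot" in data,
    }


def needs_attention(pdf_path):
    """List the steps that probably still apply to this file."""
    hints = scan_pdf_markers(pdf_path)
    todo = []
    if not hints["has_font_marker"]:
        todo.append("apply OCR")
    if not hints["has_tag_marker"]:
        todo.append("add tags")
    return todo
```

For example, needs_attention("article.pdf") (a hypothetical file name) returns a list such as ["apply OCR", "add tags"] for a file that appears to be an untagged image scan, and an empty list when both markers are found.  Treat the result as a reason to check the file in Acrobat, not as a final answer.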

Applying OCR to Multiple Files at a Time

You can use Acrobat's OCR feature to recognize text in multiple files at once:

  1. Locate the PDF image files that need to have text recognized.
  2. Open Adobe Acrobat Professional.
  3. Go to Document menu > OCR Text Recognition > Recognize Text in Multiple Files Using OCR.
  4. In the Paper Capture Multiple Files dialog box, click Add Files and choose Add Files, Add Folders, or Add Open Files.  Then select the files or folder.
  5. In the Output Options dialog box, specify a target folder for output files, set filename preferences, and select an output format.

    Output Options dialog box.
  6. In the Recognize Text - Settings dialog box, specify the options and then click OK.

    Recognize Text - Settings dialog box.
  7. Acrobat will recognize the text and render the files after a few seconds.  Then review and repair each document for accessibility.  See Stages 1-3 for more information on how to do this.
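
Before starting a batch, it can be useful to list the files you plan to add and the output names you expect, so that nothing is missed or overwritten.  The following is a minimal Python sketch of that bookkeeping only; it does not run Acrobat or perform OCR.  The function name, the folder arguments, and the "_ocr" filename suffix are assumptions for illustration, loosely mirroring the target folder and filename preferences in the Output Options dialog box.

```python
from pathlib import Path


def plan_batch(source_dir, target_dir, suffix="_ocr"):
    """Pair each PDF in source_dir with a planned output path.

    Only the top level of source_dir is examined; nothing is
    created, copied, or modified on disk.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    plan = []
    for entry in sorted(source.iterdir()):
        # Match .pdf and .PDF alike; skip folders and other files.
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            planned = target / f"{entry.stem}{suffix}.pdf"
            plan.append((entry, planned))
    return plan
```

Calling plan_batch("scans", "scans_ocr") (hypothetical folder names) returns a list of (input, output) path pairs that you can print and compare against the files Acrobat writes to the target folder.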

Credits

  • This training has been funded in part by the EnACT (Ensuring Access through Collaboration and Technology) grant.
  • Written and developed by Sacramento State Online Learning professionals Monica Range and Cryssel Vera and technical crew Ivan Vajar, Ken Young, and Jennifer Wicks, with guidance from the CSU Professional Development Work Group.

Support

If you have difficulty accessing any material on this site, need an alternate format, or have questions or feedback, contact the Accessible Technology Initiative.

Copyright 2009, All Rights Reserved