
Creating a Searchable PDF on OS X

So you have scanned some documents and need to OCR them, creating either a text file or a searchable PDF. Here's how I did it...

I scanned the document at 300 dpi, but 150 is probably fine.

Tesseract is at the core of these solutions. It is available via MacPorts or Homebrew. https://code.google.com/p/tesseract-ocr/wiki/ReadMe

If all you want is plain text, Tesseract is the way to go.
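For a multi-page scan, running Tesseract by hand on every page gets tedious, so here is a rough sketch of how the plain-text route could be scripted. It assumes the `tesseract` command is on your PATH and that each page was saved as a separate PNG. The function names and the `*.png` pattern are my own choices for illustration, not anything Tesseract requires.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang="eng"):
    """Build the argument list for one Tesseract run.

    Tesseract adds the ".txt" extension to output_base on its own.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_pages(image_dir, pattern="*.png", lang="eng"):
    """OCR every matching image in image_dir, in sorted filename order.

    Returns the paths of the text files Tesseract was asked to write.
    """
    outputs = []
    for image in sorted(Path(image_dir).glob(pattern)):
        # page-01.png -> output base "page-01" -> Tesseract writes page-01.txt
        output_base = image.with_suffix("")
        subprocess.run(tesseract_command(image, output_base, lang), check=True)
        outputs.append(image.with_suffix(".txt"))
    return outputs
```

Calling `ocr_pages("scans")`, where `scans` is a hypothetical folder holding the page images, would leave one `.txt` file next to each image. Naming the scans so they sort in page order (page-01, page-02, ...) keeps the results in sequence.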

Otherwise, for a searchable PDF, there are a couple of options.

PDF OCR X is a graphical front end for Tesseract. The free version only processes one page at a time, but it does a very nice job. I did 40 pages in about 10 minutes, then used Preview to combine the 40 individual pages into a single PDF document.

http://solutions.weblite.ca/pdfocrx/

OCRKit is another option. It is not free, but you can get a 14-day demo. It does multiple pages at once, but I did not try it because it looked more complicated than PDF OCR X. However, it does look capable.

Other options include Google Docs and Cuneiform OpenOCR.