Creating a Searchable PDF on OS X

So you have scanned some documents and need to OCR them, creating either a text file or a searchable PDF. Here's how I did it.

I scanned the document at 300 dpi, but 150 dpi is probably fine.

Tesseract is at the core of these solutions. It is available via MacPorts or Homebrew.

If all you want is plain text, Tesseract is the way to go.
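For the plain-text route, here is a rough sketch of how the Tesseract command line could be driven from a small Python script. It is not what I actually ran. It assumes the tesseract binary is already installed and on your PATH, that your scans are image files such as PNG or TIFF, and that the English language data is present. The function names and the scan-01.png file name are made up for illustration. The optional "pdf" argument asks Tesseract itself for a searchable PDF, which as far as I know only works in reasonably recent versions.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, searchable_pdf=False, language="eng"):
    """Return the argument list for one Tesseract run.

    Tesseract adds the extension itself: output_base + ".txt" by default,
    or output_base + ".pdf" when the "pdf" config name is appended.
    (Hypothetical helper, not part of Tesseract.)
    """
    command = ["tesseract", str(image), str(output_base), "-l", language]
    if searchable_pdf:
        # Assumption: the installed Tesseract supports the "pdf" output config.
        command.append("pdf")
    return command


def ocr_page(image, searchable_pdf=False):
    """Run Tesseract on one scanned page and return the path it should write."""
    image = Path(image)
    output_base = image.with_suffix("")  # scan-01.png -> scan-01
    subprocess.run(
        build_tesseract_command(image, output_base, searchable_pdf),
        check=True,  # raise if Tesseract exits with an error
    )
    extension = ".pdf" if searchable_pdf else ".txt"
    return Path(str(output_base) + extension)
```

With this, calling ocr_page("scan-01.png") should leave scan-01.txt next to the scan, and looping over a folder of page images would handle a whole document in one go.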

Otherwise, for a searchable PDF, there are a couple of options.

PDF OCR X is a graphical front end for Tesseract. The free version only processes one page at a time, but it does a very nice job. I did 40 pages in about 10 minutes, then used Preview to combine the 40 individual pages into a single PDF document.

OCRKit is another non-free option, but you can get a 14-day demo. It does multiple pages at once, but I did not try it because it looked more complicated than PDF OCR X. It does look capable, though.

Other options include Google Docs and Cuneiform OpenOCR.