Tesseract: Google re-releases HP's OCR tool

September 2006 · 1 minute read

You can check it out here: http://sourceforge.net/projects/tesseract-ocr

Tesseract is a tool HP released a while ago for recognizing text within images.

I just compiled and ran it. It seems cool, but it only works with TIFF files, and so far I’ve only gotten it working with the sample file they include. On Ubuntu with the standard GCC toolchain it compiles with no problem.

Just run ‘./configure’ then ‘make’. ‘make install’ does not seem to be supported yet, and the executable gets created within the ccmain directory.
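For what it's worth, here is roughly how those steps look wrapped in a small shell function. This is only a sketch: the function name build_in_tree and the path in the usage comment are made up for illustration, and it assumes the source has already been unpacked somewhere.

```shell
# Build in place inside an unpacked source tree.
# $1: directory holding the source (hypothetical, use your own path).
build_in_tree() {
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$1" || exit 1
        ./configure && make || exit 1
        # There is no install step, so look for the executable in ccmain/.
        ls -l ccmain
    )
}

# Usage (path is an example only):
#   build_in_tree "$HOME/src/tesseract"
```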

I tried taking some PNGs with plain text on a white background and using the ‘convert’ tool to convert them to TIFF. A TIFF file is created, but the OCR cannot read it. Oh well.
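One guess, and it is only a guess I have not verified, is that the converted file ends up compressed, carries over an alpha channel from the PNG, or is not plain grayscale. Assuming ‘convert’ here is ImageMagick's, something like the sketch below forces a flat, uncompressed, 8-bit grayscale TIFF. The function name png_to_plain_tiff and the file names are made up for illustration, and whether this is enough for the OCR to read the result is untested.

```shell
# Convert a PNG to a flat, uncompressed, 8-bit grayscale TIFF.
# Assumes ImageMagick's convert is on the PATH.
# $1: input PNG, $2: output TIFF (both names are up to you).
png_to_plain_tiff() {
    # -alpha off        ignore any transparency channel from the PNG
    # -colorspace Gray  collapse colour to grayscale
    # -depth 8          8 bits per sample
    # -compress none    write the TIFF without compression
    convert "$1" -alpha off -colorspace Gray -depth 8 -compress none "$2"
}

# Usage (file names are examples only):
#   png_to_plain_tiff page.png page.tif
```

If that still fails, running ImageMagick's ‘identify -verbose’ on both the working sample and the converted file and comparing the two might show what differs.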

Anyone else using this? Getting interesting results?