Re:opensource OCR

Posted by: walt-sjc on January 04, 2006 07:56 AM
Most commercial products that claim 98-99% accuracy are lying. They MAY be that accurate when dealing with a typewritten original from an IBM Selectric or a laser-printed document in a Courier font, but they SUCK at real-world documents. They generally don't handle skew well at all, don't handle boxes around text, totally lose formatting, etc.

I found it cheaper, faster, and more accurate to send the work off to India and have it triple hand-entered.

Correcting OCR output can also take more time than retyping it from scratch, depending on the speed of the typist.
