
Not as bad as all that

Posted by: Anonymous Coward on January 04, 2006 11:02 PM
Higher-resolution scans make for better OCR. When we started transcribing Unix Text Processing from PDF scans a couple of years back, several of us used GOCR to bring in the text. It took no more time, and less effort, than typing it in by hand. As in your experience, it required some spell-checking and proofreading, but our results weren't nearly as bad as the listings showed.

Playing around with GOCR, I got results similar to yours with 300dpi scans. Things get much better with higher resolutions (use 2400dpi if you can get it).
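
To put rough numbers on why resolution matters, here is a quick back-of-the-envelope sketch. It assumes 10 pt body text (my guess at a typical book size, not something measured from the scans) and the usual 72 points per inch. The function name is made up for illustration.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate height in pixels of one em of text at a given scan resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi


# Assumed 10 pt body text; compare a few common scanner resolutions.
for dpi in (300, 600, 1200, 2400):
    print(f"{dpi:>4} dpi: about {em_height_px(10, dpi):.0f} px per em of 10 pt text")
```

At 300 dpi that works out to roughly 42 pixels per em, and at 2400 dpi roughly 333, so the recognizer has about eight times as many pixels in each direction to work with per character.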

