
Linux.com

Seriously handy!!

Posted by: Anonymous [ip: 84.92.225.20] on August 30, 2007 04:02 PM
Along with ImageMagick, grep, a pinch of bash, and a parts list, I am using Tesseract to automatically index ~10,000 pages of PDFs containing vector drawings with vector-drawn part-number text. The end result is a mysqldump-style file ready for import!

It might not be very efficient but it works and it won't cost a penny! FOSS pwnz.
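
For anyone wanting to try something similar, here is a rough sketch of how such a pipeline could look. It is not the poster's actual script (that was bash and grep); it is a Python approximation under some assumptions: ImageMagick's `convert` and the `tesseract` command are on the PATH, and the table name `part_index` and the function names are made up for illustration.

```python
import re
import subprocess
from pathlib import Path


def ocr_page(pdf: Path, page: int, workdir: Path) -> str:
    """Rasterize one PDF page with ImageMagick, then OCR it with Tesseract."""
    base = workdir / f"{pdf.stem}-{page:04d}"
    tif = Path(str(base) + ".tif")
    # "file.pdf[N]" selects a zero-based page; -density sets the render DPI.
    subprocess.run(["convert", "-density", "300", f"{pdf}[{page}]", str(tif)],
                   check=True)
    # Tesseract writes its text output to <base>.txt
    subprocess.run(["tesseract", str(tif), str(base)], check=True)
    return Path(str(base) + ".txt").read_text(errors="replace")


def find_parts(text: str, part_list: list[str]) -> list[str]:
    """Return the known part numbers that occur in the OCR text (the 'grep' step)."""
    found = []
    for part in part_list:
        # Require non-word characters on both sides so AB-1234 does not
        # match inside XAB-12345.
        if re.search(r"(?<!\w)" + re.escape(part) + r"(?!\w)", text):
            found.append(part)
    return sorted(found)


def insert_statement(pdf_name: str, page: int, part: str) -> str:
    """Build one SQL INSERT line for a hypothetical part_index table."""
    def quote(value: str) -> str:
        return "'" + value.replace("'", "''") + "'"
    return (f"INSERT INTO part_index (part_no, pdf, page) "
            f"VALUES ({quote(part)}, {quote(pdf_name)}, {page});")
```

Looping `ocr_page` over every page, passing the text to `find_parts`, and writing the `insert_statement` lines to a file would give something that can be fed to the `mysql` client. OCR misreads are not handled here, so expect to tune the matching for real drawings.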


Return to Google's Tesseract OCR engine is a quantum leap forward