
Linux.com

Language detection and desktop search

Posted by: Administrator on February 09, 2007 10:31 AM
Reliable language detection is already available.
Libtextcat 2.2 (http://software.wise-guys.nl/libtextcat/) implements the same technique discussed in the article, and supports 60+ languages. Version 3.0 will support more languages and have better encoding detection.
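For readers who have not seen it, libtextcat's documentation describes its approach as n-gram based text categorization (Cavnar and Trenkle). Below is a minimal Python sketch of that general idea, not libtextcat's code or API. The function names, the two tiny training sentences and the profile size of 300 are all assumptions made for illustration; real profiles are built from far larger samples.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Return the most frequent character n-grams (1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (illustrative choice)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical, deliberately tiny training samples.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato duerme",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

Calling guess_language(some_text, PROFILES) returns the key of the closest profile. With samples this small the guess is only reliable on text that resembles the training sentences.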
As for desktop search, Pinot (http://pinot.berlios.de/) relies on libtextcat to identify the language of documents for stemming and filtering at search time. For instance, the search string "lang:es" will return documents in Spanish. Date ranges were recently implemented, so the next version will allow searches such as "documents written in Spanish within the past week."
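To make the "lang:es" filter concrete, here is a small Python sketch of how a language filter can be separated from the free-text terms of a query. It is an assumption-laden illustration, not Pinot's implementation: parse_query, search, and the DOCS list with its pre-assigned "lang" field are all hypothetical.

```python
def parse_query(query):
    """Split a query into free-text terms and an optional language code."""
    terms, lang = [], None
    for token in query.split():
        if token.startswith("lang:") and len(token) > 5:
            lang = token[5:].lower()  # e.g. "lang:es" -> "es"
        else:
            terms.append(token.lower())
    return terms, lang


def search(docs, query):
    """Return titles of documents matching every term and the language filter, if any."""
    terms, lang = parse_query(query)
    hits = []
    for doc in docs:
        if lang is not None and doc["lang"] != lang:
            continue  # language recorded at indexing time does not match
        words = doc["text"].lower().split()
        if all(term in words for term in terms):
            hits.append(doc["title"])
    return hits


# Hypothetical index; "lang" stands in for the language detected at indexing time.
DOCS = [
    {"title": "informe", "lang": "es", "text": "informe anual de ventas"},
    {"title": "report", "lang": "en", "text": "annual sales report"},
    {"title": "notas", "lang": "es", "text": "notas de la reunión"},
]
```

In this sketch the language is stored once per document when it is indexed, so the filter is a cheap comparison at search time rather than a fresh detection pass.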


Return to KDE 4's Sonnet will turbocharge language processing