
Language detection and desktop search

Posted by: Administrator on February 09, 2007 10:31 AM
Reliable language detection is already available.
Libtextcat 2.2 implements the same technique discussed in the article and supports more than 60 languages. Version 3.0 will support more languages and have better encoding detection.
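libtextcat is based on N-gram-based text categorization (Cavnar and Trenkle). Each language gets a profile of its most frequent character n-grams, ranked by frequency. A document is classified by building the same kind of profile for it and picking the language whose profile is closest under the "out-of-place" rank distance. The Python below is a minimal sketch of that idea, not libtextcat's code or API. The function names, the profile size of 400, the n-gram range of 1 to 5, and the two tiny training samples are all assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=400):
    """Map each of the top_k most frequent character n-grams (n = 1..max_n)
    of `text` to its frequency rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty):
    """Sum of rank differences; n-grams missing from lang_profile cost max_penalty."""
    distance = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            distance += abs(rank - lang_profile[gram])
        else:
            distance += max_penalty
    return distance


def detect_language(text, lang_profiles, top_k=400):
    """Return the key of the language profile closest to `text`."""
    doc_profile = ngram_profile(text, top_k=top_k)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place(doc_profile, lang_profiles[lang], top_k),
    )


# Hypothetical toy training samples; real profiles come from much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is sleeping "
          "in the house with the children who are playing outside",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato está "
          "durmiendo en la casa con los niños que están jugando afuera",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

print(detect_language("the dog is sleeping in the house", PROFILES))
print(detect_language("el perro está durmiendo en la casa", PROFILES))
```

With samples this small, the classifier only copes with text that reuses their vocabulary. A detector covering 60+ languages needs a properly trained profile for each one.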
As for desktop search, Pinot relies on libtextcat to identify the language of documents for stemming and filtering at search time. For instance, the search string "lang:es" will return documents in Spanish. Date ranges have recently been implemented, so the next version will allow searching for "documents written in Spanish within the past week."
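Once each document's language is recorded at indexing time, filtering at search time is a simple match on that stored field, optionally combined with a date cutoff. The Python below is a hypothetical sketch of that combination, not Pinot's implementation. The `Document` record, the `search` function, its `max_age_days` parameter, and the sample file names and dates are invented for illustration. It parses only `lang:` terms and ignores all other query text.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    # Hypothetical index record: the language code is assigned when indexing.
    path: str
    language: str
    modified: date


def search(index, query, today, max_age_days=None):
    """Return paths of documents in any language named by a 'lang:xx' term,
    optionally restricted to documents modified within max_age_days of today."""
    langs = {term[len("lang:"):] for term in query.split() if term.startswith("lang:")}
    cutoff = today - timedelta(days=max_age_days) if max_age_days is not None else None
    results = []
    for doc in index:
        if langs and doc.language not in langs:
            continue  # wrong language
        if cutoff is not None and doc.modified < cutoff:
            continue  # older than the requested date range
        results.append(doc.path)
    return results


index = [
    Document("informe.odt", "es", date(2007, 2, 5)),
    Document("notes.txt", "en", date(2007, 2, 6)),
    Document("carta.txt", "es", date(2007, 1, 10)),
]
print(search(index, "lang:es", today=date(2007, 2, 9)))
print(search(index, "lang:es", today=date(2007, 2, 9), max_age_days=7))
```

The first call matches both Spanish documents. The second adds a one-week window, which excludes the January letter.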

