Language detection and desktop search

Posted by: Administrator on February 09, 2007 10:31 AM
Reliable language detection is already available.
Libtextcat 2.2 implements the same technique discussed in the article, and supports 60+ languages. Version 3.0 will support more languages and have better encoding detection.
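For readers who want to see the idea in miniature, here is a rough Python sketch of n-gram rank-order language guessing in the style of Cavnar and Trenkle, which is the approach libtextcat is based on. This is not libtextcat's code or API: the function names, the tiny sample texts, the n-gram length of 1 to 3, and the profile size are all assumptions made for illustration, and real profiles are built from much larger corpora.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top_k=300):
    """Rank the most frequent character n-grams of `text` (hypothetical helper)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )

def detect_language(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Toy training data (assumption): far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro duerme al sol",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(detect_language("el perro duerme al sol", PROFILES))
```

With samples this small the guess is only as good as the overlap between the input and the training text, so treat it as a demonstration of the distance measure rather than a usable detector.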
As for desktop search, Pinot relies on libtextcat to identify the language of documents for stemming and filtering at search time. For instance, the search string "lang:es" will return documents in Spanish. Date ranges have recently been implemented, so the next version will allow searches such as "documents written in Spanish within the past week."
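To make the "lang:es" filter concrete, here is a small Python sketch of how a prefixed filter could be separated from ordinary search terms and applied to documents that already carry a detected language tag. It is not Pinot's implementation: the function names, the document layout, and the substring matching are all assumptions for illustration only.

```python
def parse_query(query):
    """Split a query into prefix filters (e.g. lang:es) and plain terms (hypothetical helper)."""
    filters, terms = {}, []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key] = value
        else:
            terms.append(token)
    return filters, terms

def search(documents, query):
    """Return names of documents matching the language filter and every plain term."""
    filters, terms = parse_query(query)
    results = []
    for doc in documents:
        # Skip documents whose detected language does not match the filter.
        if "lang" in filters and doc["lang"] != filters["lang"]:
            continue
        text = doc["text"].lower()
        if all(term.lower() in text for term in terms):
            results.append(doc["name"])
    return results

# Toy index (assumption): the "lang" field would come from a language detector.
DOCS = [
    {"name": "receta.txt", "lang": "es", "text": "Receta de paella valenciana"},
    {"name": "recipe.txt", "lang": "en", "text": "Recipe for paella"},
]

if __name__ == "__main__":
    print(search(DOCS, "lang:es"))
```

A date-range restriction could be handled the same way, as one more filter checked before the plain terms are matched.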


