Posted by: Administrator
on February 09, 2007 10:31 AM
Reliable language detection is already available. Libtextcat 2.2 (<a href="http://software.wise-guys.nl/libtextcat/" title="wise-guys.nl">http://software.wise-guys.nl/libtextcat/</a>) implements the same technique discussed in the article and supports 60+ languages. Version 3.0 will support more languages and have better encoding detection. As for desktop search, Pinot (<a href="http://pinot.berlios.de/" title="berlios.de">http://pinot.berlios.de/</a>) relies on libtextcat to identify the language of documents for stemming and for filtering at search time. For instance, the search string "lang:es" will return documents in Spanish. Date ranges have recently been implemented, so the next version will allow searches such as "documents written in Spanish within the past week."