
Linux.com

Feature

Recoll: A search engine for the Linux desktop

By Dmitri Popov on April 23, 2007 (8:00:00 AM)


Desktop search engines are all the rage these days. While Beagle may be the most popular desktop search engine for Linux, there are alternatives. If you are looking for a lightweight, easy-to-use, yet powerful desktop search engine, you might want to try Recoll. Unlike Beagle, Recoll doesn't require Mono; it is also fast and highly configurable. Recoll is based on Xapian, a mature open source search engine library that supports advanced features such as phrase and proximity search, relevance feedback, document categorization, boolean queries, and wildcard search.
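Phrase search, one of the Xapian features mentioned above, depends on the index recording where each word occurs in a document, not just whether it occurs. The Python sketch below is a toy positional inverted index written only to illustrate that idea; it is not Xapian's API or Recoll's code, and the function names `build_index` and `phrase_search` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's words consecutively."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Only documents that contain every word can contain the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        # The phrase starts at position p if word i occurs at p + i for every i.
        for p in index[words[0]][doc_id]:
            if all(p + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {1: "Recoll is a desktop search engine", 2: "A search for the desktop engine"}
index = build_index(docs)
print(phrase_search(index, "desktop search"))  # [1]
```

Both documents contain "desktop" and "search", but only document 1 has them next to each other in that order, which is exactly the information a plain word-to-document index would lose.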

Recoll can handle plain text, HTML, OpenOffice.org documents, Mozilla Thunderbird and Evolution email messages, and LyX and Scribus files. Beyond these native formats, Recoll can work with other file types through external helper applications. For example, Xpdf provides support for PDF files, while Word, PowerPoint, and Excel documents are handled by Antiword and catdoc. To enable support for document types that require external helpers, install the helper applications separately using your distro's package manager; Recoll's Web site lists the required helpers.
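Because the helpers are installed separately, it can be useful to check which of them are actually on your PATH. The following Python sketch does that with the standard library's `shutil.which`. The extension-to-program table is an assumption for illustration only (pdftotext comes with Xpdf; xls2csv and catppt ship with catdoc); Recoll keeps its real mapping in its own configuration, and `missing_helpers` is a hypothetical name.

```python
import shutil

# Hypothetical mapping for illustration: which external program might
# handle which file extension. Recoll's actual mapping lives in its config.
HELPERS = {
    ".pdf": "pdftotext",  # part of Xpdf
    ".doc": "antiword",
    ".xls": "xls2csv",    # shipped with catdoc
    ".ppt": "catppt",     # shipped with catdoc
}

def missing_helpers(helpers=HELPERS):
    """Return {extension: program} for every helper not found on the PATH."""
    return {ext: prog for ext, prog in helpers.items() if shutil.which(prog) is None}

print(missing_helpers())
```

Anything listed in the output is a program you would still need to install with your package manager before Recoll could index that file type.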

Recoll stores all internal data in Unicode UTF-8 format, but it can index files with different character sets, encodings, and languages into the same index.
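Storing everything as UTF-8 means that text arriving in a legacy encoding has to be decoded and re-encoded before it goes into the index. The Python sketch below shows that normalization step for two common Russian encodings; it assumes the source encoding is already known (detecting it is a separate problem), and `to_utf8` is a hypothetical helper, not part of Recoll.

```python
def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return raw.decode(encoding).encode("utf-8")

word = "Линукс"
koi8 = word.encode("koi8_r")  # the same word in two legacy encodings...
win = word.encode("cp1251")
print(koi8 != win)  # True: the raw bytes differ
print(to_utf8(koi8, "koi8_r") == to_utf8(win, "cp1251"))  # True: identical once in UTF-8
```

After normalization, a search for the word matches documents regardless of which encoding they were originally saved in.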

Since Recoll's Web site provides binary packages for most major Linux distributions -- such as Fedora, SUSE, Ubuntu, and Debian -- you can install it easily using your distro's package manager. You can then launch Recoll by choosing Recoll from the Applications -> Accessories menu (in Ubuntu) or running the recoll command in a terminal window.

During the first run, you will be prompted to create a default set of configuration files that hold all of Recoll's settings. Recoll doesn't provide a GUI configuration tool, so you have to edit these files manually. Fortunately, Recoll's user manual describes the available configuration options in detail. However, since Recoll's default settings cover the basics, you might not need to edit them at all.

Like any desktop search engine, Recoll must index documents before it can search them. By default, Recoll indexes the files in your home directory, but you can specify other or additional locations. On the first run, Recoll indexes everything from scratch, which can take some time. Once the index has been built, you can update it manually using the recollindex command, or run recollindex periodically as a cron job. Alternatively, the recollindex -m command runs as a daemon that indexes modified files in real time.
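An index update is fast because only new, modified, or deleted files need attention. The Python sketch below illustrates one common way to find them, by comparing each file's modification time against the value recorded during the previous run. It is a generic illustration, not recollindex's actual algorithm, and `scan_for_changes` is a hypothetical name.

```python
import os

def scan_for_changes(root, last_seen):
    """Compare files under root with a {path: mtime} snapshot from the last run.

    Returns (changed, removed, current): paths that are new or modified,
    paths that have disappeared, and the new snapshot to save for next time.
    """
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished between listing and stat
            current[path] = mtime
            # Any mtime difference counts, so restored older copies are caught too.
            if last_seen.get(path) != mtime:
                changed.append(path)
    removed = sorted(p for p in last_seen if p not in current)
    return sorted(changed), removed, current
```

A periodic job would call this with the snapshot saved last time, reindex `changed`, drop `removed` from the index, and store `current` for the next run; a real-time daemon instead reacts to change notifications as they happen.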

Screenshot: Recoll search results
Once the files have been indexed, Recoll is ready to go. To perform a simple search, enter one or more search terms into the search field and press the Search button. Besides searching for all or any of the specified terms, Recoll also lets you search for file names and perform more advanced searches using wildcards and boolean operators.

Recoll supports three types of wildcards. The * wildcard matches any sequence of characters (e.g. writ* returns writer, written, and writing). The ? wildcard matches exactly one character (e.g. b?ll returns ball, bull, and bell). The [] wildcard lets you specify a set or range of matching characters, e.g. [a-h] or [1-5].

To perform a boolean search, select the Query Language item from the drop-down menu next to the search field. You can then use boolean operators to construct more complex searches. For example, the query

from:"tristram shandy" linux AND openoffice -windows

finds documents that contain the phrase "tristram shandy" in the From field (useful when searching email messages) as well as the words "linux" and "openoffice", but not the word "windows".
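The three wildcard forms described above behave much like shell-style patterns, so Python's standard `fnmatch` module can illustrate them. The sketch assumes that a wildcard is expanded by matching it against a list of known index terms; the vocabulary and the `expand_wildcard` name are hypothetical, and Recoll's own matching rules may differ in details such as case handling.

```python
from fnmatch import fnmatchcase

def expand_wildcard(pattern, vocabulary):
    """Return the terms in vocabulary that match a *, ? or [] pattern.

    Both sides are lowercased, so matching is case-insensitive.
    Note that fnmatch's * also matches an empty run of characters.
    """
    pattern = pattern.lower()
    return [term for term in vocabulary if fnmatchcase(term.lower(), pattern)]

terms = ["writer", "written", "writing", "reading",
         "ball", "bull", "bell", "bowl", "chapter1", "chapter7"]
print(expand_wildcard("writ*", terms))         # ['writer', 'written', 'writing']
print(expand_wildcard("b?ll", terms))          # ['ball', 'bull', 'bell']
print(expand_wildcard("chapter[1-5]", terms))  # ['chapter1']
```

Expanding a pattern into concrete terms first and then searching for any of them is a common way to support wildcards on top of an ordinary term index.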

The Advanced Search feature lets you build even more complex queries. Its default fields (called clauses) let you specify a wide range of criteria, such as proximity, any number of search terms (you can add extra fields by pressing the Add clause button), excluded words, and wildcards. You can also restrict your search to specific file types or a specific directory.
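To show how such clauses combine, here is a Python sketch that checks one document against required words, excluded words, a proximity condition, a file-type filter, and a directory filter, all of which must hold. The `Document` class, `advanced_match` function, and the "within N word positions" definition of proximity are assumptions made for illustration; they are not how Recoll implements Advanced Search.

```python
import os
import re
from dataclasses import dataclass

@dataclass
class Document:
    path: str
    text: str

def words_of(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def within(words, a, b, distance):
    """True if words a and b occur no more than `distance` positions apart."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def advanced_match(doc, all_words=(), none_words=(), near=None,
                   extensions=None, directory=None):
    """Return True only if the document satisfies every supplied clause."""
    words = words_of(doc.text)
    vocab = set(words)
    if any(w.lower() not in vocab for w in all_words):
        return False  # a required word is missing
    if any(w.lower() in vocab for w in none_words):
        return False  # an excluded word is present
    if near is not None:
        a, b, distance = near
        if not within(words, a.lower(), b.lower(), distance):
            return False
    if extensions is not None:
        if os.path.splitext(doc.path)[1].lower() not in extensions:
            return False
    if directory is not None:
        # Compare whole path components, so /home/u/doc does not match /home/u/docs.
        root = os.path.abspath(directory)
        if os.path.commonpath([os.path.abspath(doc.path), root]) != root:
            return False
    return True

doc = Document("/home/user/docs/notes.txt",
               "Recoll is a fast desktop search engine for Linux")
print(advanced_match(doc, all_words=["recoll", "linux"], none_words=["windows"]))  # True
print(advanced_match(doc, near=("desktop", "engine", 2)))  # True
```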

When you perform a search, Recoll displays the results in the main window. Each result shows a file type icon, a relevance percentage, and the text surrounding the search term. There are also two links: Preview opens the document in a separate window for a quick look, while Edit opens the file in an appropriate application.
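The context shown with each result is essentially a window of words around a match. The Python sketch below extracts such a snippet from plain text; the `snippet` function and its `width` parameter are hypothetical, and Recoll's own snippet generation may choose and trim the text differently.

```python
import re

def snippet(text, term, width=4):
    """Return up to `width` words on each side of the first occurrence of term.

    Matching is case-insensitive on whole words; returns "" if term is absent.
    """
    words = text.split()
    target = term.lower()
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, e.g. "Xapian," -> "xapian".
        if re.sub(r"^\W+|\W+$", "", word).lower() == target:
            start = max(0, i - width)
            end = min(len(words), i + width + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return ""

text = "Recoll is based on Xapian, a mature open source search engine library."
print(snippet(text, "xapian", width=2))  # ... based on Xapian, a mature ...
```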

Finally, Recoll features a Term Explorer tool (Tools -> Term Explorer) that comes in handy when you don't remember the exact spelling of a search term. It acts as a mini search engine for the terms stored in the index, letting you see all the derivatives of the entered term and select the one you need.
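A rough idea of how a term explorer can help with a half-remembered word: list index terms that start with what you typed, then add terms with a similar spelling. The Python sketch below does this with the standard library's `difflib.get_close_matches`; the `explore_terms` name, the sample term list, and the 0.6 similarity cutoff are assumptions for illustration, not Recoll's behavior.

```python
import difflib

def explore_terms(query, index_terms, max_results=5):
    """Suggest index terms for a half-remembered word.

    Terms that start with the query come first (alphabetically), followed by
    terms whose spelling is similar according to difflib's similarity ratio.
    index_terms is assumed to be lowercase already.
    """
    query = query.lower()
    prefix_hits = sorted(t for t in index_terms if t.startswith(query))
    similar = difflib.get_close_matches(query, index_terms, n=max_results, cutoff=0.6)
    results = prefix_hits + [t for t in similar if t not in prefix_hits]
    return results[:max_results]

terms = ["recoll", "record", "recording", "xapian", "beagle"]
print(explore_terms("reco", terms))    # ['recoll', 'record', 'recording']
print(explore_terms("xapain", terms))  # ['xapian']
```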

Recoll looks deceptively simple, but it is a powerful desktop search engine. To get the most out of it, make sure to read Recoll's user manual, paying particular attention to the tips and tricks section.

Dmitri Popov is a freelance writer whose articles have appeared in Russian, British, US, German, and Danish computer magazines.


Comments on Recoll: A search engine for the Linux desktop

Note: Comments are owned by the poster. We are not responsible for their content.

addenda

Posted by: Anonymous Coward on April 24, 2007 02:45 AM
* Xapian powers e.g. search.gmane.org
* Recoll is included in ALT Linux: no need to download anything by hand, just apt-get install recoll

--
Michael Shigorin


Submitted

Posted by: Anonymous Coward on April 24, 2007 04:29 AM
submitted to tweako (http://www.tweako.com/)


Re:How's the KDE support?

Posted by: Anonymous Coward on April 25, 2007 07:42 PM
What's GNOME-centric about a screenshot of a Qt app? ;)


How's the KDE support?

Posted by: Administrator on April 24, 2007 09:53 AM
At least Beagle supports a large number of KDE programs and integrates well with KDE via Kerry. The screenshots and supported applications you mention are all completely GNOME-centric; where's the KDE love?

If I have to stick with Bloaty Beagle just to get good KDE support along with good support for most things in general (does Strigi even support FLAC and Ogg?), then I will. In the meantime, Strigi does look interesting, but I won't be using it until it can at least support the file types I use. I don't know how to add new file support to Strigi, so that's out of the question as well.


fast and reliable

Posted by: Administrator on April 24, 2007 09:59 AM
I have been using Recoll for 9 months on Fedora 5 as well as Ubuntu 6.06. This software is fast! When you have to search thousands of documents, this tool is cool...
Good work from the programmer. He responds in a friendly way when you need help. Nice guy, nice project.


recoll without GUI

Posted by: Anonymous [ip: 192.168.201.73] on August 09, 2007 11:27 AM
Hi, can I compile Recoll without the GUI? (I don't want to install the Qt libraries.)


Recoll: A search engine for the Linux desktop

Posted by: Anonymous [ip: 140.180.22.101] on September 19, 2007 11:43 PM
This is HUGE!!! Extremely powerful tool.
I was so disappointed after trying Beagle, Strigi, and Google Desktop (Linux). Finally my pain is over: Recoll does the job I want and gives me all the flexibility and power!

