
Linux.com

Feature: Distributions

NEBC Bio-Linux distro falls short

By W. Dean Freeman on September 10, 2008 (4:00:00 PM)


As computational biology and bioinformatics become more important, not only to the economy but to our understanding of the natural world and ourselves, Linux is becoming a better platform on which to build and deploy the software scientists rely on. A few groups have even created entire distributions geared toward computational biology, such as BioBrew and Debian-Med. One of the more prominent, Bio-Linux, comes from the Natural Environment Research Council's (NERC) Environmental Bioinformatics Centre (NEBC) in Oxford. Bio-Linux does not sell itself as your average distribution, but it does not measure up to an average distribution either.

NEBC Bio-Linux comes in two forms -- a Knoppix-based live DVD, and a net-install from NEBC's servers. To get the latter, you must fill out an application form. If you're worthy, NEBC will send you an installation package that comprises a CD, a diskette, and installation instructions. The CD provides the boot loader and customized scripts that partition the hard disk and copy the Bio-Linux snapshot image from NEBC's servers to the target computer. The diskette contains the information necessary for the installation scripts to find and access the server. Actual installation is expected to be conducted via the Internet, at a scheduled time negotiated with NEBC's help desk.

I downloaded and booted the live DVD image, and found an impressive amount of bioinformatics software available under a menu cleverly marked with a stylized DNA molecule. For instance, Jemboss is a Java-based front end to the EMBOSS (European Molecular Biology Open Software Suite) set of programs, which provide a comprehensive set of tools for sequence analysis and more. Taverna is a workflow manager that is compatible with various suites of bioinformatics tools. It allows scientists to piece together the processes which they want to pass their data through. Handlebar is a Web-based system for keeping track of bar codes for samples and inventory around the lab, written in Perl with a PostgreSQL back end. With more than 40 bioinformatics-related applications -- everything from rasmol to MrBayes -- and the ability to obtain more from NEBC's repositories, the scientific software selection does not disappoint.

However, for a system which touts itself as being geared toward "wet bench" scientists who may or may not have much Linux experience, all is not bread and roses, though most of the shortcomings in Bio-Linux are inherited from Knoppix rather than being anything the NEBC introduced itself.

Worse, though -- unlike most modern Linux live CD distributions, the Knoppix base provides no easily accessible hard drive installation option. Doing a little research, I found the hidden invocation of sudo knoppix-installer from the shell. However, the Knoppix installation program is perhaps one of the most difficult I've encountered. Successfully partitioning the disk alone, for which knoppix-installer relies on QtParted, was a feat that makes the somewhat archaic methods of NetBSD seem like a walk in the park. When I finally figured that out, installed the system, and rebooted, I was immediately greeted with a kernel panic and failure to boot. Multiple attempts all met the same fate, and I was unable to produce a working install from the live DVD.

In addition, the distro's development tools come up short. Bio-Linux Live provides the Eclipse integrated development environment, listed with the other bioinformatics software on the system. However, it does not include the EPIC Perl extensions for Eclipse. As Perl is one of the most common languages used for bioinformatics development, due to its native text parsing ability and the BioPerl modules, not providing EPIC, especially as one cannot permanently add it while running a live DVD image, is definitely a problem in my book.

Perhaps the biggest problem with version 4 of Bio-Linux is that it is out of date, having been released in 2005 and running kernel 2.6.12. The userland applications, similarly, are older versions, from OpenOffice.org 2.0 beta to the scientific software (the included version of Taverna was 1.4, whereas 1.7 is current).

NEBC Bio-Linux 5 beta

I had hoped that the new version 5 beta, which was announced in July and which is based on Ubuntu 8.04, would fix the issues that I had with version 4, but it's no panacea. While Bio-Linux 5 beta's Ubuntu base is much easier to deal with than the three-year-old version of Knoppix, which leads to a much cleaner install process, the system has its own issues.

For instance, unlike Bio-Linux 4, version 5 has yet to integrate the bioinformatics tools into the application menu. That means if you want to use the tools, you must know their command names. At least the developers stuck them all in one directory -- /usr/local/bioinf/.
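For readers who land at a shell prompt without a menu to guide them, a quick way to see what command names are on offer is to list the executable files in that directory. What follows is only a minimal sketch: the function name list_tools is made up for illustration, and the default path is simply the directory named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bio-Linux): print the names of the
# executable regular files in a directory, one per line.
# The default directory is the one mentioned in the review.
list_tools() {
    dir="${1:-/usr/local/bioinf}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        # Skip subdirectories and anything without an execute bit.
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}
```

Calling list_tools with no argument would look in /usr/local/bioinf; passing another directory as the first argument looks there instead.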

Version 5 currently fails to include Java, even though a lot of bioinformatics software, including Jemboss and Taverna, is Java-based. Of course, one can install Java from repositories, but that has its own host of problems, first of which is the fact that the bash script used to run Jemboss is hard-coded to look for Java at /usr/local/bin/java/, while repositories install it to /usr/bin/java/. This makes using the distribution as a live CD next to impossible if you need any of the Java-based suites.
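A launcher script does not have to assume a fixed location for Java at all. The sketch below is not the Jemboss script, and find_java is a hypothetical name; it only illustrates one common approach, which is to honor JAVA_HOME if it is set and otherwise ask the shell where java lives on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: locate a Java interpreter without hard-coding a path.
find_java() {
    # Prefer an explicit JAVA_HOME if it points at an executable java binary.
    if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        printf '%s\n' "$JAVA_HOME/bin/java"
        return 0
    fi
    # Otherwise fall back to whatever "java" resolves to on PATH.
    # command -v exits non-zero if nothing is found.
    command -v java
}
```

A launcher written this way could store the result in a variable, for example JAVA=$(find_java), and stop with an error message if the call fails, rather than breaking when Java is installed somewhere unexpected.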

Bio-Linux 4's live DVD image comes in at 1.9GB. Bio-Linux 5 Beta weighs in at a hefty 2.1GB. There is really no need for this -- what need is there for GNOME Games, for instance, on a laboratory computer?

It seems that while NEBC Bio-Linux is a laudable endeavor that does address certain needs for the community at which it's targeted, it comes up short in a number of ways. While attempting to provide an easy-to-use, canned solution, it stumbles in many areas, from the application process for the "standard" installation to the various issues with the live DVD.

As most of the bioinformatics software included in Bio-Linux is available in the repositories for Ubuntu and other systems, the advantage of a dedicated distribution over installing that software by oneself on a "normal" Linux system is slim, at best. The two major applications that are not readily available in repositories, Jemboss and Taverna, can both be quickly installed by hand by anyone who has a basic familiarity with extracting a tarball and making a shell script executable. Both programs are written in Java and are started by bash scripts, and as an added bonus -- the stock Jemboss script isn't hard-coded with the location of the Java interpreter, unlike Bio-Linux's.
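For anyone unsure what "extracting a tarball and making a shell script executable" involves, the two steps can be sketched as below. The function name and all three arguments are placeholders for illustration; the actual archive and launcher names depend on the package you download, and this assumes a gzip-compressed tarball.

```shell
#!/bin/sh
# Hypothetical helper: unpack a gzipped tarball into a destination
# directory and mark one launcher script inside it as executable.
#   $1 = path to the .tar.gz archive
#   $2 = directory to extract into (created if missing)
#   $3 = launcher script path, relative to the destination directory
install_from_tarball() {
    archive="$1"
    dest="$2"
    launcher="$3"
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    chmod +x "$dest/$launcher"
}
```

After that, the program is started by running the launcher script from wherever it was extracted.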

With new tools for creating live Linux images (especially for Fedora) and the ability to streamline what is included, there is really no reason a future version of NEBC Bio-Linux, or a similar project from another source, could not be made leaner and more focused. I hope that by the time that NEBC Bio-Linux 5 is actually released, the developers have at least remedied the situation with their Jemboss script and recreated the menu so that the bioinformatics tools that are included are more readily accessible.


Comments

on NEBC Bio-Linux distro falls short

Note: Comments are owned by the poster. We are not responsible for their content.

maybe that should be: NEBC Bio-Linux distro "not exactly what I wanted"

Posted by: Anonymous [ip: 192.171.160.168] on September 19, 2008 11:39 AM
As someone who's been involved with the BL project for several years, and is typing this very comment on a Bio-Linux machine, I'm sorry that Mr. Freeman was disappointed by our offering. The application form is a big annoyance but arises from a pragmatic consideration - those familiar with Linux will be able to go through the regular Debian installation and then make use of our fully open package repository, which provides all the software. Most wet-bench scientists, on the other hand, want a no-brainer installation and assistance on-hand. What this means in practise is that we need to ask them about their requirements, set some configuration up in advance, and then make sure someone can be at the other end of the phone for them if problems arise.
Most people would charge good money for such a service, but all we've been asking is that people fill in a form. Apparently that was too much for this reviewer, who felt his time was better spent attempting to install our preview CD. Perhaps the final release of Bio-Linux 5 will please Mr. Freeman, or maybe he'll only be content when he has the moon on a stick.

TIM

#

NEBC Bio-Linux - a response from the developers

Posted by: Anonymous [ip: 192.171.174.31] on September 19, 2008 03:11 PM
The NERC Environmental Bioinformatics Centre was pleased with many of the positive comments made by Dean Freeman in his review. We would like to take this opportunity to clear up some issues raised in his review. The most important thing is that NEBC Bio-Linux has always been, and continues to be, free to anyone who wants it. Our application process is not used to deem worthiness; it is used only as information necessary for our firewall and the configuration files required for the network installation of Bio-Linux 4.0. Once our system is set up, people can install Bio-Linux at any time they choose; specifying a time allows us to ensure that staff are on hand in the NEBC office to provide support if required. We are currently developing Bio-Linux 5.0, which will be on an Ubuntu base. This will remove the need for a network installation, and will enable people to use the system as a Live-DVD or carry out a full installation. We have always recommended that the Live version of Bio-Linux 4.0 (on a Knoppix base) be used only as a live system as Knoppix was not designed with ease of installation in mind – as Dean experienced. Finally, we regret very much that Dean came upon a very early, and far from finished, version of the Bio-Linux 5.0 image. This image was advertised over a small mailing list maintained by us. We would like to re-assure our current users, and anyone else interested in the project, that the final version of Bio-Linux 5.0 will contain working Java and updated version of the bioinformatics software packages we maintain.

The NEBC Team

#
