
Linux.com

Feature

Building a Linux supercomputer using SSH and PVM

By on April 10, 2006 (8:00:00 AM)


If you have a couple of old Linux boxes sitting around, then you've got the makings of a supercomputer. Dust them off, install Secure Shell (SSH) and Parallel Virtual Machine (PVM), and start your complex algorithms.

All right, it's not quite as simple as that. PVM handles only the messaging between the machines. You must write your own programs to actually do anything.

First, network your PCs and set up NFS on each. I'm not going to go into detail because most Linux distributions take care of everything for you. With Debian, for example, simply connect a cable between your new PC and your network switch, stick in your installation CD, switch the PC on, and follow the prompts. If you need more information, take a look at the Linux.com how-tos on networking and NFS.

Now you can start setting up your PCs as a single supercomputer. In order for them to work as one, you need a single home directory -- hence, the need for NFS. Choose the machine that hosts the home directory and edit /etc/exports. If the file isn't there, then you must set up the PC as an NFS server -- check your distro's documentation. If you're using Debian, simply type sudo apt-get install nfs-kernel-server.

Now add in the details for each of the hosts where you want the common home directory. In this example, I'm exporting my home directory from polydamas (my NFS server) to three hosts: acamas, cassandra, and hector:

/home acamas(rw)
/home cassandra(rw)
/home hector(rw)

You can see the full list of possible options when exporting by typing man exports on the command line. Don't forget to add all hosts into your /etc/hosts file too.
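
For example, the /etc/hosts entries might look something like this (the addresses below are only illustrative placeholders from a private range; use whatever your own network assigns):

192.168.0.10    polydamas
192.168.0.11    acamas
192.168.0.12    cassandra
192.168.0.13    hector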

Now either reboot your NFS server or check your distro's documentation for the relevant command that lets your hosts see the exports. On Debian, the command is exportfs -a.

You can now turn to your NFS client hosts and set them up so that they use the home directory that you're exporting from the NFS server. If you feel that exporting the whole /home is overkill, simply export the home directory for the user that you want to be able to run the supercomputer.

If you're confident that everything is going to work, then just move the current /home somewhere safe (renaming it /home_old, for example). Run mkdir /home, then edit your /etc/fstab file so that it contains the details for the NFS server:

polydamas:/home /home nfs rw,sync 0 0

Make sure that your /etc/hosts file contains the IP address for your server, then either reboot or reload the NFS data:

sudo /etc/init.d/mountnfs.sh

If you're not quite that brave, mount the directories manually before you commit to automating the process fully.
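
For instance, a one-off manual mount of the exported home directory (again using polydamas as the NFS server) would look something like this; unmount it with umount /home when you're done testing:

sudo mount -t nfs polydamas:/home /home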

Set up SSH

Now that you have a common /home, you need SSH. Chances are, your Linux distribution came bundled with SSH. Each of my machines uses Debian, which loads OpenSSH automatically.

Set up SSH so that you don't have to enter a password each time you use it. For more information, take a look at Joe Barr's "CLI Magic: OpenSSH" and Joe 'Zonker' Brockmeier's "CLI Magic: More on SSH."

You'll find yourself benefiting from a common /home directory. Instead of having to set up an authorized_keys2 file on each machine, you only have to do it once on the NFS server:

ssh-keygen -t dsa
cat .ssh/id_dsa.pub > .ssh/authorized_keys2

If you just want to be able to run processes in parallel, then you're ready to go.

Looking for more? You might want to create programs that use the resources of all of your machines. Let's say you have three Linux boxes connected to your network, and you have three Linux scripts sitting in your home directory that you need to process. Simply run each one via SSH:

#Run the files on the machines
ssh bainm@acamas ./batch_file1 &
ssh bainm@cassandra ./batch_file2 &
ssh bainm@hector ./batch_file3 &

You can distribute work around your network easily using this technique. It's useful, but the scripts don't provide any feedback: you must check each machine manually for the progress of each file before you continue with your computations. However, you can add feedback by making each of the distributed files write its results back to a common file in your home directory.

In this next example, you can calculate pi to any number of decimal places:

#File name: calc_pi
#Usage: . ./calc_pi <result_file> <decimal_places>
RESULT_FILE=$1
DECIMAL_PLACES=$2
#Machin's formula, evaluated by bc to the requested number of decimal places
RESULT=$(echo "scale=$DECIMAL_PLACES;4*(4*a(1/5)-a(1/239))"|bc -l)
echo "$(uname -n) Pi: $RESULT" >> "$RESULT_FILE"

I calculated pi = 4 x (4 arctan(1/5) - arctan(1/239)) -- Machin's formula -- because that's what I was taught in college; there are other ways.

Now tell each of your machines to run a process:

ssh bainm@acamas . ./calc_pi pi_results 10 &
ssh bainm@cassandra . ./calc_pi pi_results 20 &
ssh bainm@hector . ./calc_pi pi_results 30 &

After a couple of seconds, a new file (pi_results) contains these results:

acamas Pi: 3.1415926532
cassandra Pi: 3.14159265358979323848
hector Pi: 3.141592653589793238462643383272

Let PVM do the work for you

While this is useful to know, you're probably better off using software that does all the work for you. If you're happy using C, C++, or Fortran, then PVM may be for you. Download it from the PVM Web site, or check if you can load it using your distro's methods. For instance, use this command on Debian:

sudo apt-get install pvm

Install PVM on all of the machines, then log on to the computer you want to use as your central host. If it's not your NFS server, remember to generate a key for it and add it to the .ssh/authorized_keys2 file. Once you start PVM by typing pvm on the command line, you can start adding hosts. Don't worry about starting PVM on the other machines -- that's done automatically when you add a host.

$ pvm
pvm> add acamas
add acamas
1 successful
                    HOST     DTID
                  acamas    80000
pvm>

If that seems a bit long-winded, then list your hosts in a file and get PVM to read it:

$ pvm hostfile
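
Here, hostfile is simply a plain text file listing one host per line (lines beginning with # are treated as comments). A hostfile for the machines in this article might look like this:

acamas
cassandra
hector
polydamas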

Type conf to check which hosts are loaded:

pvm> conf
conf
4 hosts, 1 data format
                    HOST     DTID     ARCH   SPEED       DSIG
               cassandra    40000    LINUX    1000 0x00408841
                  acamas    80000    LINUX    1000 0x00408841
                  hector    c0000    LINUX    1000 0x00408841
               polydamas   100000    LINUX    1000 0x00408841
pvm>

Type quit to exit the PVM console and leave PVM running in the background. Type halt to shut PVM down completely.

Now you can create a program that uses PVM. You need the PVM source code. As always, check the details for your distro -- usually, you can get the files easily. For example, Debian uses this command:

sudo apt-get install pvm-dev

You need the files on only one of your machines; thanks to the common home directory, you can use any of them. Create a directory called ~/pvm3/examples and look for a file called examples.tar.gz -- you'll probably find it in /usr/share/doc/pvm. Unpack it into the directory you just created. You'll see a set of self-explanatory files that show you exactly how to program with PVM. Start with master1.c and its associated file slave1.c, and examine the source code to see exactly how the process operates. To see the code in action, type:

aimk master1 slave1

aimk -- the program for compiling your PVM programs -- creates your executables and places them in ~/pvm3/bin/LINUX. Simply change to this directory and type master1. Assuming you're on the machine where you're running PVM, you should see something like this:

$ master1
Spawning 12 worker tasks ... SUCCESSFUL
I got 1300.000000 from 7; (expecting 1300.000000)
I got 1500.000000 from 8; (expecting 1500.000000)
I got 100.000000 from 1; (expecting 100.000000)
I got 700.000000 from 4; (expecting 700.000000)
I got 1100.000000 from 0; (expecting 1100.000000)
I got 1700.000000 from 9; (expecting 1700.000000)
I got 1900.000000 from 10; (expecting 1900.000000)
I got 2100.000000 from 11; (expecting 2100.000000)
I got 1100.000000 from 6; (expecting 1100.000000)
I got 900.000000 from 5; (expecting 900.000000)
I got 300.000000 from 2; (expecting 300.000000)
I got 500.000000 from 3; (expecting 500.000000)
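
The bundled examples are the best reference, but to give a flavour of the programming model, here is a rough sketch of my own -- it is not one of the files from examples.tar.gz, and the file name pi_parts.c, the worker count, and the choice of the Leibniz series are all arbitrary. A single binary can play both the master and slave roles by checking whether it was spawned by another PVM task:

/* pi_parts.c -- a minimal, hypothetical PVM sketch (not one of the bundled
   examples). A copy started with no PVM parent acts as the master: it spawns
   NWORKERS copies of itself, gives each a slice of the Leibniz series for
   pi/4, and adds up the partial sums the workers send back.
   Error checking is omitted for brevity. */

#include <stdio.h>
#include "pvm3.h"

#define NWORKERS 4
#define TERMS    1000000               /* series terms per worker */
#define MSGTAG   1

int main(void)
{
    int parent;

    pvm_mytid();                       /* enrol this process in PVM */
    parent = pvm_parent();

    if (parent == PvmNoParent) {       /* ----- master ----- */
        int tids[NWORKERS], i;
        double total = 0.0;

        pvm_spawn("pi_parts", (char **)0, PvmTaskDefault, "", NWORKERS, tids);

        for (i = 0; i < NWORKERS; i++) {          /* tell each worker its slice */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&i, 1, 1);
            pvm_send(tids[i], MSGTAG);
        }
        for (i = 0; i < NWORKERS; i++) {          /* collect the partial sums */
            double part;
            pvm_recv(-1, MSGTAG);                 /* -1 means "from any task" */
            pvm_upkdouble(&part, 1, 1);
            total += part;
        }
        printf("pi is roughly %.7f\n", 4.0 * total);
    } else {                           /* ----- worker ----- */
        int slice;
        long k;
        double part = 0.0;

        pvm_recv(parent, MSGTAG);                 /* which slice am I? */
        pvm_upkint(&slice, 1, 1);

        /* Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... (slow, but simple) */
        for (k = (long)slice * TERMS; k < (long)(slice + 1) * TERMS; k++)
            part += (k % 2 ? -1.0 : 1.0) / (2.0 * k + 1.0);

        pvm_initsend(PvmDataDefault);             /* send the partial sum back */
        pvm_pkdouble(&part, 1, 1);
        pvm_send(parent, MSGTAG);
    }

    pvm_exit();                        /* leave the virtual machine */
    return 0;
}

By default the PVM daemon looks for spawned executables in ~/pvm3/bin/LINUX (strictly, ~/pvm3/bin/$PVM_ARCH), so compile the sketch with aimk, just as you did for master1 and slave1, and run it from that directory.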

If you're a Fortran programmer, don't worry -- there are some examples for you as well. Other languages don't offer examples, but look on the PVM Web site for support for numerous languages, including Perl, Python, and Java. You'll also find various applications to help with PVM, such as XPVM for a graphical interface.


Comments on Building a Linux supercomputer using SSH and PVM


How does this compare to OpenMosix?

Posted by: Anonymous Coward on April 10, 2006 05:39 PM
OpenMosix is tied into the kernel, whereas this is userland-based. Are there any speed differences? Can you start a daemon (such as Sendmail) and port the processes?

Rgds,

Jon "The Nice Guy"


Re:How does this compare to OpenMosix?

Posted by: Anonymous Coward on April 10, 2006 06:02 PM
As far as I understood the article, PVM is anything but transparent: Programs need to be rewritten to take advantage of the parallelization.

In contrast, OpenMosix does not require any changes. Adding OpenMosix to an LTSP terminal server is purely a fixed cost. Once your terminals are cluster-enabled, that's it: the toughest part is building your own kernel.

On the other hand, PVM not only resides in userland but is also platform-independent, which is a f**king great thing if you've got just a few heavy-duty tasks to run, such as rendering movies.


Re:How does this compare to OpenMosix?

Posted by: Anonymous Coward on April 11, 2006 10:52 AM
Actually OpenMosix and PVM are very different beasts. OpenMosix is essentially a load balancing modification to the kernel (and soon user space) that will distribute processes among a "cluster" of machines, but does nothing to enable parallel operations. In a parallel programming environment such as PVM, the program enables multiple processes that have to communicate with each other. The advantage is that a given task is divided among many processes. OpenMosix migrates processes among machines, but they won't run any faster. Of course, you can run a lot of non-communicating processes and the system will handle it nicely.


pvm-enabled povray

Posted by: Anonymous Coward on April 11, 2006 02:03 AM
povray is a nice way to demonstrate the power of a pvm-cluster. You can pretty easily show how much faster a scene renders using multiple pvm nodes vs a single PC. If you install the povray packages on Debian, it's already pvm-ready.


PVM for a cluster?

Posted by: Anonymous Coward on April 11, 2006 12:38 PM
Who uses PVM anymore? Parallel computing has moved on to MPI. Check out OpenMPI.org, lam-mpi.org, or http://www-unix.mcs.anl.gov/mpi/mpich/ for good MPI implementations.


Re:PVM for a cluster?

Posted by: Anonymous Coward on April 11, 2006 10:23 PM
Actually, most parallel programming is now moving off of MPI and onto development environments based on Unified Parallel C.


Re:PVM for a cluster?

Posted by: Anonymous Coward on April 13, 2006 02:18 AM
UPC, or even better, Co-Array Fortran, is a nice programming model, which in many ways is better suited to HPC than MPI/OpenMP/pthreads.

But since actual production quality compilers for these languages are so far rare or non-existent, real-world development is still mostly done with MPI and/or OpenMP.


If openMosix simpler and updated

Posted by: Anonymous Coward on April 11, 2006 11:25 PM
It would be great. It should be simpler to set up an openMosix cluster: apt-get install, change a few configuration options in /etc/openmosix, start the additional nodes, and you are done. Moving, recreating, setting up, or duplicating /home, the insistence on using DHCP instead of static IPs, and other setup procedures turn off those who would be considered casual users. But give those "casual users" the ability to simply add older boxes into a mini-cluster without too many steps, or steps that endanger /home or require complicated setups, and you will see a much larger number of users using openMosix.



I'd like to use openMosix. I finally saw a file or two appear in debian Stable. Each time I see some news on openMosix, or if I see something hit Debian stable, I take another look at the project, hoping. Always disappointed. Hasn't become ubiquitous for distros, hasn't turned into a project that is just "there" or is simply bundled with the distro itself.



Another problem with openMosix is the slowness in updating the project. Where's the 2.6 kernel solutions? Will I see openMosix supporting 2.6 kernel versions in Debian that have AMD flags (k-7, k-8, etc) or are compatible, or are we talking custom kernels, another barrier to "just running it". Where are the user tools for 2.6? Last time I checked (about a week ago) there were supposedly alpha 2.6 openMosix kernels in cvs, and either then or just a bit earlier, people posting about the lack of user tools with no responses. Which brings up another issue, time to update the site. If no one's paying attention to these issues, is this project still on track? Or are the developers making too much money installing and running openMosix on a few large customers' clusters and don't have time to bother with the FOSS project?



Developers and fans of openMosix, don't take this too critically or personally. Look at it as constructive criticism at worst. I'm just telling you from my non-guru, non-kernel-compiling-weekly seat, from a layman's perspective not a programmer's perspective. I've got some desktops that would benefit tremendously from openMosix. A cluster that works with migrated processes is exactly what I need, and others with older systems need, not PVM or Beowulf or other clusters. The project isn't aimed at us, but we could benefit just as the LTSP project benefits from its cluster setup.



PVM sux

Posted by: Anonymous Coward on April 28, 2006 01:46 AM
Alright. Cool, you can network some Linux boxes. WOW. But try to use it to do anything useful, more than a silly sample program. The details of PVM will leave you begging for OpenMosix. And try visiting the "other language bindings" links you'll find in the article. The majority are dead links!! Unless you are a scientist or mathematician, I'd suggest you install ClusterKnoppix.


Re:PVM sux

Posted by: Anonymous Coward on May 10, 2006 01:41 AM
I think you guys are missing the point. PVM IS intended for mathematicians, scientists, and render farms, but nothing else. OpenMosix is totally different and its target usages are totally different. PVM is not remotely intended for desktop usage or for migrating a bunch of tasks to other machines (i.e., OpenMosix); it is intended for use on clusters that can be hundreds or thousands of machines in size. That is the difference.


Re:PVM

Posted by: Anonymous Coward on May 13, 2006 10:03 PM
Can somebody help me?
I can't set this up!

My project: "thestylator.com"..
mail sylwek32 at :gmx: de

Help me..
got debian + ubuntu


Any help with PVM would be great!!

Posted by: Anonymous [ip: 128.138.134.178] on January 04, 2008 10:53 PM
Hello, I have a cluster on which I have been using PVM for almost two years now. Within the last two weeks something has happened, and it no longer works correctly. I have a setup nearly identical to the above example (with four nodes). When I look at the conf it looks identical (except I am using Mac OS X). But now PVM is only sending jobs to the node whose DTID is c0000. I can add the nodes in any order so that c0000 lands on every possible node, and they all run fine, but only that one node gets any jobs to run. Any idea why this might be? I am really a novice and know very little about this, so any help at all would be appreciated.
Thanks!

