- About Us
Many people use SSH to log in to remote machines, copy files around, and perform general system administration. If you want to increase your productivity with SSH, you can try a tool that lets you run commands on more than one remote machine at the same time. Parallel ssh, Cluster SSH, and ClusterIt let you specify commands in a single terminal window and send them to a collection of remote machines where they can be executed.
The third application, ClusterIt, provides much more than a way to control many SSH sessions from a single console. While it includes parallel versions of cp (pcp), rm (prm), top (dtop), and df (pdf), it also provides directives for running a command on any node of the cluster (run), issuing a collection of commands over different nodes in the cluster (seq), and ensuring that only a single job is run on any node at any one time (jsh).
There are no ClusterIT packages for openSUSE, Ubuntu Hardy, or Fedora 9. I built ClusterIt 2.5 from source on a 64-bit Fedora 9 machine using
./configure && make && sudo make install.
ClusterIt uses the dvt command to open a terminal to all the nodes in a cluster. Unlike Cluster SSH, ClusterIt does not tile the new windows for you, so you might have to manually arrange the terminals unless your window manager gives you a pleasant layout by default. The controlling window in ClusterIt also has fewer options than the one in Cluster SSH. To start a terminal on every node in a cluster with ClusterIt, use the
-w option and specify the host names in a comma-separated list, as shown below. The screenshot shows the resulting terminals and control window. Closing the controlling window, the one with dtv in its title, does not close all the terminal windows -- you have to click on the quit button in the dvt window to close all the terminals in the session.
$ dvt -w p1,p2
For most of the ClusterIt commands, you specify the groups to connect to with
-g, the nodes with
-w, and can exclude nodes with
Use the pcp command to copy files to all the nodes in your cluster either serially (the default) or in parallel (using the
-c option). Copying in parallel can be quicker if you have the bandwidth to talk to all the nodes at once. If you copy in parallel, you can set the fanout using the
-f option. The default is 64, meaning that a copy to 64 nodes will run in parallel.
Some of the options to cp< are available in pcp. Unfortunately the
-a option commonly used with cp, which is an alias for
-cdpR, is not. The cp
-c option has a different meaning in pcp, and there is no
-d option in pcp. The
-R options work the same in pcp as in cp.
Some other commonly used options are missing from some of the other ClusterIt commands. For instance, the
-h option to df is missing in the ClusterIt parallel (pdf) implementation. Seeing the disk space in 1K blocks might be interesting for a computer, but you're likely more interested in knowing you have 140MB of free space on /var than the exact number of disk blocks.
In the commands below, I first recursively copy a directory tree to /tmp on both the hosts. If you want to copy to up to your fanout nodes all at once, include the
-c option in the pcp line. The ssh invocation after this verifies that one of the files in the example-tree exists on the p1 host. As you can see, the output of pdf is not as easily digestible as a
df -h would be. The prm command is then used to remove the tree on both hosts.
# pcp -pr -w p1,p2 example-tree /tmp/ df10.txt ... 100% 29 0.0KB/s 00:00 df1.txt ... 100% 29 0.0KB/s 00:00 ... # ssh p1 cat /tmp/example-tree/df10.txt Thu Oct 16 21:45:50 EST 2008 # pdf -w p1,p2 /tmp Node Filesystem 1K-Blks Used Avail Cap Mounted On p1 : /dev/mapper/VolGrou 15997880 3958792 11213336 27% / p2 : /dev/mapper/VolGrou 15997880 3536648 11635480 24% / # prm -rf -w p1,p2 /tmp/example-tree
The main strong point of ClusterIt is the support for job scheduling and barriers. Barriers allow you to write a script that runs on many nodes, and wait until all of the nodes are at the barrier before proceeding to execute the next part of the script. This sort of synchronization allows common "process then merge" scripting to be written without getting bogged down in the logistics of communicating with other nodes in your scripts. Just use the barrier command from your parallel scripts to wait for all the nodes to be at a given point in the script.
The jsh command can execute a command on any node of the cluster, with the restriction that only one command can be running on any node at any one point in time. This restriction prevents a node from becoming bogged down when the script using ClusterIt keeps issuing commands to be run on the cluster but the previous commands have not completed yet. For example, if you are compiling code using ClusterIt, perhaps there are a few source files that take much longer to compile than others. Without intelligent scheduling or the one-job-per-node restriction, the parallel compile might continue to give new jobs to a node that is still trying to compile a difficult file. Using jsh, nodes that need to take longer to work on a particular job can do so without having new jobs handed to them.
Using jsh is simple. You first execute the daemon jsd on the controlling host -- say, your desktop machine -- and then use jsh to issue commands. You can use the
-x options to tell the daemon what nodes are in the cluster, and then you can use jsh without any special arguments. An example of jsh usage is shown below. First the daemon is started and then a command is run with jsh. Running two calls to jsh and putting them into the background means that a job will be sent to both hosts, as the output shows at the end of the example.
# jsd -w p1,p2 jsd started with pid 24615 # jsh hostname p2: vfedora864prx2 # jsh "sleep 10; hostname" &  24648 # jsh "sleep 10; hostname" &  24650 p2: vfedora864prx2 p1: vfedora864prx1
Instead of using the
-x options, you can set the CLUSTER environment variable to the name of a file that contains a newline-separated list of hosts. The FANOUT variable controls how many nodes should be contacted in parallel for a job.
Each of these three applications has its strong points. If you are interested in issuing commands to a cluster and having some load balancing, ClusterIt should be your first choice. If you want a group of xterms and a single window to control them all, Cluster SSH provides the most user-friendly multiterminal of the three, allowing you to also disable a node temporarily or directly interact with just a single host before switching back to controlling them all. Parallel ssh includes a parallel rsync command and lets you control multiple hosts from a single terminal. You don't get to see your input mirrored to multiple xterms with Parallel ssh, but if you have a heterogeneous group of machines and frequently issue the same commands on them all, Parallel ssh will give you a single interactive terminal to them all without cluttering your display with individual xterms for each node.
Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.