Condor lets you queue multiple jobs. It searches the network for free machines (those with no keyboard activity, a low load average, and no active Telnet users), submits jobs to them, and returns the results to the machine from which each job was submitted. Condor is a batch system: once a job is submitted, there is no interaction between the job and the user. Any input to the job must be in a file that is submitted along with the executable to the Condor pool, and all output the job produces during execution is written to a file, which is sent back to the submitting machine as the result.
Setting up Condor
Before you set up a Condor pool, you need to know the four different roles a machine can play in your pool: central manager, submit machine, execute machine, and checkpoint server. A single machine can play more than one of these roles.
Before you set up a Condor pool, you must decide which machine will play the central manager role, and which of the remaining clients are going to be the submit and execute machines (or both). For the simplest case, we'll set up a pool of two machines. One will be the central manager and also a submit and execute machine; the other will be only a submit/execute machine. You can use the same procedure to set up Condor on as many machines as you want.
Before you set up Condor on a machine, create a Condor user on that machine whose home directory will hold Condor-related files, such as logs.
# groupadd condor
# useradd -m -g condor condor
Now copy the downloaded Condor tar archive into /home/condor and unpack it. Change into the unpacked directory, which I'll refer to as the release directory, and run the condor_configure script to install Condor on the machine. On the central manager:
# condor_configure --install --type=execute,submit,manager --local-dir=/home/condor --verbose
On every other machine in the pool:
# condor_configure --install --type=execute,submit --local-dir=/home/condor --central-manager=<hostname of central manager> --verbose
If you ever want to change the configuration of Condor on a machine, you can run the script again.
Open the etc/condor_config file in the release directory. Set the LOCAL_DIR variable to /home/condor, and set the HOSTALLOW_WRITE variable to an appropriate value (e.g., '*', which lets any machine join the pool; restrict it on an untrusted network). Make sure /dev/mouse points to your mouse device and /var/run/utmp points to utmp on your machine. Next, edit the local configuration file, /home/condor/condor_config.local, and set CONDOR_IDS to '0.0'. This tells Condor to run its daemons as root.
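Taken together, the edits above amount to fragments along these lines (the values shown are the ones used in this example setup; HOSTALLOW_WRITE = * is wide open and should be narrowed on any untrusted network):

```
## In <release directory>/etc/condor_config
LOCAL_DIR       = /home/condor
HOSTALLOW_WRITE = *

## In /home/condor/condor_config.local
## UID 0, GID 0: run the Condor daemons as root
CONDOR_IDS = 0.0
```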
Copy the files in the bin subdirectory of the release directory to a well-known location (such as /local/bin) so that Condor users have access to them, and copy the files in the sbin subdirectory to a location on the administrator's path that only the administrator can access.
Now you're ready to run condor_master on each machine to start the Condor daemons:
# condor_master
On the central manager, you should see the following daemons running if you run
$ ps aux | egrep condor_
condor_master
condor_collector
condor_negotiator
condor_startd
condor_schedd
On the other machines, the following daemons should be running:
condor_master
condor_startd
condor_schedd
If you don't see these daemons running, there is a problem with your configuration. Look at /home/condor/log/MasterLog to try to figure out what might be wrong.
You can run condor_status on any machine to list the machines that are currently in the pool and their status (Owner, Claimed, Unclaimed and so on).
Once Condor is running, it's time to put it to use. To submit a job to Condor, you need to write a description file for it. Writing a description file is easy, and the example below will show you how to write one.
#Example description file foo.cmd for job foo
Executable = foo
Universe = vanilla
Input = test.data
Output = foo.out
Error = foo.error
Log = foo.log
Queue
The Executable variable points to the job to be run (it's a good idea to specify the absolute path to the executable). Input is the file from which foo takes its input, Output is the file to which foo writes its output, and Error is the file to which any errors are reported. A log of whatever happens during the job's run is written to the file named by the Log variable. Every description file must end with a Queue statement, which tells Condor to actually queue the job.
Now you can submit the description file as a Condor job:
$ condor_submit foo.cmd
If you would like to run multiple instances of the same job, with different input files for each instance, here is how to write the description file:
Executable = foo
Universe = vanilla
Error = error.$(Process)
Input = input.$(Process)
Output = output.$(Process)
Log = foo.log
Queue 100
Note the entry Queue 100. It tells Condor to run 100 instances of the job; $(Process) expands to the process number (0 through 99), so each instance reads input.<process number> and writes similarly numbered output and error files.
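For example, you could generate the 100 numbered input files with a short shell loop before submitting (the file contents here are just placeholders):

```shell
# Create one input file per job instance: input.0 .. input.99.
# The $(Process) macro in the description file matches these suffixes.
for i in $(seq 0 99); do
    echo "test data for run $i" > "input.$i"
done
```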
To check the Condor queue and have a look at the status of submitted jobs, run:
$ condor_q
To remove a job from the queue, pass the job ID that condor_q displays to condor_rm:
$ condor_rm <job ID>
Condor is a powerful yet easy-to-use software system for managing a cluster of workstations. You can configure it in various ways, such as allowing it to run jobs only at night, or only on particular machines or machines with particular resources. The owner of any machine in the Condor pool can change the configuration of Condor to his liking, so that jobs executed on his machine are of a particular type or run at particular times. Turn to the official documentation for ways to tune Condor for your needs.
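As one illustration, a machine owner could restrict execution to idle, lightly loaded periods by setting a START expression in that machine's local configuration file. The sketch below uses Condor's KeyboardIdle and LoadAvg machine attributes; the thresholds are illustrative, and the manual describes the full policy vocabulary:

```
## Only start jobs when the keyboard has been idle for 15 minutes
## and the machine's load average is low.
START = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
```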