Holland Computing Center

  1. What is Condor?
  2. How do I create a Condor Script?
  3. How do I submit a job?
  4. Can I run BLAST on Condor?
  5. Can I run Java on Condor?
  6. My job needs user intervention. Is there an interactive mode for Condor?
  7. Are there queues in Condor because in SGE there are queues?
  8. Can I just forget about SGE?
  9. Now that I've submitted my job, how do I check the status?
  10. Is there a way to see all of the jobs I have run through Condor on PrairieFire?
  11. I don't see any output from my job. What's going on?
  12. How do I kill a job?
  13. My job was killed. What should I do if I don't know why?
  14. Why shouldn't I just ask for the maximum amount of resources possible at all times?
  15. Is there a user's manual available?
  16. What do I need to run through Condor and what can I run on the head node?

What is Condor?

The Condor project is designed to implement High Throughput Computing. Condor is a bundle of software that schedules applications and checks for available computing resources in a cluster or grid environment. HCC uses Condor for its ability to cycle-scavenge: jobs can run through Condor on processors that SGE is not currently using. Once SGE schedules jobs on those processors, Condor either checkpoints its jobs or moves them out of the way. This way, machines that are reserved but idle can still be used for computation. Right now, Condor on PrairieFire can only handle serial jobs.

How do I create a Condor Script?

Condor, much like SGE, needs a script that tells it how to run the user's job. The basic script below should handle most jobs submitted to Condor.

#Example of a condor script
#with executable, stdout, stderr and log
Universe = vanilla
Executable = a.out
Arguments = file_name 12
Output = a.out.out
Error = a.out.err
Log = a.out.log
Queue

#
 
Lines starting with # are comments in Condor files.
 
Universe
selects what the Condor documentation calls a runtime environment, that is, the way Condor runs the job. PrairieFire offers the standard and vanilla universes. Most jobs should be run in the vanilla universe. To run in the standard universe, programs must be recompiled.
 
Executable
is the name of the executable you want to run on Condor.
 
Arguments
are the command line arguments for your program. For example, to run ls -l / on Condor, the Executable would be ls and the Arguments would be -l /.
 
Output
is the file where the information printed to stdout is going to be sent.
 
Error
is the file where the information printed to stderr is going to be sent.
 
Log
is the file where information about your Condor job will be sent. It records, for example, whether the job is running, whether it was halted and, in the standard universe, whether it was checkpointed or moved.
 
Queue
is the command to send the job to Condor's scheduler.


To submit a job, like a Monte-Carlo simulation, where the same program needs to be run several times with the same parameters, the script above can be used with one modification. The Queue command can be given the number of times the job should be queued in Condor. If the Queue command is changed to the one below, a.out will be run 5 times with the exact same parameters.

Queue 5

To submit the same job with different parameters, Condor accepts files with multiple Queue statements. Only the parameters that differ need to be set again before each Queue statement; everything else carries over from the lines above.

#Example of a condor script
#with executable, stdout, stderr and log
#and multiple Argument parameters
Universe = vanilla
Executable = a.out
Arguments = file_name 10
Output = a.out.$(Process).out
Error = a.out.$(Process).err
Log = a.out.$(Process).log
Queue
Arguments = file_name 20
Queue
Arguments = file_name 30
Queue
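When the parameter that changes is just a counter, the $(Process) macro already used in the file names above can also be placed in Arguments, so a single Queue statement covers every run. The sketch below is one possible way to do that, not the only one. It assumes a.out accepts the process number (0, 1, 2) as its second argument, and the file name sweep.sub is made up for the example. The script only calls condor_submit if the command is found.

```shell
#!/bin/bash
# Write a submit file that queues 3 jobs; $(Process) expands to 0, 1 and 2.
# The quoted 'EOF' stops the shell from treating $(Process) as a command substitution.
# sweep.sub is a hypothetical file name chosen for this sketch.
cat > sweep.sub <<'EOF'
Universe = vanilla
Executable = a.out
Arguments = file_name $(Process)
Output = a.out.$(Process).out
Error = a.out.$(Process).err
Log = a.out.log
Queue 3
EOF

# Submit only on a machine where Condor is installed.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit sweep.sub
fi
```

Because Output and Error include $(Process), each of the three runs writes to its own files instead of overwriting the others.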

To submit a job to a Windows machine, the script needs a Requirements line that specifies the operating system. If files need to be transferred, there are file transfer commands as well:

#Example of a condor script
#for a Windows machine
#with requirements and file transfer
Universe = vanilla
Executable = a.out
Requirements = (OpSys == "WINNT51")
Arguments = file_name 10
Output = a.out.$(Process).out
Error = a.out.$(Process).err
Log = a.out.$(Process).log
TRANSFER_FILES = ALWAYS
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = file1, file2, file3
transfer_output_files = outputfile

Queue

How do I submit a job?

condor_submit condor_script

Can I run BLAST on Condor?

BLAST can be run on Condor in the vanilla universe. The NCBI toolkit is installed in /home/programs/ncbi.

To make life easier, decide with other people in your research group on a shared directory to hold the formatted databases. Why does this make things easier? Let's say someone makes a directory in their group directory called db_formatted and gives everyone else in their group access to it. Example:

>$ mkdir /home/swanson/db_formatted
>$ chmod g+rwx /home/swanson/db_formatted


Now create a file called .ncbirc in your home directory. It should contain this:

[NCBI]
Data = /home/programs/ncbi/data

[BLAST]
BLASTMAT = /home/programs/ncbi/data
BLASTDB = /home/swanson/db_formatted

This file contains the path to the matrices needed by the NCBI tools and the path to the formatted databases. If the formatted databases are going to be placed somewhere else, change the value of BLASTDB.

Now look at the section on how to write a Condor script and the section on how to submit. The BLAST Condor script should look something like this. I'm using the nr database as an example.

#BLAST submission script
Universe = vanilla
Executable = /home/programs/ncbi/bin/blastall
Arguments = -p blastp -d nr -i test.seq -o blastout.ncbi
Log = BLAST.log
Queue
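If several query files need to be searched, the multiple-Queue pattern from the script section can be generated rather than typed by hand. The following is a rough sketch under assumptions: the query file names test1.seq and test2.seq and the submit file name blast_batch.sub are hypothetical, and the blastall options are simply the ones from the script above.

```shell
#!/bin/bash
# Hypothetical query files; replace with your own sequence files.
queries=(test1.seq test2.seq)

# Emit the shared settings once, then one Arguments/Error/Queue group per query.
{
    echo "#BLAST batch submission script"
    echo "Universe = vanilla"
    echo "Executable = /home/programs/ncbi/bin/blastall"
    echo "Log = BLAST.log"
    for q in "${queries[@]}"; do
        base="${q%.seq}"    # strip the .seq suffix to name the result files
        echo "Arguments = -p blastp -d nr -i $q -o $base.ncbi"
        echo "Error = $base.err"
        echo "Queue"
    done
} > blast_batch.sub
```

The resulting file can then be submitted like any other Condor script. Each query gets its own result file and its own stderr file, so the runs do not overwrite one another.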


If you have any questions, contact the administrators.

Can I run Java on Condor?

Java can be run on Condor in the vanilla universe.

PrairieFire has JDK 1.4, 1.5, and 1.6 installed under /util/comp/sun/.... For all examples, I'll be using the path to JDK 1.6 because some code won't compile on 1.5.

Because there is supposed to be a Java universe in Condor, Java support under the vanilla universe is a little awkward. A wrapper has to be written around Java. The easiest way, at least for me, is to write it in BASH. Create a file called java.sh.

#!/bin/bash
# In case I need to point to some special libraries,
# uncomment the line below and add the path
#export CLASSPATH=
/util/comp/sun/jdk1.6.0_10/bin/java "$@"


Make this file executable.

>$ chmod u+x java.sh
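One detail worth checking in a wrapper like java.sh is how the arguments are forwarded to java. In BASH, an unquoted $@ is split again on whitespace, while "$@" passes each argument through unchanged. The small demonstration below uses made-up helper functions (count_args, forward_unquoted, forward_quoted) that are not part of Condor or Java; they only count how many arguments arrive.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.
count_args() { echo "$#"; }

# Forwards arguments the way an unquoted $@ in a wrapper would.
forward_unquoted() { count_args $@; }

# Forwards arguments the way "$@" in a wrapper would.
forward_quoted() { count_args "$@"; }

forward_unquoted "my file.txt" 10   # prints 3: "my file.txt" was split in two
forward_quoted "my file.txt" 10     # prints 2: arguments arrive intact
```

For arguments without spaces the two forms behave the same, so the quoted form is the safer default.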

Look at the section on how to write a Condor script and the section on how to submit. For the Executable entry, enter java.sh. For Arguments, enter the name of the Java program followed by all its parameters.

#Example of a condor script
#for Java
#with executable, stdout, stderr and log
Universe = vanilla
Executable = java.sh
Arguments = Hello_World 10
Output = hello.out
Error = hello.err
Log = hello.log
Queue

 

If you have any questions, contact the administrators.

My job needs user intervention. Is there an interactive mode for Condor?

No.

Are there queues in Condor because in SGE there are queues?

Even though Condor is a scheduler of sorts, there are no queues implemented in Condor. We would like Condor to be used more as a cycle-scavenging system for serial processes. The Condor scheduler works per user, so everyone, no matter how many jobs they submit, will have access to time on the cluster.

Can I just forget about SGE?

No. As stated in section one, Condor on PrairieFire cannot run parallel jobs. In addition, SGE will preempt or kill Condor jobs. Thus, Condor is most appropriate for the following cases:

1. Serial code of short duration.
2. Serial code compiled under the standard universe.

Now that I've submitted my job, how do I check the status?

condor_q will show you all jobs in the Condor scheduler.
condor_q username will show you all jobs in the scheduler that belong to the user username.
condor_status will show you the status of the entire Condor pool.

Is there a way to see all of the jobs I have run through Condor on PrairieFire?

No, but this may be added in the future.

I don't see any output from my job. What's going on?

You probably did not add the Output line in your Condor script file. If you have any questions, contact the administrators.
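A quick way to check is to search the script for an Output line before submitting. The sketch below is only an illustration: has_output_line is a made-up helper, and no_output.sub is a hypothetical script written just for the demonstration. The match ignores case, on the assumption that your script may spell the command as Output or output.

```shell
#!/bin/bash
# Succeeds if the given Condor script contains an Output line (any letter case).
has_output_line() {
    grep -qiE '^[[:space:]]*output[[:space:]]*=' "$1"
}

# Hypothetical script with no Output line, used only for this demonstration.
printf 'Universe = vanilla\nExecutable = a.out\nLog = a.out.log\nQueue\n' > no_output.sub

if has_output_line no_output.sub; then
    echo "Output is set"
else
    echo "No Output line found: add one to keep what the job prints to stdout"
fi
```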

How do I kill a job?

Use condor_q to find your job's ID, then run condor_rm with that ID to kill the job.

My job was killed. What should I do if I don't know why?

Check the Condor queue with condor_q. If your job is in the queue, it wasn't killed; it was probably just restarted on another machine. As long as jobs are in the Condor queue, they are going to be run at some point. If your job is not in the Condor queue, and you didn't kill it, contact the administrators.

Why shouldn't I just ask for the maximum amount of resources possible at all times?

On PrairieFire, most of the nodes are homogeneous, and the nodes with more RAM are usually very busy. Using ClassAds to request these nodes will most likely leave your job stuck in the queue forever. If you need more resources than the nodes Condor routinely uses, contact the administrators for help.

Is there a user's manual available?

There is a Condor user manual on-line at Condor's website.

What do I need to run through Condor and what can I run on the head node?

As stated before, Condor on PrairieFire is a cycle-scavenging system. If you don't mind your job not running right away, the job is serial, or you don't want to use SGE, then use Condor. If you're willing to experiment with the standard universe, the code can be compiled on the head node.
