A simple example of submitting an HTCondor job

This page describes a complete example of submitting an HTCondor job.

  1. SSH to Swan

    ssh command
    [apple@localhost]$ ssh apple@swan.unl.edu
    
    output
    [apple@login.swan~]$
    
  2. Write a simple Python program in a file named “hello.py” that we will run using HTCondor

    edit a Python file named ‘hello.py’
    [apple@login.swan ~]$ vim hello.py
    

    In the editor window, enter the code below:

    hello.py
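    #!/usr/bin/env python
    # print(...) with a single argument behaves the same under Python 2 and Python 3.
    import sys
    import time

    # Count from 1 to 6, pausing one second between numbers.
    i = 1
    while i <= 6:
        print(i)
        i += 1
        time.sleep(1)

    # Print 2 to the 8th power (256), then echo the first command-line argument.
    print(2**8)
    print("hello world received argument = " + sys.argv[1])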
    

    This program prints the numbers 1 through 6 on stdout (one per second), then prints 256, and finally prints: hello world received argument = <command-line argument passed to hello.py>
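
As a quick sanity check before submitting, the logic of hello.py can be mirrored in a function that returns the lines instead of printing them. This is only a sketch: `hello_lines` is a hypothetical helper that is not part of the original script, and it leaves out the one-second pauses.

```python
def hello_lines(argument):
    """Return the lines hello.py would print for one command-line argument."""
    lines = [str(i) for i in range(1, 7)]      # the counter 1 through 6
    lines.append(str(2 ** 8))                  # 2 to the 8th power
    lines.append("hello world received argument = " + argument)
    return lines


if __name__ == "__main__":
    for line in hello_lines("2"):
        print(line)
```

Comparing this list with a job's output file is a cheap way to confirm that the job received the argument you expected.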

     

  3. Write an HTCondor submit script named “hello.submit”

    hello.submit
    Universe        = vanilla
    Executable      = hello.py
    Output          = OUTPUT/hello.out.$(Cluster).$(Process).txt
    Error           = OUTPUT/hello.error.$(Cluster).$(Process).txt
    Log             = OUTPUT/hello.log.$(Cluster).$(Process).txt
    Notification    = Never
    Arguments       = $(Process)
    periodic_release = ((JobStatus==5) && (CurrentTime - EnteredCurrentStatus) > 30)
    on_exit_remove  = (ExitStatus == 0)
    Queue 4
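
The release and on-exit expressions in the script above may be easier to read as ordinary functions. The sketch below is plain Python, not HTCondor code. The function names are hypothetical, and it assumes that JobStatus value 5 denotes a held job and that both timestamps are in seconds.

```python
HELD = 5  # assumed JobStatus value for a held job


def should_release(job_status, current_time, entered_current_status):
    """Mirror of the release expression: held for more than 30 seconds."""
    return job_status == HELD and (current_time - entered_current_status) > 30


def should_leave_queue(exit_status):
    """Mirror of the on-exit expression: leave the queue only on exit status 0."""
    return exit_status == 0


if __name__ == "__main__":
    print(should_release(HELD, 1000, 960))   # held for 40 seconds
    print(should_release(HELD, 1000, 990))   # held for only 10 seconds
    print(should_leave_queue(1))             # non-zero exit status
```

Read this way, a held job is released once it has been held for more than 30 seconds, and a job that exits with a non-zero status stays in the queue.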
    
  4. Create an OUTPUT directory to receive the output files generated by your job (the submit script above writes into the OUTPUT folder)

    create output directory
    [apple@login.swan ~]$ mkdir OUTPUT
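
Because the submit script writes into OUTPUT, that directory has to exist before you submit. As a rough sketch, the directories a submit script refers to can be collected from its Output, Error, and Log lines and created in one go. The helper names here are hypothetical, and the parsing handles only simple `key = value` lines like the ones in hello.submit.

```python
import os

# Relevant lines copied from hello.submit.
SUBMIT_TEXT = """\
Output          = OUTPUT/hello.out.$(Cluster).$(Process).txt
Error           = OUTPUT/hello.error.$(Cluster).$(Process).txt
Log             = OUTPUT/hello.log.$(Cluster).$(Process).txt
"""


def output_directories(submit_text):
    """Return the directories named by the Output, Error and Log lines."""
    directories = set()
    for line in submit_text.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip().lower() in ("output", "error", "log"):
            directory = os.path.dirname(value.strip())
            if directory:
                directories.add(directory)
    return directories


def ensure_directories(directories, base="."):
    """Create each directory under base if it does not already exist."""
    for directory in sorted(directories):
        os.makedirs(os.path.join(base, directory), exist_ok=True)


if __name__ == "__main__":
    print(output_directories(SUBMIT_TEXT))
```

Calling `ensure_directories(output_directories(SUBMIT_TEXT))` from your home directory would have the same effect as the mkdir command above.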
    
  5. Submit your job

    condor_submit
    [apple@login.swan ~]$ condor_submit hello.submit
    
    Output of submit
    Submitting job(s)....
    4 job(s) submitted to cluster 1013054.
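
If you script your submissions, the cluster number can be picked out of the summary line. The following is a minimal sketch, assuming the summary line has exactly the wording shown above. `parse_submit_output` is a hypothetical helper, not an HTCondor tool.

```python
import re

# Summary line copied from the condor_submit output above.
SUBMIT_OUTPUT = "4 job(s) submitted to cluster 1013054."


def parse_submit_output(text):
    """Return (job_count, cluster_id) from condor_submit's summary line."""
    match = re.search(r"(\d+) job\(s\) submitted to cluster (\d+)\.", text)
    if match is None:
        raise ValueError("no 'submitted to cluster' line found")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    jobs, cluster = parse_submit_output(SUBMIT_OUTPUT)
    print(jobs, cluster)
```

The cluster number is the value that later appears as $(Cluster) in the output file names.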
    
  6. Check the status of your jobs with condor_q

    condor_q
    [apple@login.swan ~]$ condor_q
    
    Output of condor_q
    -- Schedd: login.swan.hcc.unl.edu : <129.93.227.113:9619?...
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
    720587.0   logan          12/15 10:48  33+14:41:17 H  0   0.0  continuous.cron 20
    720588.0   logan          12/15 10:48 200+02:40:08 H  0   0.0  checkprogress.cron
    1012864.0   jthiltge        2/15 16:48   0+00:00:00 H  0   0.0  test.sh
    1013054.0   jennyshao       4/3  17:58   0+00:00:00 R  0   0.0  hello.py 0
    1013054.1   jennyshao       4/3  17:58   0+00:00:00 R  0   0.0  hello.py 1
    1013054.2   jennyshao       4/3  17:58   0+00:00:00 I  0   0.0  hello.py 2
    1013054.3   jennyshao       4/3  17:58   0+00:00:00 I  0   0.0  hello.py 3
    7 jobs; 0 completed, 0 removed, 0 idle, 4 running, 3 held, 0 suspended
    

    The jobs in the above output appear in three states:

    Symbol  Meaning
    H       Held
    R       Running
    I       Idle (waiting to run)

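
To summarize the state of one cluster, the ST column can be tallied per status. The sketch below assumes the column layout of the sample output above (ID, OWNER, date, time, RUN_TIME, ST, ...). `count_statuses` is a hypothetical helper, and other condor_q versions or options may print a different layout.

```python
from collections import Counter

STATUS_NAMES = {"H": "Held", "R": "Running", "I": "Idle"}

# Job lines for cluster 1013054, copied from the condor_q output above.
CONDOR_Q_LINES = """\
1013054.0   jennyshao       4/3  17:58   0+00:00:00 R  0   0.0  hello.py 0
1013054.1   jennyshao       4/3  17:58   0+00:00:00 R  0   0.0  hello.py 1
1013054.2   jennyshao       4/3  17:58   0+00:00:00 I  0   0.0  hello.py 2
1013054.3   jennyshao       4/3  17:58   0+00:00:00 I  0   0.0  hello.py 3
"""


def count_statuses(text, cluster):
    """Count jobs per status for one cluster in condor_q-style job lines."""
    counts = Counter()
    prefix = str(cluster) + "."
    for line in text.splitlines():
        fields = line.split()
        # The sixth whitespace-separated field is the ST column.
        if len(fields) < 6 or not fields[0].startswith(prefix):
            continue
        counts[STATUS_NAMES.get(fields[5], fields[5])] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_statuses(CONDOR_Q_LINES, 1013054)))
```
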
  7. Explanation of $(Cluster) and $(Process) in the HTCondor submit script

    $(Cluster) and $(Process) are variables available in the HTCondor submit script. $(Cluster) is the prefix of your job ID, and $(Process) runs from 0 through N - 1, where N is the number of jobs requested with Queue. Each job ID has the form $(Cluster).$(Process); if you submit a single job, its ID is $(Cluster).0.

    In this example, $(Cluster) is “1013054” and $(Process) ranges from “0” to “3”. In most cases these variables are used to change the behavior of each individual job in the submission, for example by varying the input files, output files, or parameters of the program being run. Here we simply pass $(Process) as the command-line argument, which hello.py reads as sys.argv[1]. The relevant lines from the submit script “hello.submit” are listed below:

    for $(Process)
    Output          = OUTPUT/hello.out.$(Cluster).$(Process).txt
    Arguments = $(Process)
    Queue 4
    

    The relevant line from “hello.py” is listed below:

    for $(Process)
    print("hello world received argument = " + sys.argv[1])
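
To make the substitution concrete, here is a small sketch that expands $(Cluster) and $(Process) the way the explanation above describes, once per job requested with Queue. It only imitates the substitution with string replacement. `expand` and `expand_queue` are hypothetical names, not part of HTCondor.

```python
def expand(template, cluster, process):
    """Substitute $(Cluster) and $(Process) in one submit-script value."""
    return (template.replace("$(Cluster)", str(cluster))
                    .replace("$(Process)", str(process)))


def expand_queue(template, cluster, count):
    """Expand a template once per process ID, 0 through count - 1."""
    return [expand(template, cluster, process) for process in range(count)]


if __name__ == "__main__":
    template = "OUTPUT/hello.out.$(Cluster).$(Process).txt"
    for name in expand_queue(template, 1013054, 4):
        print(name)
```

With cluster 1013054 and Queue 4, this yields four distinct output file names, one for each of the process IDs 0 through 3.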
    
  8. Viewing the results of your job

    After your job completes, you can use the Linux “cat” or “vim” command to view the job output.

    For example, in the file name hello.out.1013054.2.txt, “1013054” is the $(Cluster) value and “2” is the $(Process) value. The output looks like this:

    example of one output file hello.out.1013054.2.txt
    1
    2
    3
    4
    5
    6
    256
    hello world received argument = 2
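
When a cluster contains many jobs, opening each file by hand becomes tedious. The sketch below gathers the last line of every output file of one cluster. It assumes the file naming used in this example. `final_lines` is a hypothetical helper, and the demonstration writes throwaway files to a temporary directory so that it runs anywhere.

```python
import glob
import os
import tempfile


def final_lines(directory, cluster):
    """Map each output file of one cluster to its last non-empty line."""
    pattern = os.path.join(directory, "hello.out.%d.*.txt" % cluster)
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            lines = [line.strip() for line in handle if line.strip()]
        results[os.path.basename(path)] = lines[-1] if lines else ""
    return results


if __name__ == "__main__":
    # Demonstrate on sample files rather than real job output.
    with tempfile.TemporaryDirectory() as directory:
        for process in range(2):
            path = os.path.join(directory, "hello.out.1013054.%d.txt" % process)
            with open(path, "w") as handle:
                handle.write("256\nhello world received argument = %d\n" % process)
        for name, line in final_lines(directory, 1013054).items():
            print(name, "->", line)
```

For real results, you would call `final_lines("OUTPUT", 1013054)` from the directory in which you submitted the job.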
    

  9. Please see the link below for one more example:

    http://research.cs.wisc.edu/htcondor/tutorials/intl-grid-school-3/submit_first.html
