Using Distributed Environment Modules on OSG

Many commonly used software packages and libraries are provided on the OSG through the module command.  OSG modules are made available through the OSG Application Software Installation Service (OASIS). The set of modules provided on OSG can differ from those on the HCC clusters.  To switch to the OSG modules environment on an HCC machine:

[apple@login.swan~]$ source osg_oasis_init

Use the module avail command to see what software and libraries are available:

[apple@login.swan~]$ module avail
------------------- /cvmfs/oasis.opensciencegrid.org/osg/modules/modulefiles/Core --------------------

   abyss/2.0.2                 gnome_libs/1.0                   pegasus/4.7.1
   ant/1.9.4                   gnuplot/4.6.5                    pegasus/4.7.3
   ANTS/1.9.4                  graphviz/2.38.0                  pegasus/4.7.4     (D)
   ANTS/2.1.0           (D)    grass/6.4.4                      phenix/1.10
   apr/1.5.1                   gromacs/4.6.5                    poppler/0.24.1    (D)
   aprutil/1.5.3               gromacs/5.0.0             (D)    poppler/0.32
   arc-lite/2015               gromacs/5.0.5.cuda               povray/3.7
   atlas/3.10.1                gromacs/5.0.5                    proj/4.9.1
   atlas/3.10.2         (D)    gromacs/5.1.2-cuda               proot/2014
   autodock/4.2.6              gsl/1.16                         protobuf/2.5
   ...

Loading modules is done with the module load command:

[apple@login.swan~]$ module load python/2.7
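
To see what is currently loaded, or to remove a module again, use the standard module list and module unload subcommands (shown here with the python/2.7 module loaded above):

[apple@login.swan~]$ module list
[apple@login.swan~]$ module unload python/2.7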

Two things are required in order to use modules in your HTCondor job:

  1. Create a wrapper script for the job.  This script will be the executable for your job and will load the module before running the main application.  
  2. Include the following requirement in the HTCondor submit script (see the combined example after this list):

    Requirements = (HAS_MODULES =?= TRUE)

    or 

    Requirements = [Other requirements] && (HAS_MODULES =?= TRUE)
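
    For example, to require both modules support and a 64-bit machine (Arch is a standard HTCondor machine attribute; substitute whatever extra conditions your job actually needs):

    Requirements = (Arch == "X86_64") && (HAS_MODULES =?= TRUE)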

A simple example using modules on OSG

The following example demonstrates how to use modules on OSG with an R script that implements a Monte Carlo estimation of Pi (mcpi.R).

First, create a file called mcpi.R:

mcpi.R
# Estimate Pi: sample points uniformly in the unit square and count
# the fraction that land inside the quarter circle of radius 1
montecarloPi <- function(trials) {
  count = 0
  for(i in 1:trials) {
    if((runif(1,0,1)^2 + runif(1,0,1)^2)<1) {
      count = count + 1
    }
  }
  return((count*4)/trials)
}

montecarloPi(1000)

Next, create a wrapper script called R-wrapper.sh to load the required modules (R and libgfortran), and execute the R script:

R-wrapper.sh
#!/bin/bash

# The wrapper expects exactly one argument: the R script to run
EXPECTED_ARGS=1

if [ $# -ne $EXPECTED_ARGS ]; then
  echo "Usage: R-wrapper.sh file.R"
  exit 1
else
  # Load the modules the R script needs, then run it in batch mode
  module load R
  module load libgfortran
  Rscript "$1"
fi

This script takes the name of the R script (mcpi.R) as its argument and executes it in batch mode (using the Rscript command) after loading the R and libgfortran modules.

Make the script executable:

[apple@login.swan~]$ chmod a+x R-wrapper.sh
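
Since the OASIS modules environment was sourced above, the wrapper can usually be tested directly on the login node before submitting; it prints a single line beginning with [1] followed by the Pi estimate, which varies from run to run:

[apple@login.swan~]$ ./R-wrapper.sh mcpi.R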

Finally, create the HTCondor submit script, R.submit:

R.submit
universe = vanilla
log = mcpi.log.$(Cluster).$(Process)
error = mcpi.err.$(Cluster).$(Process)
output = mcpi.out.$(Cluster).$(Process)
executable = R-wrapper.sh
transfer_input_files = mcpi.R
arguments = mcpi.R

Requirements = (HAS_MODULES =?= TRUE)
queue 100

This script will queue 100 identical jobs to estimate the value of Pi.  Notice that the wrapper script is transferred automatically with the job because it is listed as the executable.  However, the R script (mcpi.R) must be listed in transfer_input_files in order to be transferred with the job.
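
If a job needed additional input files, transfer_input_files accepts a comma-separated list; for example (mydata.csv is only a hypothetical extra file, not part of this example):

transfer_input_files = mcpi.R, mydata.csv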

Submit the jobs with the condor_submit command: 

[apple@login.swan~]$ condor_submit R.submit

Check on the status of your jobs with condor_q:

[apple@login.swan~]$ condor_q
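
To watch the queue update periodically, condor_q can be combined with the standard watch utility (choose whatever refresh interval you prefer):

[apple@login.swan~]$ watch -n 60 condor_q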

When your jobs have completed, find the average estimate for Pi from all 100 jobs.  Each output file contains a single line of the form [1] <estimate> printed by R; the command below pulls the estimate (the second whitespace-separated field) out of every output file and averages the values:

[apple@login.swan~]$ grep "[1]" mcpi.out.* | awk '{sum += $2} END { print "Average =", sum/NR}'
Average = 3.13821