Submitting an R job is very similar to submitting a serial job shown on Submitting Jobs. There are two primary commands to use when submitting R scripts: Rscript and R CMD BATCH. Both commands will execute the passed script but differ in how they handle output.
R CMD BATCH
When utilizing R CMD BATCH, all output will be directed to an .Rout file named after your script unless otherwise specified. For example:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH Rcode.R
In the above example, output for the job will be found in the file Rcode.Rout. Notice that we did not specify output and error files in our SLURM directives; these are not needed, as all R output will go into the .Rout file. To direct output to a specific location, follow your R CMD BATCH command with the name of the file where you want the output directed, as follows:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH Rcode.R Rcodeoutput.txt
In this example, output from running the script Rcode.R will be placed in the file Rcodeoutput.txt.
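The contents of Rcode.R are not shown in these examples; any ordinary R script works. A minimal hypothetical stand-in for testing your submit script could be:

x <- rnorm(100)   # draw 100 samples from a standard normal distribution
summary(x)        # print summary statistics to the job's output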
To pass arguments to the script, specify them after R CMD BATCH but before the script to be executed, preferably preceded by --args, as follows:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH "--args argument1 argument2 argument3" Rcode.R Rcodeoutput.txt
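Inside the R script, the values following --args can be read with commandArgs(). A minimal sketch, assuming Rcode.R simply wants the three values as character strings:

args <- commandArgs(trailingOnly = TRUE)   # c("argument1", "argument2", "argument3")
print(args)                                # echo the arguments to the output file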
Rscript
Using Rscript to execute R scripts differs from R CMD BATCH in that all output and errors from the script are directed to STDOUT and STDERR, in a manner similar to other programs. This gives the user greater control over where to direct the output. For example, to run our script using Rscript, the submit script could look like the following:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load R/3.5
Rscript Rcode.R
In the above example, STDOUT will be directed to the output file TestJob.%J.stdout and STDERR to the file TestJob.%J.stderr. You will notice that the example is very similar to the serial example. The important line is the module load command, which tells the cluster to load the R framework into the environment so jobs may use it.
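The R/3.5 module shown here is only an example; the versions installed on your cluster may differ. On clusters using the Lmod or Environment Modules systems, you can typically list the available R versions with:

module avail R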
To pass arguments to the script when using Rscript, the arguments follow the script name, as in the example below:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load R/3.5
Rscript Rcode.R argument1 argument2 argument3
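As with R CMD BATCH, the script retrieves these values with commandArgs(trailingOnly = TRUE). Note that arguments always arrive as character strings; a hypothetical sketch that converts the first one to a number (assuming a numeric value is passed):

args <- commandArgs(trailingOnly = TRUE)
n <- as.numeric(args[1])   # arguments arrive as strings; convert before doing math
cat("First argument as a number:", n, "\n")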
Multicore (Parallel) R Submission
Submitting a multicore R job to SLURM is very similar to Submitting an OpenMP Job, since both run multicore jobs on a single node. Below is an example:
#!/bin/bash
#SBATCH --ntasks-per-node=16
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load R/3.5
R CMD BATCH Rcode.R
The above example will submit a single job which can use up to 16 cores. Be sure to limit your R code to no more than 16 cores, or your performance will suffer. For example, when using the parallel package function mclapply:
library("parallel")
...
mclapply(rep(4, 5), rnorm, mc.cores=16)
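Rather than hard-coding the core count, the script can read it from the environment SLURM sets up for the job. A sketch, assuming the job was submitted with --ntasks-per-node as above:

library(parallel)
# SLURM_NTASKS_PER_NODE is set by SLURM when --ntasks-per-node is specified;
# fall back to 1 core if the variable is absent (e.g., when testing locally).
ncores <- as.integer(Sys.getenv("SLURM_NTASKS_PER_NODE", "1"))
mclapply(rep(4, 5), rnorm, mc.cores = ncores)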
Multinode R Submission with Rmpi
Submitting a multinode MPI R job to SLURM is very similar to Submitting an MPI Job, since both run multicore jobs on multiple nodes. Below is an example of running Rmpi on Swan on 2 nodes and 32 cores:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load compiler/gcc/4.9 openmpi/1.10 R/3.5
export OMPI_MCA_mtl=^psm
mpirun -n 1 R CMD BATCH Rmpi.R
When you run an Rmpi job on Swan, please include the line export OMPI_MCA_mtl=^psm in your submit script. Regardless of how many cores your job uses, the Rmpi package should always be run with mpirun -n 1, because it spawns the additional processes dynamically.
Please find below an example Rmpi R script provided by The University of Chicago Research Computing Center:
library(Rmpi)
# initialize an Rmpi environment
ns <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=ns)
# send these commands to the slaves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
# all slaves execute this command
mpi.remote.exec(paste("I am", id, "of", ns, "running on", host))
# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()