Holland Computing Center

HCC

  1. How do I login to Sandhills?
  2. What is module and how do I use it?
  3. Why can't my job write anything to my home directory?
  4. Where should I store my data?
  5. How do I compile an MPI program?
  6. What are the complexity requirements when changing my password?
  7. What is SLURM?
  8. How do I check my job status with SLURM?
  9. How do I start an interactive job with SLURM?
  10. How do I submit a serial job to SLURM?
  11. How do I run on my partition (or queue) with SLURM?
  12. How do I submit a parallel MPI job to SLURM?
  13. How do I create a script for opportunistic use of Sandhills nodes?
  14. When will my job start?
  15. How do I use job arrays with SLURM?

How do I login to Sandhills?

Using any ssh client, ssh to sandhills.unl.edu.

What is module and how do I use it?

Module is available for use on several HCC machines. The module software simplifies the use of different compilers and versions by setting the environment for each with the use of a single command.

To see the list of available modules, run the command module avail.

To use a particular module, run module load modulename. For example, to use version 12.8 of the PGI compiler suite, run module load compiler/pgi/12.8. To unload a module, run module unload modulename. To see the currently loaded module(s), run module list.

Switching modules may be done by either first unloading the old and then loading the new module, or running module switch oldmodule newmodule.

To see a complete list of module commands/options, run module help.

Please note that if you compile your application using a particular module, you must include the appropriate module load statement in your submit script.

Why can't my job write anything to my home directory?

The /home directories are read-only from the worker nodes.  As they are not an area intended for active job I/O, this is done to prevent overwhelming the /home storage system while maintaining the ability for jobs to use binaries, config files etc. located there.  Please use your corresponding /work directory for output from active jobs.


Please see the FAQ entry "Where should I store my data?" for further information about the difference between the /home and /work filesystems and the intended uses of each.

Where should I store my data?

All HCC machines have two separate areas for every user to store data, each intended for a different purpose. 

Your home directory (i.e. /home/[group]/[username]) is meant for items that take up relatively small amounts of space.  For example:  source code, program binaries, configuration files, etc.  This space is quota-limited on a per-group basis.  The home directories are backed up for the purposes of best-effort disaster recovery.  This space is not intended as an area for I/O to active jobs.

Every user has a corresponding directory under /work using the same naming convention as /home (i.e. /work/[group]/[username]).  We encourage all users to use this space for I/O to running jobs.  This directory can also be used when larger amounts of space are temporarily needed.  It is not quota-limited; however space in /work is shared among all users.  It should be treated as short-term scratch space, and is not backed up.  HCC reserves the right to delete data from this area when space becomes low; whenever the situation allows, users will be notified before this occurs and asked to voluntarily clear space. 

If you have space requirements outside what is currently provided, please email hcc-support@unl.edu and we will gladly discuss alternatives.

How do I compile an MPI program?

First, log in and use the module command to select the version of OpenMPI you wish to use. For example, module load openmpi-1.3.3/gcc-4.1.2. (See the above section for more information on using module.)

The MPI binaries and libraries will now be available in your environment.
Please note that you will need to include the appropriate module load statement in your submit script.
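
The steps above stop short of the compile command itself. With an OpenMPI module loaded, programs are normally built with OpenMPI's wrapper compilers (mpicc for C, mpicxx for C++, mpif90 for Fortran). The following is a minimal sketch, assuming a C source file; the file name hello_mpi.c is hypothetical, and the script skips the compile step if mpicc is not on your PATH:

```shell
#!/bin/sh
# Write a minimal MPI test program (hello_mpi.c is a hypothetical file name).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only where the OpenMPI wrapper compiler is available, i.e. after
# loading an OpenMPI module as described above.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o hello_mpi hello_mpi.c
else
    echo "mpicc not found; load an OpenMPI module first" >&2
fi
```

The resulting binary is then started with mpirun from a submit script that loads the same module.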

What are the complexity requirements when changing my password?

New passwords must be at least 9 characters long, and contain at least one number, one capital letter, and one symbol or punctuation mark.
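
The rules above can be checked locally before you try a new password. This is only a sketch of the four stated rules; check_password is a hypothetical helper, not the validator HCC actually runs, and it treats any non-alphanumeric character (including a space) as a symbol:

```shell
#!/bin/sh
# check_password: return 0 if the candidate satisfies the stated rules.
# Hypothetical helper for illustration only.
check_password() {
    pw=$1
    # Rule 1: at least 9 characters long.
    [ "${#pw}" -ge 9 ] || return 1
    # Rule 2: at least one number.
    case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac
    # Rule 3: at least one capital letter.
    case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac
    # Rule 4: at least one symbol or punctuation mark (any non-alphanumeric).
    case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac
    return 0
}
```

For example, check_password 'Abcdefg1!' succeeds, while check_password 'Abcdef1!' fails because it is only 8 characters long.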

What is SLURM?

Sandhills is using a resource manager called SLURM. It has similar functionality to TORQUE/Maui, but some of the commands are different:

Task                                                TORQUE/Maui        SLURM*
Job info                                            qstat [jobid]      scontrol show job [jobid]
Submit job                                          qsub               sbatch
Submit interactive job                              qsub -I            srun
Delete job                                          qdel [jobid]       scancel [jobid]
Show estimated start time                           showstart [jobid]  squeue --start
Queue info                                          qstat              squeue
Information regarding available queues/partitions   qstat -Qf          sinfo/sshare

*SLURM also includes TORQUE wrapper scripts for qsub, qdel, qstat, qrls, qhold, and pbsnodes. These commands work very similarly to the TORQUE/PBS versions.

Other common differences between TORQUE and SLURM are:

TORQUE: cd $PBS_O_WORKDIR
In SLURM: Not required
Comment: The batch job starts in the directory from which the script was submitted.

TORQUE: Use of '-v' (i.e. exporting the environment)
In SLURM: Not required
Comment: SLURM automatically passes the environment variables defined in the submitting shell to the job on the worker node.

TORQUE: Location of output files
In SLURM: Output and error files are created at their final destination
Comment: This is unlike TORQUE, where these files are moved to their final destination when the job completes.

Also, when you request an entire node via SLURM, SLURM allocates the whole node to you; no additional jobs will be placed on that node.

How do I check my job status with SLURM?

Use the squeue command. Optionally, to see only your job(s), use the -u option with your username, e.g. squeue -u username.

Job status codes are:
PD = pending
R  = running
CA = cancelled
CF = configuring
CG = completing
CD = completed
F  = failed
TO = timeout
NF = node failure
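
The code table above can be turned into a small lookup so that squeue output is easier to read. This is a sketch only: describe_state and annotate_states are hypothetical helper names, and the input is assumed to be "jobid state" pairs, one per line, such as squeue prints with the format string "%i %t".

```shell
#!/bin/sh
# describe_state: map a compact job state code to its description.
describe_state() {
    case $1 in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CA) echo "cancelled" ;;
        CF) echo "configuring" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timeout" ;;
        NF) echo "node failure" ;;
        *)  echo "unknown" ;;
    esac
}

# annotate_states: read "jobid state" lines on stdin and print each job ID
# followed by the description of its state.
annotate_states() {
    while read -r jobid state; do
        echo "$jobid $(describe_state "$state")"
    done
}

# On the cluster (not run here):
#   squeue -h -u username -o "%i %t" | annotate_states
```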

How do I start an interactive job with SLURM?

srun --pty $SHELL

Put any additional srun options before the $SHELL command. For example, to request 4 cores on a node:

srun --nodes=1 --ntasks-per-node=4 --pty $SHELL

See also: SLURM FAQ

How do I submit a serial job to SLURM?

This script requests 1 node, 1 processor (task), and 1 GB of RAM for 3 hours and 15 minutes. Errors and output will be written to job.[jobId].err and job.[jobId].out.
 

#!/bin/sh
#SBATCH --time=03:15:00          # Run time in hh:mm:ss
#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
#SBATCH --job-name=hello-world
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out

module load example/test

hostname
sleep 60


Submit your job script:

cd $WORK
sbatch submit-script.sh
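
On success, sbatch prints a line of the form "Submitted batch job <id>". If you want to reuse that ID in later commands (for example with squeue -j or scancel), it can be captured from that line. A minimal sketch; parse_job_id is a hypothetical helper name:

```shell
#!/bin/sh
# parse_job_id: extract the job ID from sbatch's confirmation line.
# Prints nothing if the line does not look like a successful submission.
parse_job_id() {
    echo "$1" | awk '/^Submitted batch job/ { print $NF }'
}

# On the cluster (not run here):
#   jobid=$(parse_job_id "$(sbatch submit-script.sh)")
#   squeue -j "$jobid"
```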

How do I run on my partition (or queue) with SLURM?

Some research groups have dedicated hardware. This hardware is grouped in partitions (called queues in PBS/TORQUE).

To show your available partitions, use sinfo:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
example      up 3-00:00:00      6   idle a[07-12]
batch*       up 3-00:00:00      2  down* a[01-02]
batch*       up 3-00:00:00     25   idle b[01-25]


This output indicates you can run on two different partitions: example and batch. If no partition is specified, your job will run in the batch partition (the default is indicated by the asterisk).
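
Reading the partition names and the default out of that listing can also be scripted. The sketch below assumes the column layout shown above (partition name in the first column, default marked with a trailing asterisk); default_partition and list_partitions are hypothetical helper names:

```shell
#!/bin/sh
# default_partition: read sinfo output on stdin and print the partition
# whose name carries the trailing "*" default marker.
default_partition() {
    awk 'NR > 1 && $1 ~ /\*$/ { sub(/\*$/, "", $1); print $1; exit }'
}

# list_partitions: read sinfo output on stdin and print each partition
# name once, without the default marker.
list_partitions() {
    awk 'NR > 1 { sub(/\*$/, "", $1); print $1 }' | sort -u
}

# On the cluster (not run here):
#   sinfo | default_partition
#   sinfo | list_partitions
```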

To run in a given partition use the --partition (or -p) argument:

srun --partition=example ./my-program


Or add the partition to your submit script as follows:

#SBATCH --partition=example

How do I submit a parallel MPI job to SLURM?

This script requests 16 cores on nodes with InfiniBand:
 

#!/bin/sh
#SBATCH --ntasks=16              # 16 cores
#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00          # Run time in hh:mm:ss
#SBATCH --constraint=ib          # Require nodes with InfiniBand
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out

module load openmpi/1.6.0

mpirun /home/[groupname]/[username]/mpiprogram

 

Submit your job script:

cd $WORK
sbatch submit-script.sh

 

Some users may prefer to specify more details. This will allocate 32 tasks, 16 on each of two nodes:
 

#!/bin/sh
#SBATCH --nodes=2                # 2 nodes
#SBATCH --ntasks-per-node=16     # Number of tasks to be invoked on each node
#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00          # Run time in hh:mm:ss
#SBATCH --constraint=ib          # Require nodes with InfiniBand
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out

module load openmpi/1.6.0

mpirun /home/[groupname]/[username]/mpiprogram
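
To see what the two-node request above adds up to, the totals can be worked out directly. This sketch assumes one CPU per task, which is the default when --cpus-per-task is not given:

```shell
#!/bin/sh
# Values taken from the #SBATCH lines of the two-node script.
nodes=2
tasks_per_node=16
mem_per_cpu_mb=1024

# Total MPI tasks across the allocation.
total_tasks=$((nodes * tasks_per_node))
# Memory requested on each node and across the whole job, in megabytes.
mem_per_node_mb=$((tasks_per_node * mem_per_cpu_mb))
total_mem_mb=$((total_tasks * mem_per_cpu_mb))

echo "tasks=$total_tasks mem_per_node=${mem_per_node_mb}MB total_mem=${total_mem_mb}MB"
# prints: tasks=32 mem_per_node=16384MB total_mem=32768MB
```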

How do I create a script for opportunistic use of Sandhills nodes?

The guest partition (or queue) is suitable for running short jobs (we suggest jobs with less than 48 hours of runtime) since there is no guarantee on how long your jobs will keep running before they are preempted.

To submit a guest queue job, simply change the line starting with "#SBATCH --partition=" in your original SLURM script to "#SBATCH --partition=guest" (or add it if missing).
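
The edit described above can be scripted if you keep several submit scripts. This is a sketch under two assumptions: the script starts with a #! line, and any existing partition request uses the long "#SBATCH --partition=" form (the short -p form is not handled). to_guest is a hypothetical helper that prints the modified script to standard output and leaves the original file untouched:

```shell
#!/bin/sh
# to_guest: print a copy of the given submit script that targets the
# guest partition.
to_guest() {
    script=$1
    if grep -q '^#SBATCH --partition=' "$script"; then
        # A partition line exists: rewrite it in place.
        sed 's/^#SBATCH --partition=.*/#SBATCH --partition=guest/' "$script"
    else
        # No partition line: add one directly after the first (#!) line.
        awk 'NR == 1 { print; print "#SBATCH --partition=guest"; next } { print }' "$script"
    fi
}

# Usage (not run here):
#   to_guest submit-script.sh > submit-guest.sh
#   sbatch submit-guest.sh
```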

On preemption, a job is returned to the queue and will restart once a resource is available again for opportunistic use.

When will my job start?

The squeue command can show an estimated start time for a job in queue:

$ squeue -o "%S" -j jobid
START_TIME
2012-12-21T16:11:11


You can also get a time estimate for a given resource request with the --test-only argument to srun:

$ srun --nodes=2 --test-only hostname
srun: Job 1337 to start at 2012-12-21T16:11:12 using 2 processors on a[13-14]

How do I use job arrays with SLURM?

SLURM does not have native support for job arrays, but they can be simulated with a utility: arrayrun. The arrayrun utility submits a job multiple times and sets the environment variable $TASK_ID for each job in the array.

Example sbatch submit script:

#!/bin/sh

echo "I am task $TASK_ID on node $(hostname)"
sleep 60


Starting the jobs with arrayrun:

$ arrayrun 1-10 example.sh


This submits ten jobs with $TASK_ID set to 1 in the first job, 2 in the second, etc. For more usage information, see arrayrun --help.
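
A common use of $TASK_ID is to give each job in the array its own input file. The sketch below assumes input files named input_1.dat, input_2.dat, and so on; that naming scheme and the helper name input_for_task are hypothetical:

```shell
#!/bin/sh
# input_for_task: print the input file name for the current array task.
input_for_task() {
    # Stop with an error if the script was not started through arrayrun.
    : "${TASK_ID:?TASK_ID is not set; start this script with arrayrun}"
    echo "input_${TASK_ID}.dat"
}

# In the submit script (not run here):
#   ./my-program "$(input_for_task)"
```

With arrayrun 1-10, the first job would read input_1.dat, the second input_2.dat, and so on.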
