- How do I login to Sandhills?
- What is module and how do I use it?
- Why can't my job write anything to my home directory?
- Where should I store my data?
- How do I compile an MPI program?
- What are the complexity requirements when changing my password?
- What is SLURM?
- How do I check my job status with SLURM?
- How do I start an interactive job with SLURM?
- How do I submit a serial job to SLURM?
- How do I run on my partition (or queue) with SLURM?
- How do I submit a parallel MPI job to SLURM?
- How do I create a script for opportunistic use of Sandhills nodes?
- When will my job start?
- How do I use job arrays with SLURM?
Using any SSH client, ssh to sandhills.unl.edu.
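For example (username here is a placeholder for your HCC username):
ssh username@sandhills.unl.edu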
The module software is available on several HCC machines. It simplifies the use of different compilers and versions by setting up the environment for each with a single command.
To see the list of available modules, run the command module avail.
To use a particular module, run module load modulename. For example, to use the 12.8 version of the PGI compiler suite, run module load compiler/pgi/12.8. To unload a module, run module unload modulename. To see the currently loaded module(s), run module list.
Switching modules may be done by either first unloading the old and then loading the new module, or running module switch oldmodule newmodule.
To see a complete list of module commands/options, run module help.
Please note that if you compile your application using a particular module, you must include the appropriate module load statement in your submit script.
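Putting these commands together, a typical session might look like this (using the PGI module from the example above):
module avail
module load compiler/pgi/12.8
module list
module unload compiler/pgi/12.8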
The /home directories are read-only from the worker nodes. They are not intended for active job I/O; keeping them read-only prevents overwhelming the /home storage system while still allowing jobs to use binaries, config files, etc. located there. Please use your corresponding /work directory for output from active jobs.
Please see the FAQ entry "Where should I store my data?" for further information about the difference between the /home and /work filesystems and the intended uses of each.
All HCC machines have two separate areas for every user to store data, each intended for a different purpose.
Your home directory (i.e. /home/[group]/[username]) is meant for items that take up relatively small amounts of space. For example: source code, program binaries, configuration files, etc. This space is quota-limited on a per-group basis. The home directories are backed up for the purposes of best-effort disaster recovery. This space is not intended as an area for I/O to active jobs.
Every user has a corresponding directory under /work using the same naming convention as /home (i.e. /work/[group]/[username]). We encourage all users to use this space for I/O to running jobs. This directory can also be used when larger amounts of space are temporarily needed. It is not quota-limited; however space in /work is shared among all users. It should be treated as short-term scratch space, and is not backed up. HCC reserves the right to delete data from this area when space becomes low; whenever the situation allows, users will be notified before this occurs and asked to voluntarily clear space.
If you have space requirements outside what is currently provided, please email email@example.com and we will gladly discuss alternatives.
First, log in and use the module command to select the version of OpenMPI you wish to use. For example, module load openmpi-1.3.3/gcc-4.1.2. (See the above section for more information on using module.)
The MPI binaries and libraries will now be available in your environment.
Please note that you will need to include the appropriate module load statement in your submit script.
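For example, to compile an MPI program with the OpenMPI module above (hello.c is a placeholder for your own source file):
module load openmpi-1.3.3/gcc-4.1.2
mpicc -o hello hello.c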
New passwords must be at least 9 characters long, and contain at least one number, one capital letter, and one symbol or punctuation mark.
Sandhills uses a resource manager called SLURM. It has similar functionality to TORQUE/Maui, but some of the commands are different:
|Task|TORQUE/Maui|SLURM|
|---|---|---|
|Job info|qstat [jobid]|scontrol show job [jobid]|
|Submit interactive job|qsub -I|srun|
|Delete job|qdel [jobid]|scancel [jobid]|
|Show estimated start time|showstart [jobid]|squeue --start|
|Information regarding available queues/partitions|qstat -q|sinfo|
*SLURM also includes TORQUE wrapper scripts for qsub, qdel, qstat, qrls, qhold, and pbsnodes. These commands work very similarly to their TORQUE/PBS versions.
Other common differences between TORQUE and SLURM are:
|TORQUE Characteristic|In SLURM|Comment|
|---|---|---|
|cd $PBS_O_WORKDIR|Not required|The batch job starts in the directory from which the script was submitted|
|Use of '-v' to export the environment|Not required|SLURM automatically passes the environment variables defined in the submission shell to the worker nodes|
|Location of output files|Output and error files are created at their final destination|Unlike TORQUE, where these files are moved to their final destination on job completion|
Also, when you ask for an entire node via SLURM, SLURM will allocate you the entire node; no additional jobs will be allocated to that node.
Helpful documentation is available from the SLURM developers at https://slurm.schedmd.com/.
Use the squeue command. Optionally, to see only your own job(s), use the -u option with your username, e.g. squeue -u username.
Job status codes are:
- PD (pending)
- R (running)
- CA (cancelled)
- CF (configuring)
- CG (completing)
- CD (completed)
- F (failed)
- TO (timeout)
- NF (node failure)
srun --pty $SHELL
Put any additional srun options before the $SHELL command. For example, to request 4 cores on a node:
srun --nodes=1 --ntasks-per-node=4 --pty $SHELL
See also: SLURM FAQ
This script requests 1 node, 1 processor (task), and 1GB of RAM for 3 hours, 15 minutes. Errors and output will be written to job.[jobId].err and job.[jobId].out.
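A minimal submit script matching that request might look like the following (serial.exe is a placeholder for your own program):
#!/bin/sh
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1024
#SBATCH --time=03:15:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
./serial.exe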
Submit your job script:
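Assuming the script above is saved as job.submit (the filename is your choice):
sbatch job.submit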
Some research groups have dedicated hardware. This hardware is grouped in partitions (called queues in PBS/TORQUE).
To show your available partitions, use sinfo:
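The output might look something like this (the node counts and node names here are illustrative):
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
example      up   infinite      2   idle node[01-02]
batch*       up   infinite     40   idle node[03-42]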
This output indicates you can run on two different partitions: example and batch. If no partition is specified, your job will run in the batch partition (the default is indicated by the asterisk).
To run in a given partition use the --partition (or -p) argument:
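For example, using the partition and script names from above:
sbatch --partition=example job.submit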
Or add the partition to your submit script as follows:
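#SBATCH --partition=example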
This script requests 16 cores on nodes with InfiniBand:
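A sketch of such a script (the ib constraint tag, the time limit, and the mpi_prog.exe binary are assumptions; check your cluster's convention for requesting InfiniBand nodes):
#!/bin/sh
#SBATCH --ntasks=16
#SBATCH --constraint=ib
#SBATCH --time=01:00:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
module load openmpi-1.3.3/gcc-4.1.2
mpirun ./mpi_prog.exe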
Submit your job script:
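Assuming the script is saved as mpi.submit:
sbatch mpi.submit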
Some users may prefer to specify more details. This will allocate 32 tasks, 16 on each of two nodes:
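For example, in the submit script:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16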
The guest partition (or queue) is suitable for running short jobs (we suggest jobs with less than 48 hours of runtime), since there is no guarantee of how long your jobs will keep running before they are preempted.
To submit a guest queue job, simply change the line starting with "#SBATCH --partition=" in your original SLURM script to "#SBATCH --partition=guest" (or add it if missing).
On preemption, a job is returned to the queue and will restart once a resource is available again for opportunistic use.
The squeue command can show an estimated start time for a job in queue:
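For example, to see estimated start times for your own pending jobs:
squeue --start -u username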
You can also get a time estimate for a given resource request with the --test-only argument to srun:
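For example (the resource request and command here are illustrative):
srun --test-only --nodes=2 --ntasks-per-node=16 --time=01:00:00 hostname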
SLURM does not have native support for job arrays, but they can be simulated with a utility: arrayrun. The arrayrun utility submits a job multiple times and sets the environment variable $TASK_ID for each job in the array.
Example sbatch submit script:
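A sketch of such a script, where each task processes its own input file (my_program.exe, the input file naming, and the time limit are placeholders):
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
./my_program.exe input.$TASK_ID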
Starting the jobs with arrayrun:
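Assuming a task range of 1-10 and the script name from above (check arrayrun --help for the exact syntax on your system):
arrayrun 1-10 job.submit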
This submits ten jobs, with $TASK_ID set to 1 in the first job, 2 in the second, and so on. For more usage information, see arrayrun --help.