Crane and Tusker are managed by the SLURM resource manager. To run processing on Crane or Tusker, you must create a SLURM submit script that describes your job. After the job is submitted, SLURM will schedule it on an available worker node.
Before writing a submit file, you may need to compile your application.
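For example, a simple serial application can be compiled on the login node before the job is submitted. This is only a sketch; the compiler module name below is an assumption, so check module avail for the compilers actually installed on the cluster.

$ module avail                      # list available software modules
$ module load compiler/gcc          # hypothetical compiler module name
$ gcc -O2 -o hello hello.c          # compile hello.c into the executable hello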
All SLURM job output should be directed to your /work path.
$ cd /work/[groupname]/[username]
The environment variable $WORK can also be used.
$ cd $WORK
$ pwd
/work/[groupname]/[username]
Review how /work differs from /home here.
The below example is for a serial job. For submitting MPI jobs, please look at the MPI Submission Guide.
A SLURM submit file is broken into two sections: the job description and the processing. SLURM job description lines are prepended with #SBATCH in the submit file.
SLURM Submit File
#!/bin/sh
#SBATCH --time=03:15:00          # Run time in hh:mm:ss
#SBATCH --mem-per-cpu=1024       # Maximum memory required per CPU (in megabytes)
#SBATCH --job-name=hello-world
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out

module load example/test
hostname
sleep 60
[groupname] and [username] should be replaced with your group name and your username. Your username can be retrieved with the command id -un and your group with id -gn.
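For example, the two commands below print the values to substitute (the bracketed output is a placeholder for your own account information):

$ id -un
[username]
$ id -gn
[groupname]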
Submitting the SLURM job is done with the command sbatch. SLURM will read the submit file and schedule the job according to the description in the submit file.
Submitting the job described above is done with:
$ sbatch example.slurm
Submitted batch job 24603
The job was successfully submitted.
Job status is found with the command squeue, which provides information such as the job id, partition, job name, user, state, run time, and allocated nodes. Checking the status of your job is easiest by filtering to your username, using the -u option to squeue.
$ squeue -u <username>
  JOBID PARTITION     NAME       USER ST       TIME  NODES NODELIST(REASON)
  24605     batch hello-wo <username>  R       0:56      1 b01
Additionally, if you want to see the status of a specific partition, for example if you are a member of a partition, you can use the -p option to squeue:
$ squeue -p esquared
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  73435  esquared MyRandom tingting  R   10:35:20      1 ri19n10
  73436  esquared MyRandom tingting  R   10:35:20      1 ri19n12
  73735  esquared SW2_driv   hroehr  R   10:14:11      1 ri20n07
  73736  esquared SW2_driv   hroehr  R   10:14:11      1 ri20n07
You may view the estimated start time of your job with the command squeue --start. The output of the command will show the expected start time of the jobs.
$ squeue --start --user lypeng
  JOBID PARTITION     NAME     USER ST           START_TIME  NODES NODELIST(REASON)
   5822     batch  Starace   lypeng PD  2013-06-08T00:05:09      3 (Priority)
   5823     batch  Starace   lypeng PD  2013-06-08T00:07:39      3 (Priority)
   5824     batch  Starace   lypeng PD  2013-06-08T00:09:09      3 (Priority)
   5825     batch  Starace   lypeng PD  2013-06-08T00:12:09      3 (Priority)
   5826     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
   5827     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
   5828     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
   5829     batch  Starace   lypeng PD  2013-06-08T00:13:09      3 (Priority)
   5830     batch  Starace   lypeng PD  2013-06-08T00:13:09      3 (Priority)
   5831     batch  Starace   lypeng PD  2013-06-08T00:14:09      3 (Priority)
   5832     batch  Starace   lypeng PD                  N/A      3 (Priority)
The output shows the expected start time of the jobs, as well as the reason that the jobs are currently idle (in this case, low priority of the user due to running numerous jobs already).
Removing the job is done with the scancel command. The only argument to the scancel command is the job id. For the job above, the command is:
$ scancel 24605
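After a job completes, its standard output and standard error are written to the files named by the --output and --error lines of the submit file, under your /work path. As a quick check, assuming the %J placeholder expanded to the job id 24603 from the sbatch example above:

$ cat $WORK/job.24603.out    # standard output of the job
$ cat $WORK/job.24603.err    # standard error of the job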