Available Partitions
Partitions are used on Swan to distinguish different resources. You can view the partitions with the command sinfo.
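The commands below are a minimal sketch of inspecting partitions from a login node. `--summarize` and `--partition` are standard sinfo options. `guest` is used only as an example partition name. The check for sinfo is there because it is part of the SLURM client tools and is not available everywhere.

```shell
#!/bin/bash
# sinfo ships with the SLURM client tools, so real partition data is only
# available on a system where SLURM is installed (e.g. a Swan login node).
if command -v sinfo >/dev/null 2>&1; then
    # One line per partition: availability, time limit and node counts
    # in the form allocated/idle/other/total.
    sinfo --summarize
    # Node states for a single partition (guest is just an example).
    sinfo --partition=guest
else
    echo "sinfo not found; run this on a Swan login node." >&2
fi
```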
Swan:
Priority for short jobs
To run short jobs for testing and development work, a job can specify a different quality of service (QoS). The short QoS increases a job's priority so that it runs as soon as possible.
SLURM Specification |
---|
#SBATCH --qos=short |
Limits per user for 'short' QoS
- 6-hour maximum job run time
- 2 jobs of 16 CPUs or fewer
- No more than 256 CPUs in use for short jobs from all users
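The script below is a minimal sketch of a complete submit script for a short test job, assuming a single task. The job name, time, CPU and memory values are illustrative choices picked to stay inside the short QoS limits listed above.

```shell
#!/bin/bash
#SBATCH --job-name=short_test      # illustrative name
#SBATCH --qos=short                # raise priority for a quick test run
#SBATCH --time=01:00:00            # must not exceed the 6-hour short QoS limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # 16 CPUs or fewer per short job
#SBATCH --mem=4G                   # illustrative memory request

# SLURM sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 when the
# script is run outside of SLURM (for example, a quick local check).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-none} on $(hostname) using ${OMP_NUM_THREADS} thread(s)"
```

Saved under a name such as short_test.sh (a placeholder), it would be submitted with `sbatch short_test.sh`.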
Limitations of Jobs
Limits on maximum job wall time, CPUs, and number of jobs apply both to jobs using the default setting (when the "--qos=" option is omitted) and to "short" jobs (described above) on Swan. These limits are shown in the following table.
QoS | SLURM Specification | Max Job Run Time | Max CPUs per User | Max Jobs per User |
---|---|---|---|---|
Default | Leave blank | 7 days | 2000 | 1000 |
Short | #SBATCH --qos=short | 6 hours | 16 | 2 |
Please also note that the memory and local hard drive limits are subject to the physical limitations of the nodes, described in the resource capabilities section of the HCC Documentation and the partition sections above.
Owned/Priority Access Partitions
A partition marked as owned by a group accepts jobs only from specific groups. Groups are manually added to the list of those allowed to submit jobs to the partition. If you are unable to submit jobs to a partition and believe that you should be able to, please contact hcc-support@unl.edu.
To submit jobs to an owned partition, use the SLURM --partition option. Jobs can either be submitted only to an owned partition, or to both the owned partition and the general access queue. For example, assuming a partition named mypartition:
Submit only to an owned partition
#SBATCH --partition=mypartition
Submitting solely to an owned partition means jobs will start immediately until the resources on the partition are full, then queue until prior jobs finish and resources become available.
Submit to both an owned partition and general queue
#SBATCH --partition=mypartition,batch
Submitting to both an owned partition and batch means each job can run either on the owned partition or in the general batch queue. Jobs will start immediately until the resources on the owned partition are full, then queue. Pending jobs will then start either on the owned partition or in the general queue, wherever resources become available first (taking FairShare into account). Unless there are specific reasons to limit jobs to owned resources, this method is recommended to maximize job throughput.
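A minimal sketch of a submit script that lets SLURM choose between an owned partition and batch, using the placeholder partition name mypartition from the example above. SLURM sets SLURM_JOB_PARTITION inside a running job, so the job can log which partition it actually started in. The time and memory values are illustrative.

```shell
#!/bin/bash
#SBATCH --partition=mypartition,batch   # placeholder owned partition, plus the general queue
#SBATCH --ntasks=1
#SBATCH --time=04:00:00                 # illustrative value
#SBATCH --mem=8G                        # illustrative value

# Record which of the listed partitions the job started in;
# "unknown" is printed when the script runs outside of SLURM.
PARTITION="${SLURM_JOB_PARTITION:-unknown}"
echo "Job ${SLURM_JOB_ID:-none} started in partition: ${PARTITION}"
```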
Guest Partition(s)
The guest partition can be used by users and groups that do not own dedicated resources on Swan. Jobs running in the guest partition run on owned resources with the Intel OPA interconnect. Guest jobs are preempted when the resource owners need those resources: they are killed and returned to the queue in a pending state until they can be started on another node.
HCC recommends verifying that your jobs can tolerate being restarted, and modifying job scripts if necessary.
To submit your job to the guest partition, add the line
Submit to guest partition
#SBATCH --partition=guest
to your submit script.
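Because guest jobs may be killed and requeued, one common way to make a job restart-safe is to save progress to a checkpoint file and resume from it. The sketch below assumes a job that works in numbered steps; checkpoint.dat, TOTAL_STEPS and the loop are hypothetical stand-ins for your own application's checkpointing. SLURM_RESTART_COUNT is set by SLURM when a job has been requeued.

```shell
#!/bin/bash
#SBATCH --partition=guest
#SBATCH --ntasks=1
#SBATCH --time=12:00:00        # illustrative value
#SBATCH --open-mode=append     # keep output from earlier attempts after a requeue

CKPT_FILE="checkpoint.dat"     # hypothetical checkpoint written by this job
TOTAL_STEPS=5                  # hypothetical amount of work

# Resume from the last completed step if a previous attempt was preempted.
if [ -f "$CKPT_FILE" ]; then
    start=$(cat "$CKPT_FILE")
else
    start=0
fi
echo "Attempt $(( ${SLURM_RESTART_COUNT:-0} + 1 )), resuming at step ${start}"

for (( step = start; step < TOTAL_STEPS; step++ )); do
    # ... the real work for one step goes here ...
    # Save progress only after the step has finished.
    echo $(( step + 1 )) > "$CKPT_FILE"
done
echo "All ${TOTAL_STEPS} steps complete"
```

Writing the checkpoint only after each step completes means a job killed mid-step repeats at most that one step when it restarts.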
Owned GPU resources may also be accessed opportunistically by submitting to the guest_gpu partition. As with guest, jobs are preempted when the owners need the GPU resources. To submit your job to the guest_gpu partition, add the lines
Submit to guest_gpu partition
#SBATCH --partition=guest_gpu
#SBATCH --gres=gpu
to your SLURM script.
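A minimal sketch of a complete guest_gpu submit script built around the two lines above. The time value is illustrative, and reading CUDA_VISIBLE_DEVICES assumes the common SLURM setup where the allocated GPU is exposed through that variable.

```shell
#!/bin/bash
#SBATCH --partition=guest_gpu
#SBATCH --gres=gpu             # one GPU
#SBATCH --ntasks=1
#SBATCH --time=06:00:00        # illustrative value

# SLURM typically exposes the allocated GPU(s) through CUDA_VISIBLE_DEVICES.
echo "Visible GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"

# Show GPU details only when the NVIDIA driver tools are present on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi
```

Like guest jobs, guest_gpu jobs can be preempted, so the checkpointing advice for the guest partition applies here too.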
Preventing job restart
By default, jobs on the guest partition are restarted elsewhere when they are preempted. To prevent preempted jobs from being restarted, add the line
Prevent job restart on guest partition
#SBATCH --no-requeue
to your SLURM submit file.
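--no-requeue suits work that cannot safely run twice, for instance a job that appends to a results file, where a rerun after preemption would duplicate output. A minimal sketch, where results.txt is a hypothetical output file and the time value is illustrative:

```shell
#!/bin/bash
#SBATCH --partition=guest
#SBATCH --no-requeue           # if preempted, end the job instead of returning it to the queue
#SBATCH --ntasks=1
#SBATCH --time=02:00:00        # illustrative value

RESULTS="results.txt"          # hypothetical output file

# Appending is not idempotent: a restarted copy of this job would add a
# second line, which is why automatic requeueing is disabled above.
echo "job ${SLURM_JOB_ID:-none} wrote this line at $(date '+%Y-%m-%d %H:%M:%S')" >> "$RESULTS"
```

The same option can also be given on the command line, e.g. `sbatch --no-requeue job.sh` (job.sh being a placeholder), where it takes precedence over directives in the script.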