Available Partitions

Partitions are used on Swan to separate different sets of resources. You can list the partitions with the sinfo command.
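As a sketch, the two sinfo calls below list the partitions and then show the nodes in the gpu partition. They are wrapped in a hypothetical helper function, show_partitions (not an HCC-provided command), that only works where the SLURM client commands are installed, such as a Swan login node. --summarize, --partition and --format are standard sinfo options; the particular format string is an illustrative choice.

```shell
# show_partitions is a hypothetical helper name, not an HCC-provided command.
show_partitions() {
  # sinfo only exists where the SLURM client tools are installed
  if ! command -v sinfo >/dev/null 2>&1; then
    echo "sinfo not found: run this on a Swan login node" >&2
    return 1
  fi
  # One line per partition: name, availability, time limit,
  # node counts (allocated/idle/other/total) and node list
  sinfo --summarize
  # Nodes in the gpu partition with their generic resources (%G)
  # and features (%f), such as gpu_v100
  sinfo --partition=gpu --format="%N %G %f"
}
```

On Swan you can call show_partitions after defining it, or simply run the two sinfo commands directly.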

Swan

Swan has two shared public partitions available for use: the default partition, batch, and the GPU-enabled partition, gpu. When you submit a job on Swan without specifying a partition, it automatically uses batch.

Partition Name	Notes
batch	Default partition; does not have GPUs
gpu	Shared partition with GPUs

On Swan, jobs have a maximum runtime of 7 days, and each user can request up to 2000 cores and run up to 1000 jobs.
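A minimal submit script that names the partition explicitly might look like the sketch below. The job name, core count, memory and run time are placeholder values, not recommendations. The run time uses SLURM's days-hours:minutes:seconds form and must not exceed 7-00:00:00 on Swan.

```shell
#!/bin/sh
#SBATCH --job-name=example_job      # placeholder job name
#SBATCH --partition=batch           # optional: batch is used anyway when omitted
#SBATCH --time=1-00:00:00           # 1 day; the maximum on Swan is 7-00:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

# SLURM sets these variables inside a running job; the values after ":-"
# are only fallbacks for running the script outside SLURM.
PARTITION="${SLURM_JOB_PARTITION:-batch}"
CPUS="${SLURM_CPUS_PER_TASK:-4}"
echo "Running in partition ${PARTITION} with ${CPUS} CPUs"
```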

Worker Node Configuration

The standard configuration of a Swan worker node is:

Configuration	Value
Cores	56
Memory	250 GB
Scratch Storage	3.5 TB

Some Swan worker nodes are equipped with additional memory, up to 2 TB per node.
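For example, a job that needs more memory than a standard 250 GB node provides can ask for it with --mem, as in the sketch below. The 500G value is an arbitrary illustration: since SLURM can only place the job on a node with at least that much memory available, the job will wait for one of the larger-memory nodes. --nodes=1 keeps the job on a single worker node, which suits the many applications that cannot span nodes.

```shell
#!/bin/sh
#SBATCH --nodes=1                   # keep all cores and memory on one worker node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=500G                  # illustrative: more than a standard 250 GB node has
#SBATCH --time=1-00:00:00

# SLURM_MEM_PER_NODE holds the --mem request in megabytes (500G = 512000 MB);
# the fallback value only applies outside SLURM.
MEM_MB="${SLURM_MEM_PER_NODE:-512000}"
echo "Memory allocated on this node: ${MEM_MB} MB"
```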

GPU Enabled Worker Nodes

The following GPUs are available on the worker nodes in the gpu partition:

Description	SLURM Feature	Available Hardware
Tesla V100, with 10GbE	gpu_v100	1 node, 4 GPUs per node, 16 GB per GPU
Tesla V100, with OPA	gpu_v100	21 nodes, 2 GPUs per node, 32 GB per GPU
Tesla V100S	gpu_v100	4 nodes, 2 GPUs per node, 32 GB per GPU
Tesla T4	gpu_t4	12 nodes, 2 GPUs per node, 16 GB per GPU
NVIDIA A30	gpu_a30	2 nodes, 4 GPUs per node, 24 GB per GPU

Additional GPUs are available in the guest_gpu partition, but jobs running in this partition are preemptible. Details on how the partition operates are available below in Guest Partition(s). The GPUs in this partition are listed under the priority access partitions in the partition list for Swan.
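A sketch of a gpu partition job that asks for one GPU of a particular type: --gres requests the GPU and --constraint matches a value from the SLURM Feature column of the GPU table. The CPU, memory and time requests are placeholders. SLURM normally exposes the GPU(s) assigned to the job through the CUDA_VISIBLE_DEVICES variable.

```shell
#!/bin/sh
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1                # one GPU
#SBATCH --constraint=gpu_v100       # SLURM Feature from the GPU table
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00

# GPU index(es) assigned by SLURM; "none" is only a fallback outside a GPU job
GPUS="${CUDA_VISIBLE_DEVICES:-none}"
echo "Assigned GPU(s): ${GPUS}"

# List the visible GPUs if the NVIDIA tools are present on this node
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L
fi
```

To accept more than one GPU type, SLURM's constraint syntax allows alternatives separated by |, for example #SBATCH --constraint="gpu_v100|gpu_t4".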

Resource requests and utilization

Please make sure your applications and software support the resources you are requesting. Many applications are only able to use a single worker node and may not scale well with large numbers of cores.

For guidance on how many resources to request, please review our FAQ.

For GPU monitoring and resource requests, please review our page on monitoring and optimizing GPU resources.

A full list of partitions is available for Swan.

SLURM Quality of Service

Swan has two Quality of Service (QoS) types, which help manage how jobs are scheduled. Limits on maximum job wall time, CPUs and number of jobs are set both for jobs using the default QoS (when the --qos= option is omitted) and for jobs using the short QoS (described below). The limits are shown in the following table.

QoS	SLURM Specification	Max Job Run Time	Max CPUs per User	Max Jobs per User
Default	Omit --qos	7 days	2000	1000
Short	#SBATCH --qos=short	6 hours	16	2

Please also note that the memory and local hard drive limits are subject to the physical limitations of the nodes, described in the resources capabilities section of the HCC Documentation and the partition sections above.

Priority for short jobs

To run short jobs for testing and development work, a job can request a different quality of service (QoS). The short QoS increases a job's priority so that it starts as soon as possible.

SLURM Specification
`#SBATCH --qos=short`

Limits for the short QoS:

6-hour maximum job run time
Up to 2 jobs per user, using 16 CPUs or fewer in total
No more than 256 CPUs in use by short jobs from all users combined
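A short test job might combine the short QoS with requests that stay inside these limits, as in the sketch below. The run time, CPU count and memory are illustrative values.

```shell
#!/bin/sh
#SBATCH --qos=short
#SBATCH --time=01:00:00             # must not exceed the 6-hour short QoS limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8           # stays within the 16-CPU per-user short QoS limit
#SBATCH --mem=8G

# SLURM_JOB_QOS is set by SLURM inside the job; the fallback applies outside SLURM
QOS="${SLURM_JOB_QOS:-short}"
echo "Running a test job with QoS ${QOS}"
```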

Owned/Priority Access Partitions

A partition marked as owned by a group can only be used by specific groups, which are manually added to the list of groups allowed to submit jobs to it. If you are unable to submit jobs to a partition and you believe you should be able to, please contact hcc-support@unl.edu.

To submit jobs to an owned partition, use the SLURM --partition option. Jobs can either be submitted only to an owned partition, or to both the owned partition and the general access queue. For example, assuming a partition named mypartition:

Submit only to an owned partition

#SBATCH --partition=mypartition

Submitting solely to an owned partition means jobs will start immediately until the resources on the partition are full, then queue until prior jobs finish and resources become available.

Submit to both an owned partition and general queue

#SBATCH --partition=mypartition,batch

Submitting to both an owned partition and batch means jobs will run on both the owned partition and the general batch queue. Jobs will start immediately until the resources on the partition are full, then queue. Pending jobs will then start either on the owned partition or in the general queue, wherever resources become available first (taking into account FairShare). Unless there are specific reasons to limit jobs to owned resources, this method is recommended to maximize job throughput.
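A job that lists several partitions still runs in only one of them. The sketch below submits to mypartition (the example owned partition name used above, not a real Swan partition) and batch, then reports where the job actually started using SLURM_JOB_PARTITION. The other resource requests are placeholders.

```shell
#!/bin/sh
#SBATCH --partition=mypartition,batch   # example owned partition plus the general batch queue
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00

# SLURM_JOB_PARTITION names the single partition the job started in;
# "unknown" is only a fallback outside SLURM.
RAN_ON="${SLURM_JOB_PARTITION:-unknown}"
echo "Job started in partition: ${RAN_ON}"
```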

A full list of partitions is available for Swan

Guest Partition(s)

The guest partition can be used by users and groups that do not own dedicated resources on Swan. Jobs in the guest partition run on owned resources with the Intel OPA interconnect. Guest jobs are preempted when the resource owners need the resources: a preempted job is killed and returned to the queue in a pending state until it can be started on another node. HCC recommends verifying that your job can handle being restarted, and modifying your job scripts if necessary.

To submit your job to the guest partition, add the line

Submit to guest partition

#SBATCH --partition=guest

to your submit script.
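Because a requeued guest job runs its script again from the beginning, long-running work should save its progress and resume from it. The sketch below shows one common pattern under stated assumptions: progress is tracked in a hypothetical checkpoint file, progress.ckpt, holding the number of completed steps, and the sleep stands in for real work. --open-mode=append asks SLURM to append to the existing output file rather than truncate it, and SLURM_RESTART_COUNT, when set, reports how many times the job has been requeued.

```shell
#!/bin/sh
#SBATCH --partition=guest
#SBATCH --open-mode=append          # keep earlier output after a requeue instead of truncating it
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=1-00:00:00

# Hypothetical checkpoint file holding the number of completed steps
CKPT="${CKPT:-$PWD/progress.ckpt}"
TOTAL_STEPS=5
RESTARTS="${SLURM_RESTART_COUNT:-0}"   # unset on the first run

# Resume after the last completed step if a previous run was preempted
if [ -f "$CKPT" ]; then
  step=$(cat "$CKPT")
else
  step=0
fi
echo "Restart count ${RESTARTS}; resuming after step ${step} of ${TOTAL_STEPS}"

while [ "$step" -lt "$TOTAL_STEPS" ]; do
  sleep 1                           # placeholder for one unit of real work
  step=$((step + 1))
  echo "$step" > "$CKPT"            # record progress only after the step finishes
done
echo "All ${TOTAL_STEPS} steps finished"
```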

Owned GPU resources may also be accessed in an opportunistic manner by submitting to the guest_gpu partition. Similar to guest, jobs are preempted when the GPU resources are needed by the owners. To submit your job to the guest_gpu partition, add the lines

Submit to guest_gpu partition

#SBATCH --partition=guest_gpu
#SBATCH --gres=gpu

to your SLURM script.

Preventing job restart

By default, jobs on the guest partition are restarted elsewhere when they are preempted. To prevent preempted jobs from being restarted, add the line

Prevent job restart on guest partition

#SBATCH --no-requeue

to your SLURM submit file.