Partitions

Partitions are used on Crane and Rhino to distinguish different resources. You can view the partitions with the command sinfo.
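
For example, to list the partitions together with their availability, time limits, and node counts, commands such as the following can be used (the exact columns shown depend on the sinfo options chosen):

sinfo
sinfo --format="%P %a %l %D"   # partition, availability, time limit, node count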

Crane:

Full list for Crane

Rhino:

Full list for Rhino

Priority for short jobs

To run short jobs for testing and development work, a job can specify a different quality of service (QoS). The short QoS increases a job's priority so it will run as soon as possible; an example submit script is given below the limits.

SLURM Specification
#SBATCH --qos=short
Limits per user for ‘short’ QoS
  • 6 hour job run time
  • 2 jobs of 16 CPUs or fewer
  • No more than 256 CPUs in use for short jobs from all users
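
As an illustration, a complete submit script requesting the short QoS might look like the sketch below. The job name, CPU count, memory request, module, and program are placeholders to adapt to your own job; only the --qos=short line is specific to this QoS.

#!/bin/bash
#SBATCH --job-name=short-test      # placeholder job name
#SBATCH --qos=short                # request the short QoS for higher priority
#SBATCH --ntasks=16                # at most 16 CPUs per job under the short QoS
#SBATCH --time=06:00:00            # at most 6 hours of run time under the short QoS
#SBATCH --mem-per-cpu=1024         # example per-CPU memory request, adjust as needed

module load example-app            # placeholder; load the software your job actually needs
srun ./my_program                  # placeholder command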

Limitations of Jobs

Overall limits on maximum job wall time, CPUs, etc. are set for all jobs, both for the default setting (when the “--qos=” option is omitted) and for “short” jobs (described above) on Crane and Rhino. The limits are shown in the table below.

QoS       SLURM Specification   Max Job Run Time   Max CPUs per User   Max Jobs per User
Default   Leave blank           7 days             2000                1000
Short     #SBATCH --qos=short   6 hours            16                  2

Please also note that the memory and local hard drive limits are subject to the physical limitations of the nodes, described in the Resource Capabilities section of the HCC Documentation and in the partition sections above.

Owned Partitions

A partition marked as owned by a group means that only specific groups are allowed to submit jobs to it. Groups are manually added to the list of those allowed to submit jobs to the partition. If you are unable to submit jobs to a partition and you believe you should be able to, please contact hcc-support@unl.edu.

Guest Partition

The guest partition can be used by users and groups that do not own dedicated resources on Crane or Rhino. Jobs running in the guest partition run on owned resources with the Intel OPA interconnect. These jobs are preempted when the resources are needed by their owners and are restarted on another node.
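
To submit to the guest partition, specify it with the SLURM partition option (assuming the partition is named guest, as in the heading above):

SLURM Specification: guest partition
#SBATCH --partition=guest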

tmp_anvil Partition

Anvil nodes that are not running OpenStack have been placed in this partition. Each node has two Intel Xeon E5-2650 v3 2.30GHz CPUs (20 cores total) and 256GB of memory. However, these nodes do not have an InfiniBand or OPA interconnect, so they are suitable for serial or single-node parallel jobs. The nodes in this partition may be drained and moved to our OpenStack cloud without advance notice when more cloud resources are needed.
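
To request these nodes, specify the partition by name (assuming the partition is named tmp_anvil, as in the heading above):

SLURM Specification: tmp_anvil partition
#SBATCH --partition=tmp_anvil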

Use of Infiniband or OPA

Crane nodes in the batch partition use either InfiniBand or Intel Omni-Path interconnects. Most users do not need to worry about which one to choose; the scheduler will automatically place jobs on either of them. However, if you want to use one of the interconnects exclusively, the SLURM constraint keyword is available. Here are examples:

SLURM Specification: Omni-Path
#SBATCH --constraint=opa
SLURM Specification: Infiniband
#SBATCH --constraint=ib
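
For instance, a batch script that combines an interconnect constraint with the usual resource requests might look like the sketch below. The job name, task count, run time, modules, and program are placeholders; only the --constraint line selects the interconnect.

#!/bin/bash
#SBATCH --job-name=ib-job          # placeholder job name
#SBATCH --ntasks=64                # example MPI task count
#SBATCH --time=12:00:00            # example run time, within the 7-day default limit
#SBATCH --constraint=ib            # run only on InfiniBand-connected nodes

module load compiler/gcc openmpi   # placeholder modules; load what your job actually needs
srun ./my_mpi_program              # placeholder MPI executable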