HCC Acknowledgment Credit

To submit an acknowledgment and receive the credit, please use the form here: https://hcc.unl.edu/acknowledgement-submission.

The following text provides a detailed description of how the Acknowledgment Credit works.

As a quickstart, add the line

#SBATCH --qos=ac_<group>

to your submit script, replacing <group> with your group name. Run the hcc-ac program to check the remaining balance.
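
For example, a minimal submit script for a hypothetical group named 'demo' might begin like this (only the --qos line is specific to the Acknowledgment Credit; the other job parameters are placeholders):

#!/bin/bash
#SBATCH --qos=ac_demo
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=1:00:00

./my_program    # placeholder for your actual workload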

What is HCC Acknowledgment Credit?

Whenever a group acknowledges using HCC resources as part of their research work (dissertation, thesis, grant, publication, presentation, etc.), HCC will allocate a time-limited quota of CPU and memory resources to the group in the shared portion of the machine that can be used with jobs to gain scheduling priority in Slurm.  A job submitted with an acknowledgment credit qos will be placed at the top of the list of pending jobs to start.  Since this applies to the shared portion of the machine, the job may not start immediately the way a Priority Access (owned partition) job would.  It will, however, likely start considerably sooner, depending on the job's resource requirements.

Different types of research activities are awarded different amounts of time.

The following table lists research activities and awarded time:

Research activity      1x CPU Time
dissertation           2 years
funded grant           2 years
journal publication    2 years
master's thesis        2 years
poster presentation    1 year
undergraduate thesis   1 year
unfunded grant         6 months

Time is awarded evenly for compute and memory resources at a ratio of 1 CPU to 4GB of memory.  The QoS remains active until either resource is exhausted.
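
As a rough sketch of how consumption is charged against the award (the job values here are illustrative; the 'demo' walkthrough below shows the same accounting against real hcc-ac output):

CPU time consumed    = (number of CPUs) x (job run time)
memory time consumed = (total job memory / 4GB) x (job run time)

For example, a job run with --ntasks=2 and --mem-per-cpu=8G for 1 hour would consume 2 CPU hours and (2 x 8GB / 4GB) x 1 hour = 4 4GB hours.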

Why this ratio?

All nodes in the Swan batch partition can meet this CPU to memory ratio.

Why have this ratio?

In short: fairness.  All programs require CPU and memory to complete the tasks they were written to perform.  Dividing a node's memory by its number of CPUs allows a CPU-to-memory equivalency to be used, and time is awarded evenly to CPU and memory at this ratio.  Without it, a high core count job requesting less than 4GB per CPU would starve pending high-memory jobs of the otherwise idle memory on the nodes it occupies; similarly, a high-memory job may use all the memory on a node while using only a single CPU.  Accounting for both CPU and memory at this ratio keeps the awarded resources fair.

Column description of the hcc-ac utility

hcc-ac header    Description
Slurm qos        The qos name that must be provided with the Slurm --qos=ac_<name> argument
CPUx1 time       Time remaining for a single CPU
MEMx4GB time     Time remaining for 4GB of memory
per-CPU AvgMEM   The per-CPU average memory size available for the CPU time remaining in the qos.  If CPU time is consumed faster than memory time, this value will increase.  If memory time is consumed faster than CPU time, this value will decrease.

Example of how to use the awarded time for the ‘demo’ group.

The awarded time is reduced to 10 minutes to show how consumption changes with differing job resource requirements:

All times are in days-hours:minutes:seconds, as used in Slurm's '--time=' argument.

Default output
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos | CPUx1 time   | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   | 0-00:10:00   | 0-00:10:00   | 4.0GB          |
+-----------+--------------+--------------+----------------+

Use the Slurm quality of service argument '--qos' to gain access to the awarded time with increased priority:

--qos=ac_demo
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=1 --mem=8g --time=1:00 /bin/sleep 60

**job runs for 60 seconds**

After 60 second job
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos | CPUx1 time   | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   | 0-00:09:00   | 0-00:08:00   | 3.556GB        |
+-----------+--------------+--------------+----------------+

1 CPU minute and 2 4GB memory minutes were consumed by the prior srun job.

1 CPU minute, i.e., (--ntasks=1 --time=1:00)

2 4GB minutes, i.e., (--ntasks=1 --mem=8G --time=1:00 ~= 1 8GB minute ~= 2 4GB minutes)

The remaining per-CPU average memory can be found with the following equation:

(memory time remaining) / (cpu time remaining) * 4

i.e., 8 / 9 * 4 ≈ 3.556

Multiplying the remaining CPU time by the per-CPU average memory gives the same value as multiplying the remaining memory time by 4GB:

i.e., 9 * 3.556 ~= 8 * 4

--ntasks=4
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=4 --mem-per-cpu=2G --time=1:00 /bin/sleep 60

**job runs for 60 seconds**

After 60 second job
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos | CPUx1 time   | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   | 0-00:05:00   | 0-00:06:00   | 4.8GB          |
+-----------+--------------+--------------+----------------+

4 CPU minutes and 2 4GB minutes were consumed by the prior srun job.

4 CPU minutes, i.e., (--ntasks=4 --time=1:00)

2 4GB minutes, i.e., (--ntasks=4 --mem-per-cpu=2G --time=1:00 ~= 1 8GB minute ~= 2 4GB minutes)

i.e., 6 / 5 * 4 == 4.8

Insufficient Time
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=5 --mem-per-cpu=5000M --time=1:00 /bin/sleep 60
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

This is an example of a job requesting more resources than remain available in the qos.
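
In rough numbers (treating 4GB as 4000MB for this sketch; the exact rounding Slurm uses may differ slightly): 5 tasks x 5000MB = 25000MB ≈ 6.25 x 4GB, so the 1 minute job needs about 6.25 4GB minutes, but only 6 remain.  Lowering the request to 4800MB per CPU, as in the next example, brings the total to 24000MB = 6 x 4GB, which fits.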

Corrected Memory Requirement
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=5 --mem-per-cpu=4800M --time=1:00 /bin/sleep 60

**job runs for 60 seconds**

Exhausted QoS
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos | CPUx1 time   | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   | exhausted    | exhausted    | 0.0GB          |
+-----------+--------------+--------------+----------------+

All remaining time was used.  Any further submissions to the qos will be denied at submission time.

All of the above srun arguments work the same with sbatch within the submit file header.

Submit File Example
[demo01@login.hcc_cluster ~]$ cat submit_test.slurm
#!/bin/bash
#SBATCH --qos=ac_demo
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=4800M
#SBATCH --time=1:00

/bin/sleep 60
[demo01@login.hcc_cluster ~]$ sbatch ./submit_test.slurm

CPU and memory time in the qos are only consumed when jobs run against the qos.  It is therefore possible to submit more jobs requesting the qos than could ever run with it.  Once the qos is exhausted, any pending jobs that request it will remain in the pending state until more time is added to the qos or the jobs are modified to no longer request the qos.

HCC will run a script periodically (hourly) to scan for jobs pending in this state and modify them to utilize the cluster’s default qos.
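
If you would rather not wait for that script, a pending job can be moved back to the default qos by hand.  A minimal sketch, assuming a pending job with ID 12345 and that the cluster's default qos is named 'normal' (both are placeholders; check your job ID with squeue and the default qos name with your cluster's documentation):

[demo01@login.hcc_cluster ~]$ scontrol update JobId=12345 QOS=normal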

Sizing what can fit in the qos’s time limits using Slurm’s --test-only argument.

You can try different combinations of --nodes, --ntasks, --ntasks-per-node, --mem-per-cpu and --mem together with the --test-only argument to size a job against the time remaining in the qos.

For example, with the same 10 minute limit:

--test-only job to see if it fits within qos time limits
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos | CPUx1 time   | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   | 0-00:10:00   | 0-00:10:00   | 4.0GB          |
+-----------+--------------+--------------+----------------+

[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=6 --time=2:00 --mem-per-cpu=4G
allocation failure: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=5 --time=2:00 --mem-per-cpu=4G
srun: Job <number> to start at YYYY-MM-DDTHH:MM:SS using 5 processors on compute_node


[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=5 --time=2:00 --mem=40G
allocation failure: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=5 --time=2:00 --mem=20G
srun: Job <number> to start at YYYY-MM-DDTHH:MM:SS using 5 processors on compute_node


[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=3 --time=3:00 --mem-per-cpu=12G
allocation failure: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=2 --time=3:00 --mem-per-cpu=12G
allocation failure: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=1 --time=3:00 --mem-per-cpu=12G
srun: Job <number> to start at YYYY-MM-DDTHH:MM:SS using 1 processors on compute_node