Kubernetes has support for running batch jobs. A Job is a controller which watches your pod and makes sure it exits with exit status 0. If it does not, for any reason, the pod will be restarted up to backoffLimit times.
Since jobs in Nautilus are not limited in runtime, you may only run jobs with a meaningful command field. Running in manual mode (a sleep infinity command followed by starting the computation by hand) is prohibited.
Let’s run a simple job and get its result.
Create a job.yaml file with the following content:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          limits:
            memory: 200Mi
            cpu: 1
          requests:
            memory: 50Mi
            cpu: 50m
      restartPolicy: Never
  backoffLimit: 4
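To submit it, assuming the file is named job.yaml as above:

kubectl create -f job.yaml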
Explore what’s running:
kubectl get jobs
kubectl get pods
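For the pi job above, once it has finished these will print something close to the following (the duration and the random pod-name suffix will differ):

NAME   COMPLETIONS   DURATION   AGE
pi     1/1           41s        2m

NAME       READY   STATUS      RESTARTS   AGE
pi-7xk9b   0/1     Completed   0          2m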
When the job is finished, your pod will stay in Completed state, and the Job will show 1/1 in its COMPLETIONS column. For long jobs, the pods can pass through Error, Evicted, and other states until they finish properly or the backoffLimit is exhausted.
This example job did not use any storage and wrote its result to STDOUT, which can be seen in the pod logs:
kubectl logs pi-<hash>
The pod and job will remain for you to come back and look at for ttlSecondsAfterFinished=604800 seconds (1 week) by default, and you can adjust this value in your job definition if desired.
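For example, to have the job and its pods cleaned up one day after finishing, set the field at the top level of the Job spec (86400 here is just an illustrative value):

spec:
  ttlSecondsAfterFinished: 86400
  template:
    ...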
Please make sure you do not leave any pods or jobs behind. To delete the job, run
kubectl delete job pi
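Deleting the job also removes the pods it created. If you still have the manifest, the equivalent (assuming the job.yaml name from above) is:

kubectl delete -f job.yaml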
You can group several commands and use pipes by wrapping them in a shell invocation, like this (the folder, URL, and command names are placeholders):

command:
  - sh
  - -c
  - "cd /home/user/my_folder && apt-get install -y wget && wget http://some.server/some_file && do_something_else"
All stdout and stderr output from the script will be preserved and accessible by running
kubectl logs pod_name
Output from an initContainer (named init-clone-repo in this example) can be seen with
kubectl logs pod_name -c init-clone-repo
To see the logs in real time, run:
kubectl logs -f pod_name
The pod will remain in Completed state until you delete it or the timeout passes.
The backoffLimit field specifies how many times your pod will be restarted if your script exits with a non-zero status or if the pod is terminated for another reason (for example, a node being rebooted). It’s a good idea to set it greater than 0.
There is no fair queue implemented on Nautilus: if you submit 1000 jobs, you block all other users from submitting in the cluster.
To limit your submission to a fair portion of the cluster, refer to this guide, and make sure to use a deployment and persistent storage for the Redis pod. Here’s our example.
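If you don’t need a full work queue, the Job spec’s own completions and parallelism fields can cap how many of your pods run at once. A minimal sketch that works through 100 completions while never running more than 5 pods simultaneously (each pod still needs a way to pick up its own work item, which is what the Redis work queue in the guide above provides):

spec:
  completions: 100
  parallelism: 5
  template:
    ...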
Nautilus is primarily used for GPU jobs. While it’s possible to run large CPU-only jobs, you have to take certain measures to prevent them from taking over all cluster resources.
You can run your jobs at a lower priority and allow other jobs to preempt yours. This way you don’t need to worry about the size of your jobs, and you can use the maximum number of resources in the cluster. To do that, add the opportunistic priority class to your pods:
spec:
  priorityClassName: opportunistic
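Note that priorityClassName is a field of the pod spec, so in a Job manifest it goes under the pod template; a minimal sketch based on the pi example above:

spec:
  template:
    spec:
      priorityClassName: opportunistic
      containers:
      - name: pi
        ...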
Another measure is to avoid the GPU nodes. This way you can be sure you’re only using CPU-only nodes and your jobs are not preventing any GPU usage. To do this, add a node anti-affinity rule for the GPU device to your pod:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: feature.node.kubernetes.io/pci-10de.present
            operator: NotIn
            values:
            - "true"
You can use either of these two methods, or combine them.
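As with priorityClassName, the affinity block is a pod-level field and lives in the Job’s pod template. A sketch combining both methods, again modeled on the pi example:

spec:
  template:
    spec:
      priorityClassName: opportunistic
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: feature.node.kubernetes.io/pci-10de.present
                operator: NotIn
                values:
                - "true"
      containers:
      - name: pi
        ...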