Running PostgreSQL

This page describes how to run a PostgreSQL server instance with a SLURM job on HCC resources. Many software packages require the use of an SQL type database as part of their workflows. This example shows how to start a PostgreSQL server inside of a SLURM job on HCC resources. The database will be available as long as the SLURM job containing it is running, and other jobs may then be submitted to connect to and use it. The database files are stored on the clusters' filesystem (here $COMMON is used), so that even when the containing SLURM job ends the data is persistent. That is, you can submit a subsequent identical PostgreSQL server job and data that was previously imported in to the database will still be there.

Warning

Only one instance of the database server job can run at a time. Submitting multiple server jobs simultaneously will result in undefined behavior and database corruption.

Initial setup steps¶

A few initial setup commands must be run first. These commands only need be run once for each database instance you wish to run. The commands should be run from the login node.

First, choose a location to hold the database and configuration files. Here, we use a folder named postgres in the $COMMON directory. Change the value of POSTGRES_HOME if you wish to use another location. Run the following commands to create the needed directory structure and create a random password for the database that is stored in $POSTGRES_HOME/config/postgres-password.

$ export POSTGRES_HOME=$COMMON/postgres
$ mkdir -p $POSTGRES_HOME/{config,db/data,run}
$ uuidgen > $POSTGRES_HOME/config/postgres-password
$ chmod 600 $POSTGRES_HOME/config/postgres-password

Start the PostgreSQL SLURM job¶

Use the following submit script to start the job for the database:

postgres.submit

#!/bin/bash
#SBATCH --time=168:00:00
#SBATCH --mem=8gb
#SBATCH --job-name=postgres_server
#SBATCH --error=postgres_server.err
#SBATCH --output=postgres_server.out
#SBATCH --licenses=common
#SBATCH --dependency=singleton
#SBATCH --signal=B:SIGINT@60

export POSTGRES_HOME=$COMMON/postgres
export POSTGRES_PASSWORD_FILE=$POSTGRES_HOME/config/postgres-password
export POSTGRES_USER=$USER
export POSTGRES_DB=mydb
export PGDATA=$POSTGRES_HOME/db/data
export POSTGRES_HOST_AUTH_METHOD=md5
export POSTGRES_INITDB_ARGS="--data-checksums"
export POSTGRES_PORT=$(shuf -i 2000-65000 -n 1)

echo "Postgres server running on $(hostname) on port $POSTGRES_PORT"
echo "This job started at $(date +%Y-%m-%dT%T)"
echo "This job will end at $(squeue --noheader -j $SLURM_JOBID -o %e) (in $(squeue --noheader -j $SLURM_JOBID -o %L))"
module load apptainer
exec apptainer run -B $POSTGRES_HOME/db:/var/lib/postgresql -B $POSTGRES_HOME/run:/var/run/postgresql docker://postgres:11 -c "port=$POSTGRES_PORT"

This script starts a PostgreSQL server instance with the following properties:

The superuser username is set to your HCC username and the password is the random one generated earlier.
The server is started on a random port to avoid collisions.
The database name is mydb. This can be changed to whatever name you would like (some applications may require a specific name).
Checksums on data pages are enabled to help detect corruption.
Password authentication is required for security.

Additionally, the job is run with --dependency=singleton to ensure that only one instance (based on job name) is running at a time. Duplicate jobs submitted afterwards will queue until an earlier job exits. The --signal=B:SIGINT@60 option instructs SLURM to send a shutdown signal to the PostgreSQL server 60 seconds before the time limit of the job. This will help to avoid corruption by allowing the server to perform a graceful shutdown.

Once the job starts, check the postgres_server.out file for information on which host and port the server is listening on. For example,

Postgres server running on c1725.swan.hcc.unl.edu on port 10332
This job started at 2020-06-19T10:20:58
This job will end at 2020-06-19T10:50:57 (in 29:59)

Here, the server is running on host c1725.swan.hcc.unl.edu on port 10332. The output also contains information on when the job will end. This can be useful when submitting the companion analysis job(s) that will use the database. It is recommended to adjust the requested walltime of the analysis job(s) to ensure they will end before the database job does.

Accessing the PostgreSQL instance¶

The server instance can be accessed using the hostname and port from the job output, as well as your HCC username and the random password set initially. The exact method will depend on your application. Take care to treat the password in a secure manner.

Restarting the PostgreSQL instance¶

To restart the server, simply resubmit the same SLURM job as above. The first time the job is run, PostgreSQL will create the database from scratch. Subsequent runs will detect an existing database and will not overwrite it. Data entered into the database from previous runs will be available.

Note

Each new instance of the server will run on a different host and port. You will need to update these values before submitting subsequent analysis jobs.

Submitting jobs that require PostgreSQL¶

The simplest way to manage jobs that need the database is to manually submit them after the PostgreSQL SLURM job has started. However, this is not terribly convenient. A better way is to use the dependency feature of SLURM. Submit the PostgreSQL job first and make a note of the job id. In the submit script(s) of the analysis jobs, add the line

#SBATCH --dependency=after:<job id>

replacing <job id> with the numeric job id noted before. This will instruct SLURM to only begin running the analysis job(s) once the database job has begun.