Running Gaussian at HCC

Gaussian is a popular ab initio quantum chemistry program in the field of computational chemistry. It is licensed software; the University of Nebraska-Lincoln (UNL) currently holds a campus-wide site license for its source code [G09 Src Minor Rev. (D.01)] and a couple of pre-compiled binaries.

All UNL faculty, staff, and students are allowed to access the g09 source code and run g09 at HCC. Collaborators of UNL faculty may be allowed to run g09 binaries while they are physically visiting UNL. Note that faculty, staff, and students from other campuses of the University of Nebraska (NU) system, including University of Nebraska Omaha, University of Nebraska Medical Center, and University of Nebraska at Kearney, are not allowed to run g09 at HCC without purchasing a g09 license.

For access, contact us at hcc-support@unl.edu and include your HCC username. After your account has been added to the group “gauss”, follow these four steps to run Gaussian 09 on Tusker and Crane:

Step 1: Copy the g09 sample input file and SLURM script to a “g09-test” directory on the /work filesystem:

Copy sample files
cd $WORK
mkdir g09-test
cd g09-test
cp  /util/opt/gaussian/09/RevD/test_g98.com ./
cp  /util/opt/gaussian/09/RevD/run-g09-general.slurm ./

Step 2: Check the g09 input file and modify it if necessary:

Review g09 input file
vi test_g98.com

Content of Gaussian input file test_g98.com:

Input file test_g98.com
%nprocs=4  
#P RHF/6-31G** scf=direct test prop=fit

Gaussian Test Job 178:  
TATB rhf/6-31g**//hf/6-31g**
Energy with tight cutoffs would be -1006.2213391, is -1006.2213170
with  
default cutoffs

0,1  
X  
C,1,RC1  
C,1,RC2,2,60.  
C,1,RC1,3,60.,2,180.,0  
C,1,RC2,4,60.,3,180.,0  
C,1,RC1,5,60.,4,180.,0  
C,1,RC2,6,60.,5,180.,0

... ...

N,7,RCN2,13,90.,1,180.,0  
H,29,RNH,7,A2,13,0.,0  
H,29,RNH,7,A2,13,180.,0

RC1=1.431682  
RC2=1.451892  
RCN1=1.431748  
RNO=1.205098  
A1=120.501393  
RCN2=1.312086  
RNH=0.990828  
A2=118.920716

Step 3: Check the g09 SLURM submission script and modify it if necessary:

Review SLURM submission script
vi run-g09-general.slurm

Content of Gaussian SLURM submission file run-g09-general.slurm:

run-g09-general.slurm
#!/bin/sh
#SBATCH -J g09
#SBATCH --nodes=1 --ntasks-per-node=4
#SBATCH --mem-per-cpu=2000
#SBATCH --time=01:00:00
#SBATCH --partition=batch
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

nodelist > ./machines.LINUX
echo $SLURM_JOB_ID > ./jobid

module load gaussian/09/RevD
source ${g09root}/g09/bsd/g09.profile
export GAUSS_SCRDIR=$TMPDIR

g09 test_g98.com

Step 4: Submit the job and wait for the scheduler to start it:

Submit job
sbatch run-g09-general.slurm

Note that your account must be a member of the group “gauss” to run this example.
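
One way to confirm the group membership before submitting is to look at the output of `id -nG`. The sketch below is only an illustration, not an HCC-provided tool; the helper name `has_group` is hypothetical, and it assumes the group appears under the name gauss on the login node.

```shell
#!/bin/sh
# has_group GROUP [LIST]: succeed if GROUP appears in the space-separated LIST.
# LIST defaults to the current user's groups as reported by `id -nG`.
# (has_group is a hypothetical helper name, not an HCC tool.)
has_group() {
    group=$1
    list=${2-$(id -nG)}
    for g in $list; do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}

if has_group gauss; then
    echo "OK: this account is in group gauss"
else
    echo "Not in group gauss yet; contact hcc-support@unl.edu"
fi
```

The comparison is done on whole group names, so a group such as gaussian would not be mistaken for gauss.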

Some g09 restrictions you need to know:

  1. Parallel execution of our g09 is OpenMP-based: g09 can run only on a single node with multiple CPUs, e.g. #SBATCH --nodes=1 --ntasks-per-node=4 with SLURM.
    Also, make sure that %nprocs=4 in the g09 input file matches the CPU request in your SLURM submission file.
  2. Scratch file directory for g09: the sample submission script sets
    export GAUSS_SCRDIR=$TMPDIR
    You may override this default scratch location by setting it explicitly in your SLURM submission file for Gaussian (e.g. export GAUSS_SCRDIR=$PWD).
  3. Convert the .chk file to a .fchk file before loading it into GaussView:
    Type
        module load gaussian/09/RevD
        source ${g09root}/g09/bsd/g09.profile
        
    to load g09 environment.
    Type cd xxx to change directory, where xxx is the directory containing the .chk file generated by g09.
    Type formchk yyy.chk yyy.fchk to convert the file, where yyy is the name of your .chk file without the suffix.
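
The first restriction, that the processor count in the input file must agree with the SLURM request, is easy to get wrong when editing either file. A minimal sketch of an automated check follows, assuming the count is given on a line such as %nprocs=4 (or a similar spelling such as %nprocshared=4) and that the SLURM script uses --ntasks-per-node on an #SBATCH line. The helper name `check_nprocs` is hypothetical.

```shell
#!/bin/sh
# check_nprocs INPUT SCRIPT: compare the processor count in a Gaussian input
# file with --ntasks-per-node in a SLURM script.
# (check_nprocs is a hypothetical helper name, not part of g09 or SLURM.)
check_nprocs() {
    input=$1
    script=$2
    # First "%nproc...=N" line in the input file, e.g. %nprocs=4.
    nprocs=$(sed -n 's/^%[Nn][Pp][Rr][Oo][Cc][A-Za-z]*=\([0-9][0-9]*\).*/\1/p' "$input" | head -n 1)
    # First "--ntasks-per-node=N" found on an #SBATCH line.
    ntasks=$(sed -n 's/^#SBATCH.*--ntasks-per-node=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    if [ -z "$nprocs" ] || [ -z "$ntasks" ]; then
        echo "could not find %nprocs or --ntasks-per-node" >&2
        return 2
    fi
    if [ "$nprocs" -eq "$ntasks" ]; then
        echo "match: $nprocs cores"
    else
        echo "mismatch: %nprocs=$nprocs, --ntasks-per-node=$ntasks" >&2
        return 1
    fi
}

# Usage, from the directory created in Step 1:
#   check_nprocs test_g98.com run-g09-general.slurm
```

The function only reads the two files, so it can be run on a login node before calling sbatch.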


Run g09 with DMTCP (Checkpointing):

If your g09 job cannot finish within the 168-hour walltime limit, you can try the following steps to checkpoint it with DMTCP and resume the interrupted job afterwards.

Submit your initial g09 job with the following SLURM submission file:

Submit with dmtcp
#!/bin/sh
#SBATCH -J g09-dmtcp
#SBATCH --nodes=1 --ntasks-per-node=16
#SBATCH --mem-per-cpu=4000
#SBATCH --time=168:00:00
#SBATCH --partition=batch
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

nodelist >  ./machines.LINUX
echo $SLURM_JOB_ID > ./jobid

module load gaussian/09/RevD
source ${g09root}/g09/bsd/g09.profile
export GAUSS_SCRDIR=$PWD

rm -rf ckpt*
rm -rf dmtcp*

module load dmtcp

export DMTCP_CHECKPOINT_INTERVAL=561600
export DMTCP_HOST=localhost
export DMTCP_PORT=7779
export DMTCP_GZIP=1
export DMTCP_CHECKPOINT_DIR=$PWD
export DMTCP_SIGCKPT=12
export DMTCP_TMPDIR=/tmp

dmtcp_checkpoint g09 < au3O2-c13-pbepbegd3iop30-opt-tz.gjf > au3O2-c13-pbepbegd3iop30-opt-tz.log

One parameter you may need to adjust is DMTCP_CHECKPOINT_INTERVAL, which controls the interval, in seconds, at which DMTCP writes checkpoint files. It is currently set to 561600 s, i.e. 156 hours, so DMTCP begins writing checkpoint files 12 hours before the 168-hour walltime expires. The time needed to finish writing the checkpoint files varies with the type of g09 calculation. We suggest trying different DMTCP_CHECKPOINT_INTERVAL values in your g09-dmtcp jobs to find a suitable value for your particular type of calculation.
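
The interval can be derived from the walltime and the margin you want to leave for writing the checkpoint, rather than typed in by hand. A small sketch of that arithmetic is shown here; the variable names `walltime_hours` and `margin_hours` are illustrative and not something DMTCP or SLURM defines.

```shell
#!/bin/sh
# Derive DMTCP_CHECKPOINT_INTERVAL (in seconds) from the walltime and a margin.
# walltime_hours and margin_hours are illustrative names; tune the margin
# to how long your type of calculation needs to write its checkpoint files.
walltime_hours=168   # matches #SBATCH --time=168:00:00
margin_hours=12      # time reserved for writing checkpoint files

interval=$(( (walltime_hours - margin_hours) * 3600 ))
echo "DMTCP_CHECKPOINT_INTERVAL=$interval"   # prints DMTCP_CHECKPOINT_INTERVAL=561600
export DMTCP_CHECKPOINT_INTERVAL=$interval
```

With a 168-hour walltime and a 12-hour margin this reproduces the 561600 s value used in the submission file; a larger margin gives a smaller interval.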

Once your running job completes checkpointing, make sure that a file called dmtcp_restart_script.sh has been generated in your job’s working directory. You can then use the following SLURM submission file to resume your interrupted job:

Resume with dmtcp
#!/bin/sh
#SBATCH -J g09-restart
#SBATCH --nodes=1 --ntasks-per-node=16
#SBATCH --mem-per-cpu=4000
#SBATCH --time=168:00:00
#SBATCH --partition=batch
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

nodelist >  ./machines.LINUX
echo $SLURM_JOB_ID > ./jobid

module load gaussian/09/RevD
source ${g09root}/g09/bsd/g09.profile
export GAUSS_SCRDIR=$PWD

module load dmtcp

export DMTCP_CHECKPOINT_INTERVAL=561600
export DMTCP_HOST=localhost
export DMTCP_PORT=7779
export DMTCP_GZIP=1
export DMTCP_CHECKPOINT_DIR=$PWD
export DMTCP_SIGCKPT=12
export DMTCP_TMPDIR=/tmp

./dmtcp_restart_script.sh

The restarted job does not write its new output to the original output file au3O2-c13-pbepbegd3iop30-opt-tz.log; it writes to the file TestJob.%J.stdout instead.
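
Before submitting the resume job, it can help to confirm that checkpointing left the expected files behind. The following sketch checks for dmtcp_restart_script.sh and for at least one file matching ckpt*, the same pattern the initial submission file cleans up with rm -rf ckpt*. The helper name `can_restart` is hypothetical, and the assumption that checkpoint images match ckpt* should be verified against your own job directory.

```shell
#!/bin/sh
# can_restart [DIR]: succeed if DIR (default: current directory) contains
# dmtcp_restart_script.sh and at least one file matching ckpt*.
# (can_restart is a hypothetical helper; the ckpt* pattern is an assumption
# based on the cleanup line in the initial submission file.)
can_restart() {
    dir=${1:-.}
    if [ ! -f "$dir/dmtcp_restart_script.sh" ]; then
        echo "missing dmtcp_restart_script.sh in $dir" >&2
        return 1
    fi
    for img in "$dir"/ckpt*; do
        if [ -e "$img" ]; then
            return 0
        fi
    done
    echo "no ckpt* files found in $dir" >&2
    return 1
}

# Usage, from the job's working directory:
#   can_restart . && echo "ready to submit the resume job"
```

This only tests that the files exist; it does not verify that the checkpoint images were written completely.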