MPI Jobs on HCC

This quick start demonstrates how to implement a parallel (MPI) Fortran/C program on HCC supercomputers. The sample codes and submit scripts can be downloaded from

Log in to an HCC Cluster

Connect to an HCC cluster and make a subdirectory called mpi_dir under your $WORK directory.

$ cd $WORK
$ mkdir mpi_dir

In the subdirectory mpi_dir, save all the relevant code. Here we include two demo programs, demo_f_mpi.f90 and demo_c_mpi.c, that compute the sum from 1 to 20 using parallel processes. A straightforward parallelization scheme is used for demonstration purposes. First, the master core (i.e. myid=0) distributes an equal share of the computation to a certain number of cores (as specified by --ntasks in the submit script). Then, each worker core computes a partial sum as its output. Finally, the master core collects the outputs from all worker cores and performs an overall summation. For easy comparison with the serial code (Fortran/C on HCC), the lines added in the parallel (MPI) code are marked with “!=” or “//=”.



Compiling the Code

Compiling MPI code requires first loading a compiler “engine” such as gcc, intel, or pgi, and then loading the MPI wrapper openmpi. Here we will use the GNU Compiler Collection, gcc, for demonstration.

$ module load compiler/gcc/6.1 openmpi/2.1
$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x  
$ mpicc demo_c_mpi.c -o demo_c_mpi.x

The above commands load the gcc compiler with the openmpi wrapper. The compile commands mpif90 and mpicc are used to compile the codes to .x files (executables).

Creating a Submit Script

Create a submit script to request 5 cores (with --ntasks). The parallel execution command mpirun must be entered on the last line, before the main program name (for example, mpirun ./demo_f_mpi.x).



Submit the Job

The job can be submitted through the command sbatch. The job status can be monitored by entering squeue with the -u option.

$ sbatch submit_f.mpi
$ sbatch submit_c.mpi
$ squeue -u <username>

Replace <username> with your HCC username.

Sample Output

The sum from 1 to 20 is computed and printed to the .out files. The outputs from the 5 cores are collected and processed by the master core (i.e. myid=0).