MPI Jobs on HCC
This quick start demonstrates how to implement a parallel (MPI) Fortran/C program on HCC supercomputers. The sample codes and submit scripts can be downloaded from mpi_dir.zip.
Login to an HCC Cluster¶
Connect to an HCC cluster and create a subdirectory
called mpi_dir under your $WORK directory.
$ cd $WORK
$ mkdir mpi_dir
In the subdirectory mpi_dir, save all the relevant codes. Here we
include two demo programs, demo_f_mpi.f90 and demo_c_mpi.c, that
compute the sum from 1 to 20 through parallel processes. A
straightforward parallelization scheme is used for demonstration
purposes. First, the master core (i.e. myid=0) distributes an equal
computation workload to a certain number of cores (as specified by
--ntasks in the submit script). Then, each worker core computes a
partial summation as output. Finally, the master core collects the
outputs from all worker cores and performs the overall summation, as
sketched below. For easy comparison with the serial code (Fortran/C on
HCC), the added lines in the parallel code (MPI) are marked with "!="
or "//=".
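
The following is a minimal C sketch of this scheme, not a copy of demo_f_mpi.f90 or demo_c_mpi.c (both are included in mpi_dir.zip); it assumes the range 1..20 divides evenly among the tasks, as it does with --ntasks=5.

/* Sketch of the parallel summation scheme described above: every   */
/* rank sums an equal share of 1..N, the workers send their partial */
/* sums to the master (myid=0), and the master prints the total.    */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int N = 20;                 /* sum the integers 1..N */
    int myid, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank takes an equal, contiguous block of the range. */
    int chunk = N / nprocs;
    int lo = myid * chunk + 1;
    int hi = lo + chunk - 1;

    int partial = 0;
    for (int i = lo; i <= hi; i++)
        partial += i;

    if (myid == 0) {
        /* Master: add its own share, then collect the workers' sums. */
        int total = partial, recv;
        for (int src = 1; src < nprocs; src++) {
            MPI_Recv(&recv, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            total += recv;
        }
        printf("The sum from 1 to %d is %d\n", N, total);
    } else {
        /* Worker: report the partial sum back to the master. */
        MPI_Send(&partial, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}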
Compiling the Code¶
Compiling an MPI code requires first loading a compiler "engine"
such as gcc, intel, or pgi, and then loading the MPI wrapper
openmpi. Here we use the GNU Compiler Collection, gcc, for
demonstration.
$ module load compiler/gcc/6.1 openmpi/2.1
$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x
$ mpicc demo_c_mpi.c -o demo_c_mpi.x
The above commands load the gcc compiler with the openmpi wrapper.
The compiling commands mpif90 and mpicc compile the codes into
.x files (executables).
Creating a Submit Script¶
Create a submit script to request 5 cores (with --ntasks). The
parallel execution command mpirun must be placed on the last line of
the script, in front of the executable name (e.g. mpirun ./demo_f_mpi.x).
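
The actual scripts submit_f.mpi and submit_c.mpi are included in mpi_dir.zip. As a minimal sketch, a SLURM submit script for the C executable could look like the following; the job name, memory, and time values are illustrative placeholders to adjust for your job.

#!/bin/bash
#SBATCH --ntasks=5                # number of cores (MPI tasks)
#SBATCH --mem-per-cpu=1024        # memory per core, in MB
#SBATCH --time=00:10:00           # maximum run time (HH:MM:SS)
#SBATCH --job-name=mpi_demo
#SBATCH --error=mpi_demo.%J.err
#SBATCH --output=mpi_demo.%J.out

module load compiler/gcc/6.1 openmpi/2.1
mpirun ./demo_c_mpi.x

For the Fortran executable, replace the last line with mpirun ./demo_f_mpi.x.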
Submit the Job¶
The job can be submitted through the command sbatch. The job status
can be monitored by entering squeue with the -u option.
$ sbatch submit_f.mpi
$ sbatch submit_c.mpi
$ squeue -u <username>
Replace <username> with your HCC username.
Sample Output¶
The sum from 1 to 20 is computed and printed to the .out files. The
partial sums from the 5 cores are collected and added up by the master
core (i.e. myid=0); since 1 + 2 + ... + 20 = 210, that is the final
value reported.
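
The exact lines in the .out files depend on the demos' print statements, which are not reproduced here; the sketches above, for example, would leave a single line in the output file:

$ cat mpi_demo.<jobid>.out
The sum from 1 to 20 is 210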