CMSC498X/CMSC818X - Introduction to Parallel Computing (Fall 2020)

Assignment 2: Performance Tools

Due: Monday October 19, 2020 @ 11:59 PM Anywhere on Earth (AoE)

The purpose of this programming assignment is to gain experience in using performance analysis tools for parallel programs. For this assignment, you will run an existing parallel code, LULESH, and analyze its performance using HPCToolkit and Hatchet.

Downloading and building LULESH

You can get LULESH by cloning its git repository as follows:


        git clone https://github.com/LLNL/LULESH.git

You can use CMake to build LULESH on deepthought2 by following these steps:


        mkdir build
        cd build
        cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_CXX_FLAGS="-g -O3" -DMPI_CXX_COMPILER=`which mpicxx` -DWITH_OPENMP=Off -DWITH_SILO=Off ..
        make

This should produce an executable lulesh2.0 in the build directory.

Running LULESH

Let's say you want to run LULESH on 8 processes for 10 iterations (timesteps). The -i option sets the number of iterations and -p prints progress. This would be the mpirun line:


        mpirun -np 8 ./lulesh2.0 -i 10 -p

Using HPCToolkit and Hatchet

HPCToolkit is available on deepthought2 via the hpctoolkit/gcc module. You can use HPCToolkit to collect profiling data for a parallel program in three steps.

Step I: Creating a hpcstruct file (used in Step III) from the executable:

        hpcstruct exe

This will create a file called exe.hpcstruct.

Step II: Running the code (LULESH) with hpcrun:

        mpirun -np <num_ranks> hpcrun -e WALLCLOCK@5000 ./exe <args>

This will generate a measurements directory.

Step III: Post-processing the measurements directory generated by hpcrun:

        mpirun -np 1 hpcprof-mpi --metric-db=yes -S exe.hpcstruct -I <path_to_src> <measurements-directory>

This will generate a database directory.
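
To make the three steps concrete, here is a minimal Python sketch that only assembles the command lines for a given executable. The helper name hpctoolkit_commands is hypothetical, and the source path and measurements directory are placeholders that you must replace with the directory hpcrun actually creates. It uses only the flags shown in the steps above.

```python
import os


def hpctoolkit_commands(exe, num_ranks, exe_args, src_path, measurements_dir):
    """Return the command lines for Steps I-III as lists of arguments.

    All names are illustrative. measurements_dir must be the directory
    that hpcrun produced in Step II.
    """
    # hpcstruct writes <executable name>.hpcstruct (see Step I).
    struct_file = os.path.basename(exe) + ".hpcstruct"
    step1 = ["hpcstruct", exe]
    step2 = ["mpirun", "-np", str(num_ranks),
             "hpcrun", "-e", "WALLCLOCK@5000", exe] + list(exe_args)
    step3 = ["mpirun", "-np", "1", "hpcprof-mpi", "--metric-db=yes",
             "-S", struct_file, "-I", src_path, measurements_dir]
    return [step1, step2, step3]


if __name__ == "__main__":
    # Placeholders in angle brackets are left for you to fill in.
    commands = hpctoolkit_commands("./lulesh2.0", 8, ["-i", "10", "-p"],
                                   "<path_to_src>", "<measurements-directory>")
    for cmd in commands:
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run once the placeholders are replaced, or the printed lines can be pasted into a job script.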

Hatchet can be used to analyze the database directory generated by hpcprof-mpi using its from_hpctoolkit reader.
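
As a starting point, the sketch below reads a database directory with the from_hpctoolkit reader and ranks functions by exclusive time. It assumes that hatchet is installed and that the dataframe has a metric column named "time"; the actual column name depends on the measured event, so check gf.dataframe.columns first. The helper names load_exclusive_times and rank_by_time are hypothetical.

```python
def load_exclusive_times(database_dir, metric="time"):
    """Read an HPCToolkit database with Hatchet and sum `metric` per function.

    Assumes the dataframe has a column called `metric`; the name "time"
    is an assumption, so inspect gf.dataframe.columns for the real one.
    """
    # Imported lazily so rank_by_time below is usable without hatchet.
    import hatchet as ht

    gf = ht.GraphFrame.from_hpctoolkit(database_dir)
    # Collapse the per-rank rows into one row per call-tree node.
    gf.drop_index_levels()
    # The same function can appear at several call-tree nodes; add them up.
    return gf.dataframe.groupby("name")[metric].sum().to_dict()


def rank_by_time(times, n=10):
    """Return the n (name, time) pairs with the largest time, largest first."""
    return sorted(times.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up numbers, used only to show the helper's output format.
    sample = {"funcA": 4.0, "funcB": 9.5, "funcC": 1.2}
    print(rank_by_time(sample, n=2))
```

With a real database, rank_by_time(load_exclusive_times("<database-directory>")) would list the functions with the most exclusive time.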

You can install Hatchet using pip install hatchet. I suggest using the development version of hatchet by cloning the git repository:


        git clone https://github.com/LLNL/hatchet.git

You can install hatchet on deepthought2 or your local computer by adding the hatchet directory to your PYTHONPATH and running install.sh.

Assignment Tasks

Task 1: You will run LULESH on 1, 8 and 27 MPI processes in the default (weak scaling) mode (with the parameters suggested above), and compare the performance of various executions. Identify the functions/statements that the code spends most of its time in. Identify the functions/code regions that scale poorly as you run on more processes.
Task 2: You will run LULESH on 1, 8 and 27 MPI processes with the additional argument -s 45 and compare the performance of these executions with those in the default mode. Identify the functions/code regions where the code spends disproportionately more time compared to the default mode in task 1.
Task 3: You will run LULESH on 1, 8, and 27 MPI processes in the strong scaling mode (use additional arguments, -s 45, -s 22, and -s 15 respectively), and compare the performance of various executions. Identify the functions/code regions that scale poorly as you run on more processes in this strong scaling mode. Compare the results with the functions you identified in task 1.
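
The sizes in Task 3 keep the total problem roughly constant, assuming -s is the edge length of each process's cube of elements: 1 x 45^3 = 91125, 8 x 22^3 = 85184, and 27 x 15^3 = 91125. The sketch below checks this arithmetic and shows one possible way to quantify scaling per function. The helper names and timings are hypothetical, and the per-function times are assumed to be plain dictionaries that you have extracted from your profiles.

```python
def total_elements(num_ranks, side):
    """Total mesh elements, assuming each rank owns a side**3 cube."""
    return num_ranks * side ** 3


def weak_scaling_slowdown(base_times, scaled_times):
    """Per-function ratio t_p / t_1; 1.0 is ideal under weak scaling."""
    return {
        name: scaled_times[name] / base_times[name]
        for name in base_times
        if name in scaled_times and base_times[name] > 0
    }


def strong_scaling_efficiency(base_times, scaled_times, num_ranks):
    """Per-function efficiency t_1 / (p * t_p); 1.0 is ideal under strong scaling."""
    return {
        name: base_times[name] / (num_ranks * scaled_times[name])
        for name in base_times
        if name in scaled_times and scaled_times[name] > 0
    }


if __name__ == "__main__":
    for ranks, side in [(1, 45), (8, 22), (27, 15)]:
        print(ranks, side, total_elements(ranks, side))
    # Made-up timings in seconds, purely to show the calculation.
    t1 = {"funcA": 10.0, "funcB": 2.0}
    t8 = {"funcA": 10.5, "funcB": 5.0}
    print(weak_scaling_slowdown(t1, t8))
```

Functions with a slowdown well above 1.0 (weak scaling) or an efficiency well below 1.0 (strong scaling) are candidates for the poorly scaling regions the tasks ask about.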

What to Submit

You must submit the following files and no other files:

Python scripts that use hatchet for the analyses: task1.py, task2.py, and task3.py.
A report that describes what you did, and identifies the main bottlenecks in the source code in the various scenarios above.

You should put the code and report in a single directory (named LastName-assign2), compress it to .tar.gz (LastName-assign2.tar.gz), and upload that to ELMS.
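
If you prefer to build the archive from Python, the following is a minimal sketch using the standard tarfile module. The function name make_submission is hypothetical, and it assumes the directory already contains your scripts and report.

```python
import os
import tarfile


def make_submission(directory):
    """Create <directory>.tar.gz containing the directory; return its path."""
    directory = directory.rstrip("/")
    archive = directory + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, e.g. LastName-assign2/task1.py
        tar.add(directory, arcname=os.path.basename(directory))
    return archive


if __name__ == "__main__":
    # Replace LastName with your own last name before running.
    if os.path.isdir("LastName-assign2"):
        print(make_submission("LastName-assign2"))
```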

Tips

Don't follow the build and running instructions in this assignment blindly. The goal is for you to learn to compile and run parallel code, and learn how to use HPCToolkit and Hatchet.
Helpful resources: HPCToolkit user manual and Hatchet User Guide
If you have questions about using these tools or Python and pandas, try using Google first.

Grading

The project will be graded as follows:

Component	Percentage
Analysis 1	30
Analysis 2	30
Analysis 3	30
Writeup	10