Parallel Computing (CMSC416/CMSC616)

Assignment 2: Performance Tools

Due: March 8, 2024 @ 11:59 PM Eastern Time

The purpose of this programming assignment is to gain experience in using performance analysis tools for parallel programs. There are two parts to this assignment. In Part I, you will run an existing parallel code, LULESH, and collect performance data using HPCToolkit. In Part II, you will use performance data provided to you (gathered using another tool called Caliper), and analyze this data using Hatchet.

Part I: Recording performance data

Downloading and building LULESH

You can get LULESH by cloning its git repository as follows (make sure the mpi module is loaded):
git clone https://github.com/LLNL/LULESH.git

You can use CMake to build LULESH on zaratan by following these steps:


    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_CXX_FLAGS="-g -O3" -DMPI_CXX_COMPILER=`which mpicxx` -DWITH_OPENMP=Off -DWITH_SILO=Off ..
    make
This should produce an executable lulesh2.0 in the build directory.

Running LULESH

Let's say you want to run LULESH on 8 processes for 10 iterations/timesteps. This would be the mpirun line:
mpirun -np 8 ./lulesh2.0 -i 10 -p

Using HPCToolkit

HPCToolkit is available on zaratan via the hpctoolkit/gcc module. You can use HPCToolkit to collect profiling data for a parallel program in three steps.

  1. Step I: Running the code (LULESH) with hpcrun (in a batch job):
    mpirun -np <num_ranks> hpcrun ./exe <args>
    This will generate a measurements directory.
  2. Step II: First post-processing step on the measurements directory (this can be run on the login node)
    hpcstruct <measurements-directory>
  3. Step III: Second post-processing step on the measurements directory (this can be run on the login node)
    hpcprof <measurements-directory>
    This will generate a database directory.
You can use hpcviewer, or Hatchet with its from_hpctoolkit reader, to analyze the database directory generated in Step III.
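
If you choose Hatchet, the following is a minimal sketch of loading the database (the directory name below is a placeholder for whatever hpcprof produced, and the metric columns depend on what hpcrun measured):

    import hatchet as ht

    # "hpctoolkit-database" is a placeholder; use the directory hpcprof generated.
    gf = ht.GraphFrame.from_hpctoolkit("hpctoolkit-database")

    # List the metric columns available in this database before analyzing.
    print(gf.dataframe.columns)

    # Print the call tree annotated with a metric.
    print(gf.tree())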

Part II: Analyzing performance data using Hatchet

Installing Hatchet

You can install hatchet on zaratan or your local computer using pip:
pip3 install hatchet
If the above command runs without errors, hatchet is installed.

If this does not work, or you want to install from source, follow the steps below. First, clone the hatchet git repository and add the directory where you cloned it to your PYTHONPATH:


    git clone https://github.com/hatchet/hatchet
    export PYTHONPATH=<path-to-hatchet>:$PYTHONPATH
Next, install the requirements using pip and then run install.sh inside the hatchet directory:

    cd hatchet/
    pip install -r requirements.txt
    source install.sh
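
Either way, you can sanity-check the installation from Python (a minimal check; the version attribute is an assumption that holds for recent hatchet releases):

    # Quick sanity check: the import should succeed, and recent releases
    # expose a version string (an assumption; skip if it is not defined).
    import hatchet
    print(hatchet.__version__)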

Datasets

You can find four datasets from four different runs of LULESH at: lulesh-1core.json, lulesh-8cores.json, lulesh-27cores.json, lulesh-64cores.json. These were gathered by running LULESH with 1, 8, 27, and 64 processes, respectively. The profiles were collected using Caliper, so you can use from_caliper in the Hatchet API to read them.
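
For example, a minimal sketch for reading one of these files into a graphframe (printing the dataframe and tree is just a way to explore the data):

    import hatchet as ht

    # Read a Caliper JSON profile into a Hatchet GraphFrame.
    gf = ht.GraphFrame.from_caliper("lulesh-1core.json")

    # The pandas dataframe holds per-node metrics; the tree shows the call graph.
    print(gf.dataframe.head())
    print(gf.tree())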

Analysis Tasks

  1. Problem 1: Analyze lulesh-1core.json and identify the top N functions where the code spends the largest amounts of (exclusive) time. Read the value of N from the first command line argument to your script. The only output from the Python script should be N lines, where each line prints the function name, followed by a space, and then the (exclusive) time spent in that function. (A skeleton sketch for this appears after this list.) What we will run to check correctness (an example):
    ./problem1.py 3
    Sample output below:
        func1_name 0.574
        func2_name 0.522
        func3_name 0.374
  2. Problem 2: Use the load_imbalance() function on the lulesh-64cores.json dataset and, given a command line parameter X (X ≥ 1), identify the function that is at position X from the top when the functions are sorted in decreasing order of load imbalance. Read the value of X from the first command line argument to your script. The only output from the script should be the list of processes that hatchet generates for this function (the processes with the most imbalance). What we will run to check correctness (an example):
    ./problem2.py 6
    Sample output below:
        [60 15  0 51  3]
  3. Problem 3: Create two graphframes for the 8- and 64-process cases, use drop_index_levels() on both, and then subtract the 8-process graphframe from the 64-process one. Identify the N functions with the largest time differences (NOT the absolute magnitude) between the two runs, where N is a command line argument. Read the value of N from the first command line argument to your script. The only output from the Python script should be N lines, where each line prints the function name, followed by a space, and then the difference in (exclusive) time between the two runs for that function. (See the subtraction sketch after this list.) What we will run to check correctness (an example):
    ./problem3.py 4
    Sample output below:
        func1_name 0.574
        func2_name 0.552
        func3_name 0.522
        func4_name 0.374
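
As referenced in Problem 1, here is a skeleton sketch of the command line handling and sorting involved. The exclusive-time metric column is assumed to be named "time", as it typically is for Caliper data; check gf.dataframe.columns on the actual files before relying on it:

    #!/usr/bin/env python3
    # Skeleton sketch for Problem 1: top N functions by exclusive time.
    # ASSUMPTION: the exclusive-time column is named "time"; verify on your data.
    import sys
    import hatchet as ht

    N = int(sys.argv[1])
    gf = ht.GraphFrame.from_caliper("lulesh-1core.json")

    # Sort nodes by exclusive time, largest first, and print "name time" pairs.
    df = gf.dataframe.sort_values(by="time", ascending=False)
    for _, row in df.head(N).iterrows():
        print(row["name"], row["time"])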
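
And for the subtraction in Problem 3, a sketch of the graphframe algebra; sorting and printing the result follows the same pattern as above:

    # Sketch for Problem 3: difference between the 64- and 8-process runs.
    import hatchet as ht

    gf8 = ht.GraphFrame.from_caliper("lulesh-8cores.json")
    gf64 = ht.GraphFrame.from_caliper("lulesh-64cores.json")

    # Collapse the per-process index levels so the frames align node-by-node.
    gf8.drop_index_levels()
    gf64.drop_index_levels()

    # Subtract the 8-process graphframe from the 64-process one.
    diff = gf64 - gf8
    # diff.dataframe now holds per-function time differences to sort and print.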

What to Submit

You must submit the following files and no other files:

  • The database directory generated on 8 processes for Part I, renamed to lulesh-8processes.
  • Python scripts that use hatchet for the analyses: problem1.py, problem2.py, and problem3.py.
  • A report that describes what you did and which hatchet functions you found most useful.
Put the database directory, the three Python scripts, and the report in a single directory (named LastName-FirstName-assign2), compress it to a .tar.gz file (LastName-FirstName-assign2.tar.gz), and upload that to gradescope. Replace LastName and FirstName with your last and first name, respectively.

Grading

The project will be graded as follows:

Component                    Percentage
Successful data collection   30
Problem 1 correctness        20
Problem 2 correctness        20
Problem 3 correctness        20
Writeup                      10