The purpose of this programming assignment is to gain experience in using performance analysis tools for parallel programs. There are two parts to this assignment. In Part I, you will run an existing parallel code, LULESH, and collect performance data using HPCToolkit. In Part II, you will analyze performance data provided to you (gathered using another tool called Caliper) using Hatchet.
You can get LULESH by cloning its git repository as follows:
git clone https://github.com/LLNL/LULESH.git
For this assignment, we will use an older version of gcc (9.4.0) and openmpi. You can load it as follows:
module load openmpi/gcc/9.4.0
If you get an error, first unload the openmpi/gcc/11.3.0 module that you might have loaded for Assignment 2 (module unload openmpi/gcc/11.3.0).
You can use CMake to build LULESH on zaratan by following these steps:
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_CXX_FLAGS="-g -O3" -DMPI_CXX_COMPILER=`which mpicxx` -DWITH_OPENMP=Off -DWITH_SILO=Off ..
make
This should produce an executable, lulesh2.0, in the build directory.
Let's say you want to run LULESH on 8 processes for 10 iterations/timesteps. This would be the mpirun line:
mpirun -np 8 ./lulesh2.0 -i 10 -p
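The datasets provided for Part II were collected on 1, 8, 27, and 64 processes, all perfect cubes, which matches LULESH's expectation that the number of MPI ranks is the cube of an integer. As a minimal sketch, a small Python helper can check the rank count and assemble the mpirun line before you submit a job. The function names `is_perfect_cube` and `lulesh_command` are hypothetical and not part of LULESH.

```python
def is_perfect_cube(n: int) -> bool:
    """Return True if n is the cube of a positive integer."""
    if n < 1:
        return False
    root = round(n ** (1.0 / 3.0))
    # Check the neighbours of the rounded root to guard against float error.
    return any((root + d) ** 3 == n for d in (-1, 0, 1))


def lulesh_command(num_ranks: int, iterations: int, exe: str = "./lulesh2.0") -> list:
    """Build the mpirun argument list shown above (hypothetical helper)."""
    if not is_perfect_cube(num_ranks):
        raise ValueError(f"{num_ranks} ranks is not a perfect cube (1, 8, 27, 64, ...)")
    return ["mpirun", "-np", str(num_ranks), exe, "-i", str(iterations), "-p"]


# Reproduces the example line for 8 processes and 10 iterations.
print(" ".join(lulesh_command(8, 10)))
```

The helper only builds the command; how you launch it (interactively or from a batch script) depends on your setup on zaratan.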
HPCToolkit can be loaded on zaratan via the hpctoolkit/gcc/9.4.0 module. You can use HPCToolkit to collect profiling data for a parallel program in three steps.
Step I: mpirun -np <num_ranks> hpcrun ./exe <args>
Step II: hpcstruct <measurements_directory>
Step III: hpcprof <measurements_directory>
You can then use hatchet's from_hpctoolkit reader to analyze the database directory generated in Step III.
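The three collection steps can be gathered into one place so they are always run in the same order. The following is only a sketch: `hpctoolkit_commands` is a hypothetical helper, and the measurements directory name used in the example is a placeholder, so substitute whatever directory hpcrun actually wrote on your system.

```python
def hpctoolkit_commands(num_ranks, exe, exe_args, measurements_dir):
    """Return the three HPCToolkit command lines as argument lists.

    measurements_dir is whatever directory hpcrun produced; the value
    passed in the example below is a placeholder, not a guaranteed name.
    """
    run = ["mpirun", "-np", str(num_ranks), "hpcrun", exe] + list(exe_args)
    struct = ["hpcstruct", measurements_dir]   # recover program structure
    prof = ["hpcprof", measurements_dir]       # build the analysis database
    return [run, struct, prof]


# Dry run: print the commands instead of executing them.
for cmd in hpctoolkit_commands(8, "./lulesh2.0", ["-i", "10", "-p"],
                               "measurements-dir-placeholder"):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to paste them into a job script; passing each list to `subprocess.run` would execute them instead.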
You can install hatchet on zaratan or your local computer using pip:
pip3 install hatchet
If the command completes without errors, hatchet is installed. On zaratan, you may have to add --user to the pip command.
If this does not work, or if you want to install from source, follow the steps below. First, clone the hatchet git repository, check out v1.4.0, and add the path of the directory where you cloned hatchet to your PYTHONPATH:
git clone https://github.com/hatchet/hatchet
git -C hatchet checkout v1.4.0
export PYTHONPATH=<path-to-hatchet>:$PYTHONPATH
Next you can install the requirements using pip and then run install.sh inside the hatchet directory:
cd hatchet/
pip install -r requirements.txt
source install.sh
Four datasets from four different runs of LULESH are provided: lulesh-1core.json, lulesh-8cores.json, lulesh-27cores.json, and lulesh-64cores.json. These were gathered by running LULESH with 1, 8, 27, and 64 processes, respectively. The profiles were gathered using Caliper, so you can use from_caliper in the hatchet API to read them. You will use these profiles/datasets for all the tasks below.
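As a rough sketch of how the four files might be loaded, the helper below maps a process count to its dataset name and reads it with hatchet. `dataset_path` and `load_profile` are hypothetical names, the files are assumed to sit in the current directory, and the exact reader name should be checked against your installed hatchet version.

```python
def dataset_path(num_procs: int) -> str:
    """Map a process count to the provided dataset file name."""
    # The single-process file is named "1core", the others "<n>cores".
    suffix = "core" if num_procs == 1 else "cores"
    return f"lulesh-{num_procs}{suffix}.json"


def load_profile(num_procs: int):
    """Read one dataset into a hatchet GraphFrame (requires hatchet)."""
    # Imported lazily so dataset_path() works even without hatchet installed.
    import hatchet as ht
    return ht.GraphFrame.from_caliper(dataset_path(num_procs))


if __name__ == "__main__":
    for n in (1, 8, 27, 64):
        print(n, dataset_path(n))
```

Calling `load_profile(64)` returns the graphframe for the 64-process run; its `dataframe` attribute is the pandas table the tasks operate on.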
./problem1.py 3
func1_name 0.574
func2_name 0.522
func3_name 0.374
Problem 2: Use the load_imbalance() function on the lulesh-64cores.json dataset and, given a command line parameter X (X ≥ 1), identify the function that is X-th from the top (the top is 1, not 0) when the functions are sorted in decreasing order of imbalance. Set the value of X in your Python program from the first command line argument to your script. The only output from the script should be the list of processes that hatchet reports as having the most imbalance for this function. What we will run to check correctness (an example):
./problem2.py 6
[60 15 0 51 3]
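To make the selection logic concrete, here is a pure-Python sketch on hypothetical toy data. It does not call hatchet; in your script the imbalance values and process lists come from load_imbalance(). The sketch assumes imbalance is the maximum per-process time divided by the mean, and that the reported processes are the ones with the largest times. The function names and the toy numbers are illustrative only.

```python
def imbalance(times):
    """Assumed definition: max over mean of the per-process times."""
    mean = sum(times) / len(times)
    return max(times) / mean if mean > 0 else 0.0


def xth_most_imbalanced(per_rank_times, x, top_ranks=5):
    """Return (function name, ranks with largest times) for the X-th entry.

    x is 1-based: x == 1 selects the most imbalanced function.
    """
    if x < 1 or x > len(per_rank_times):
        raise ValueError("x must be between 1 and the number of functions")
    ranked = sorted(per_rank_times,
                    key=lambda name: imbalance(per_rank_times[name]),
                    reverse=True)
    name = ranked[x - 1]  # convert the 1-based position to a 0-based index
    times = per_rank_times[name]
    ranks = sorted(range(len(times)), key=lambda r: times[r], reverse=True)
    return name, ranks[:top_ranks]


# Toy per-process times for three made-up functions on three ranks.
toy = {
    "funcA": [1.0, 1.0, 4.0],  # imbalance 2.0
    "funcB": [2.0, 2.0, 2.0],  # imbalance 1.0
    "funcC": [1.0, 3.0, 2.0],  # imbalance 1.5
}
print(xth_most_imbalanced(toy, 2))  # ('funcC', [1, 2, 0])
```

The point to carry over is the off-by-one: the command line value X counts from 1, while Python indexing counts from 0.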
Problem 3: Read the 8-process and 64-process datasets into graphframes, call drop_index_levels() on both, and then subtract the 8-process graphframe from the 64-process one. Identify the N functions with the largest time differences (the signed difference, NOT the absolute magnitude) between the two runs, where N is a command line argument. Set the value of N in your Python program from the first command line argument to your script. The only output from the Python script should be N lines, where each line prints the function name, followed by a space, and then the difference in time (exclusive) between the two runs for that function. What we will run to check correctness (an example):
./problem3.py 4
func1_name 0.574
func2_name 0.552
func3_name 0.522
func4_name 0.374
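The distinction between signed and absolute differences can be illustrated with a small pure-Python sketch on hypothetical toy data. It does not use hatchet; in your script the differences come from subtracting the two graphframes. Treating a function missing from one run as having zero time is an assumption of this sketch, as are the function name `largest_differences` and the three-decimal formatting.

```python
def largest_differences(times_64, times_8, n):
    """Return the n largest signed differences (64-process minus 8-process)."""
    names = set(times_64) | set(times_8)
    # Assumption: a function absent from one run contributes 0.0 there.
    diffs = {name: times_64.get(name, 0.0) - times_8.get(name, 0.0)
             for name in names}
    # Sort by signed difference, largest first; break ties by name.
    ordered = sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:n]


# Toy exclusive times for made-up functions in the two runs.
toy_64 = {"f": 2.0, "g": 1.0, "h": 0.75}
toy_8 = {"f": 1.0, "g": 1.5, "h": 0.25, "k": 0.5}

for name, diff in largest_differences(toy_64, toy_8, 2):
    print(f"{name} {diff:.3f}")
```

Here `g` changes by -0.5, the same magnitude as `h`, but it ranks below `h` because the sort uses the signed value rather than its absolute value.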
You must submit the following files and no other files:
A writeup (report-assign3.pdf) that describes what you did and which hatchet functions you found to be the most useful.
Put the files in a single directory (named LastName-FirstName-assign3), compress it to .tar.gz (LastName-FirstName-assign3.tar.gz), and upload that to Gradescope.
Replace LastName and FirstName with your last and first name, respectively.
The project will be graded as follows:
Component | Percentage |
---|---|
Successful data collection | 30 |
Problem 1 correctness | 20 |
Problem 2 correctness | 20 |
Problem 3 correctness | 20 |
Writeup | 10 |