Parallel Computing (CMSC416/CMSC616)

Assignment 3: OpenMP

Due: April 10, 2024 @ 11:59 PM Eastern Time

The purpose of this programming assignment is to gain experience with parallel programming on a cluster using OpenMP. You will start with working serial versions of four different programs and add OpenMP directives to parallelize them. You should examine the loops in the code and figure out where OpenMP directives can be added. The programs will be run on a single compute node of zaratan. The four serial programs solve different problems:

  1. problem1.cpp: computes the distance between the closest two points given a vector of points
  2. problem2.cpp: computes the number of edges in a directed graph, stored in the adjacency matrix representation
  3. problem3.cpp: computes the product of the elements of a vector, with every odd-indexed element inverted
  4. problem4.cpp: computes the discrete Fourier transform of a vector

Each program takes two optional command line arguments (size and seed), is deterministic, and can be run with different problem sizes and seeds. A sample Makefile can be found here.
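
As an illustration of what adding a directive to a loop looks like, here is a minimal sketch on a loop that is unrelated to the four problems. The function name sum_of_squares is hypothetical and does not appear in the assignment code. The loop accumulates into a shared variable, so a reduction clause is used to give each thread a private partial sum.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example function (not part of the assignment code):
// returns the sum of the squares of the elements of v.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    // Each thread accumulates a private partial sum; OpenMP combines the
    // partial sums into total when the loop finishes.
    #pragma omp parallel for reduction(+:total)
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

Without the reduction clause, all threads would update total at the same time, which is a data race. Loops in the actual problems may need different clauses, so treat this only as a starting pattern.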

Using OpenMP

To compile OpenMP code, we will use gcc version 9.4.0 (the default version on zaratan, which you can get by running module load gcc on the zaratan login node), which has OpenMP support built in. Because the sources are C++, invoke the g++ driver so that the C++ standard library is linked. In general, you can compile the problems in this assignment with:

        g++ -fopenmp -O2 -o problem1 problem1.cpp

The -fopenmp flag tells the compiler to, you guessed it, recognize OpenMP directives.

The environment variable OMP_NUM_THREADS sets the number of threads (and presumably cores) that will run the program. Set it in the script you submit the job from. If it is unset, the program uses all available cores, which on a zaratan node means 128 (and you might not want to do that). Also, to minimize performance variability, add these to your job submission script:

        #SBATCH --exclusive
        #SBATCH --mem-bind=local
        export OMP_PROC_BIND=true

A sample batch script can be found here.
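
To confirm that the OMP_NUM_THREADS setting actually reaches your program, a small check like the following sketch can help. It is not part of the assignment code, and max_threads and observed_threads are hypothetical names. The _OPENMP guard is an assumption made so that the sketch also builds without -fopenmp, in which case it reports one thread.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Upper bound on the team size of a parallel region started without a
// num_threads clause; 1 when compiled without -fopenmp.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Counts how many threads actually take part in a parallel region.
int observed_threads() {
    int count = 0;
    // Every thread in the team adds 1 to its private copy of count;
    // the copies are summed when the region ends.
    #pragma omp parallel reduction(+:count)
    {
        count += 1;
    }
    return count;
}
```

Printing both values at the start of a run, for example with printf("%d %d\n", max_threads(), observed_threads()), makes it easy to spot a job script that forgot to export the variable.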

Running and testing correctness of the problems

When run without any command line arguments, each program solves a small test problem. This can be used to test the correctness of your modified program. This is how you can test correctness for the default case:

  • Problems 1 through 3 print a value to standard output, which you can compare between the modified version and the original version (make two copies of the .cpp files).
  • For problem 4, we want you to make a copy of the dft function in the file, call it dft_omp, modify dft_omp to use OpenMP, and then change line 61 to call dft_omp instead. Problem 4 will automatically print whether your modified function is correct.

We also want you to ensure that your modified code runs correctly in a variety of scenarios, so run the OpenMP versions with OMP_NUM_THREADS set to 1, 2, 4, 8, 16, and 32. In addition, try these command line arguments for each problem (these should also be used for performance studies):
  • problem1: 16384
  • problem2: 8192
  • problem3: 67108864
  • problem4: 8192

Finally, you can also play with the optional second argument, which changes the initial seed used to generate random numbers for filling the vectors in each problem. This can help you test correctness.
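
When comparing floating-point output between the serial and OpenMP versions, keep in mind that a parallel reduction combines values in a different order than the serial loop, so the last few digits can differ even when the code is correct. The sketch below shows one way to compare two values with a relative tolerance. The helper nearly_equal and its default tolerance of 1e-9 are assumptions made for illustration, not something the assignment prescribes.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: returns true when a and b agree to within a
// relative error of rel_tol (exactly equal values always pass).
bool nearly_equal(double a, double b, double rel_tol = 1e-9) {
    if (a == b) {
        return true;
    }
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

A difference far larger than rounding error, or a result that changes from run to run with the same seed and thread count, more likely points to a data race than to reordering.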

What to Submit

You must submit the following files and no other files:

  • A report that describes what you did.
  • In the same report, discuss the performance of running each problem with the larger sizes above (for 1, 2, 4, 8, 16, 32, and 64 threads on a single node).
  • Modified problem1.cpp, problem2.cpp, problem3.cpp, and problem4.cpp files with OpenMP directives.
  • A Makefile to compile all four problems.

You should put the code files, Makefile, and report in a single directory (named LastName-FirstName-assign3), compress it to .tar.gz (LastName-FirstName-assign3.tar.gz), and upload that to gradescope.

The timings for the report can be collected with omp_get_wtime(), which is declared in <omp.h>:

        double totalTime = 0.0;

        double start = omp_get_wtime();
        // ... work to be timed ...
        totalTime = omp_get_wtime() - start;

        printf("TIME %.5f\n", totalTime);
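
Once you have a TIME value for each thread count, one common way to summarize the results is speedup (the 1-thread time divided by the p-thread time) and parallel efficiency (speedup divided by p). The assignment does not prescribe these metrics, and the helper names below are hypothetical; this is only a sketch of the arithmetic.

```cpp
// Hypothetical helpers for summarizing measured run times (in seconds).

// Speedup of a run that took tp seconds relative to the 1-thread time t1.
double speedup(double t1, double tp) {
    return t1 / tp;
}

// Parallel efficiency: speedup divided by the thread count (1.0 is ideal).
double efficiency(double t1, double tp, int threads) {
    return speedup(t1, tp) / threads;
}
```

For instance, a problem that takes 8 seconds on 1 thread and 2 seconds on 4 threads has a speedup of 4 and an efficiency of 1.0.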

Tips

  • zaratan primer
  • Use -g while debugging but -O2 when collecting performance numbers.
  • omp_get_wtime() example

Grading

The project will be graded as follows:

Component                                          Percentage
Problem 1: runs correctly on 4 and 16 threads      10 + 10
Problem 2: runs correctly on 4 and 16 threads      10 + 10
Problem 3: runs correctly on 4 and 16 threads      10 + 10
Problem 4: runs correctly on 4 and 16 threads      15 + 15
Writeup                                            10