The purpose of this programming assignment is to gain experience in parallel programming on a cluster and MPI. For this assignment you have to write a parallel implementation of a program to simulate the Game of Life.
The game of life simulates simple cellular automata. The game is played on a rectangular board containing cells. At the start, some of the cells are occupied, the rest are empty. The game consists of constructing successive generations of the board. The rules for constructing the next generation from the previous one are:
Your program should read in a file containing the coordinates of the initial cells. Sample files are located here: life.data.1 and life.data.2. You can also find many other sample patterns on the web (use your favorite search engine on "game of life" and/or "Conway").
Your program should take four command line arguments: the name of the data file, the number of generations to iterate, X_limit, and Y_limit.
To be more specific, the command line of your program should be:
./life <input file name> <# of generations> <X_limit> <Y_limit>
The number of processes the program will run on is specified as part of the mpirun command with the -np argument.
mpirun -np <number of processes>./life <input file name> <# of generations> <X_limit> <Y_limit>
Your program should print out one line (containing the x coordinate, a space, and then the y coordinate) for each occupied cell at the end of the last iteration. The output should go to standard output, and no additional output should be sent to standard output. Sample output files are available:
Figure out how you will decompose the problem for parallel execution. Remember that MPI (at least the OpenMPI implementation) does not always have great communication performance and so you will want to make message passing infrequent. Also, you will need to be concerned about load balancing. To learn about decomposing the problem in different ways, you must generate two parallel versions of the program, one that uses a 1D decomposition (rows or columns) and one that uses a 2D decomposition (both rows and columns).
Once you have decided how to decompose the problem, write the sequential version first.
You must submit the sequential and both parallel versions of your program (please use these file names: serial.<ext>, parallel-1d.<ext>, and parallel-2d.<ext>, where <ext> depends on the language used for implementation). Also submit the times to run the sequential and parallel versions on the input file final.data (for 1, 2, 4, 8, 16 and 32 processes), running on a 500x500 board for 500 iterations. Since the cluster you run on has 20 cores per node, you must time your code running on two different numbers of cores per node (8 and 16) to see the performance effects. In total, you will be running the program 12 times for 6 process counts X 2 cores/node settings.
You also must submit a short report about the results (1-2 pages) that explains:
If you want to try a bigger board, to see if you can get better speedups with more processes, try running on the input file life.data.1000x1000.
The project will be graded as follows:
Component | Percentage |
---|---|
Runs correctly with 1 process | 15 |
Runs correctly with 32 processes | 40 (20% each decomposition) |
Performance with 1 process | 10 |
Speedup of parallel versions | 20 (10% each decomposition) |
Writeup | 15 |