CMSC216 HW11: Matrices and Memory Optimization
- Due: 11:59pm Mon 25-Nov-2024 on Gradescope
- Approximately 0.83% of total grade
CODE DISTRIBUTION: hw11-code.zip
- Download the code distribution
- See further setup instructions below
CHANGELOG:
1 Rationale
Basic understanding of how the memory system works to fulfill load/store requests is essential when one wishes to optimize program performance. Having an intuition on which styles of coding exploit cache memories will offer great performance benefits on most modern computing architectures.
This HW analyzes a simple set of routines that have different performance due to their memory access pattern. The final problem is to create a more efficient version of a routine which accesses memory in a fashion that exploits cache more effectively increasing performance.
Associated Reading / Preparation
Bryant and O'Hallaron: Ch 6 on The Memory Hierarchy. Ch 6.5 and 6.6 may be particularly helpful
Grading Policy
Credit for this HW is earned by taking the associated HW Quiz which is
linked under Gradescope
. The quiz will ask similar questions as
those that are present in the QUESTIONS.txt
file and those that
complete all answers in QUESTIONS.txt
should have no trouble with
the quiz.
Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
See the full policies in the course syllabus.
2 Codepack
The codepack for the HW contains the following files:
File | State | Description |
---|---|---|
Makefile |
Provided | Makefile to buil the matsums_main program |
matvec.h |
Provided | Header file defining some types and functions for |
matvec_util.c |
Provided | Utility functions to manipulate matrices/vectors |
matsums_funcs.c |
EDIT | Small "algorithms" to time arithmetic operations |
matsums_main.c |
EDIT | Main function that times different summing techinques |
QUESTIONS.txt |
EDIT | Questions to answer |
3 Overview and Background
This HW deals with matrices and vectors. In C these are typically 2D
arrays (matrices) and 1D arrays (vectors), though to increase
performance of the memory system, other approaches are sometimes
taken. Part of this lab is to examine the matrix_t
and vector_t
types provided as they may pertain to later assignments.
The lab focuses on analyzing two simple routines
sum_rows()
which sums the rows of a matrix into an appropriately sized vector.sum_cols()
which sums the columns of a matrix into an appropriately sized vector.
Similar to discussion in class, these two have different performance properties that can only be explained through knowledge of the memory hierarchy.
4 What to Understand
Ensure that you understand
- How different access patterns in memory can lead to significantly different speeds in code
- How in some cases, re-arranging code to access memory in a different pattern can dramatically improve performance
5 Questions
Analyze the files in the provided codepack and answer the questions
given in QUESTIONS.txt
.
_________________ HW 11 QUESTIONS _________________ Write your answers to the questions below directly in this text file to prepare for the associated quiz. Credit for the HW is earned by completing the associated online quiz on Gradescope. Timing Note =========== This code has been timed on GRACE to verify it behaves as expected in most cases but as with all timing code, system load and other circumstances can affect the actual timing results. If you strange times that seem to conflict with the expected answers, report it to staff to get help. PROBLEM 1: Code Overview ======================== (A) Vector and Matrix ~~~~~~~~~~~~~~~~~~~~~ Examine the header file `matvec.h' which gives type definitions and declares functions associated with a simple matrix and vector type. For the matrix type `matrix_t' with R rows and C columns, how is a 2D matrix actually laid out in memory? (B) Getting and Setting Elements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For the `matrix_t' and `vector_t' types, convenient means to get and set elements is provided. This mechanism is used in the utility functions in `matvec_util.c' and defined in `matvec.h'. Describe how one would get element (R,C) of `matrix_t' or set it to value Z. Why are these mechanisms not functions? PROBLEM 2: Timing Rows vs Columns ================================= (A) row_sums vs col_sums Timing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile and run the `matsums_main' executable using the provided `Makefile' (typing `make' should suffice). Run the code on `csel-kh1250-NN.cselabs.umn.edu' to ensure you get the expected performance experience. Run this program with differing parameters which control the number of rows and columns in the matrix which is summed. Show some example runs with different parameters including on large square matrix with 16000 rows and 8000 columns. Note any speed differences between the two on large matrices. (B) row_sums vs col_sums speed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Examine the source code for the functions `row_sums()' and `col_sums()' in the file `matsums_funcs.c'. Describe why the timing/speed differences you observed previously are occurring based on features of the source code you see and the layout of the `matrix_t' that is being summed. PROBLEM 3: opt_col_sums() ========================= Complete the function `opt_col_sums()' in file `matsums_funcs.c'. This function should have identical behavior to `col_sums()' (it sums the columns of the matrix into a provided array). However, it should be *optimized* so that it achieves performance near to that of the `row_sums()' function. To achieve this, re-arrange the loops to iterate as efficiently as possible with respect to the memory system. Feel free to ask course staff for hints on how to do this or do some online research. To time `opt_col_sums()', uncomment relevant blocks in the `matsums_main.c' program that are provided for it. Paste your source code and a copy of the timing results of running `matsums_main' on an 16000 by 8000 matrix.