Memory: performance
The memory hierarchy can have an important effect on performance
Inner loop of matrix multiply:
    for (i = 0; i < 500; i++)
        for (j = 0; j < 500; j++)
            for (k = 0; k < 500; k++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
Running time on a Silicon Graphics system with a MIPS R4000 processor
and a 1 MB secondary cache: 77.2 seconds
If the loop order is reversed so that i is innermost: 44.2 seconds
Only difference: the order in which the data is accessed
With other compiler optimizations: less than 10 seconds!
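
The two loop orders compared above can be sketched in C roughly as follows. This is only an illustration: the function names (init_matrices, multiply_ijk, multiply_kji, cpu_seconds) and the fill values are assumptions made for this sketch, not part of the original experiment. Both versions compute the same product and differ only in the order in which x, y and z are touched.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define N 500 /* matrix dimension, as in the loop above */

/* Static storage: three 500x500 double matrices take about 6 MB,
   which is too much for a typical stack. */
static double x[N][N], y[N][N], z[N][N];

/* Clear x and fill y and z with small integer values (an arbitrary
   choice for this sketch), so every sum is exact in a double. */
static void init_matrices(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            x[i][j] = 0.0;
            y[i][j] = (double)((i + j) % 7);
            z[i][j] = (double)((i * 3 + j) % 5);
        }
}

/* Original order: i outermost, k innermost. */
static void multiply_ijk(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

/* Reversed order: k outermost, i innermost. */
static void multiply_kji(void)
{
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

/* Reset the matrices, run one multiply, and return the CPU time
   it used in seconds, as reported by clock(). */
static double cpu_seconds(void (*multiply)(void))
{
    assert(multiply != NULL);
    init_matrices();
    clock_t start = clock();
    multiply();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds(multiply_ijk) and cpu_seconds(multiply_kji) from a small main and printing the two results gives a rough comparison. Absolute times on a current machine will be far below the figures quoted above, and which order wins, and by how much, depends on the cache sizes and on the compiler's optimization level.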