Memory: performance
The memory hierarchy can have an important effect on performance
Loop nest of matrix multiply:
    for (i = 0; i < 500; i++)
        for (j = 0; j < 500; j++)
            for (k = 0; k < 500; k++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
Running time on a Silicon Graphics system with a MIPS R4000 processor
and a 1 MB secondary cache: 77.2 seconds
If the loop order is reversed so that i is innermost: 44.2 seconds
Only difference: order of accessing data
With other compiler optimizations: less than 10 seconds!
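
A minimal sketch of the two loop orders compared above, assuming "reversed" means the order k, j, i. The function names (matmul_ijk, matmul_kji, orders_agree) and the small size N = 8 are illustrative assumptions, not from the slide, which uses 500. The sketch only checks that both orders compute the same product. It does not reproduce the timings, which depend on the machine, the cache, and the array layout.

```c
#include <string.h>

#define N 8  /* assumed small size for illustration; the slide uses 500 */

/* Original order from the slide: i, j, k (k innermost). */
static void matmul_ijk(double x[N][N], double y[N][N], double z[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

/* Reversed order: k, j, i (i innermost). Same statement, different
   order of accessing the data. */
static void matmul_kji(double x[N][N], double y[N][N], double z[N][N])
{
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

/* Hypothetical helper: fills y and z with small integer values, runs
   both versions on zeroed outputs, and returns 1 if the results match. */
int orders_agree(void)
{
    static double y[N][N], z[N][N], a[N][N], b[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            y[i][j] = i + 2 * j + 1;
            z[i][j] = i - j;
        }

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    matmul_ijk(a, y, z);
    matmul_kji(b, y, z);

    /* Each x[i][j] accumulates its terms in the same k order in both
       versions, so the results are expected to be identical. */
    return memcmp(a, b, sizeof a) == 0;
}
```

To compare running times, one could raise N to 500 and time each function separately. Results will vary by system.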