1. What are some ways to reduce cache misses?
2. What are some ways to reduce the penalty when we miss?
3. What are some ways to reduce the time required to check for a cache hit?
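Each question targets one term of the average memory access time formula, AMAT = hit time + miss rate × miss penalty; the three tables below answer them in order.
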
method | purpose | pitfall |
--- | --- | --- |
larger block size | blocks contain more surrounding data, giving more hits per block for code with high locality | block replacement consumes more memory bandwidth, and too large a block size may increase misses by reducing the number of blocks in a given cache size |
higher associativity | reduces conflict discards | may increase hit check time |
victim caches | tiny caches that hold only discarded blocks, helping minimize the impact of conflict-induced discards | benefit primarily direct-mapped caches; with many misses a small victim cache cannot help |
pseudo-associativity | attempts to simulate set-associativity while keeping direct-mapped hit speed (e.g. by probing a second location after a primary miss) | complex and slower in implementation; hits in the secondary location take longer |
hardware prefetch | cache grabs extra blocks on a miss, under the assumption that subsequent blocks will be referenced soon, thereby avoiding a miss for each pre-fetched block | increases memory bandwidth consumed when servicing a miss, may interfere with demand (miss) fetches, lowering actual performance |
software prefetch | the compiler can instruct the CPU to preload data into a register or touch a memory block (load it into cache); comes in two varieties -- faulting and nonfaulting -- where a nonfaulting prefetch becomes a no-op instead of taking a virtual memory page fault (there is no point in prefetching through an expensive VM page fault). see the first sketch after this table | adds complexity to the compiler, and compilers cannot prefetch perfectly; also ties the binary more closely to the underlying hardware implementation |
compiler optimization | less code = fewer instruction misses; better data memory organization (e.g. loop blocking, shown in the second sketch after this table) = fewer data and conflict-induced misses, etc. | obvious compiler complexity, but portable and beneficial for more reasons than just cache interaction |
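
A minimal sketch of software prefetching, assuming a GCC/Clang-style compiler where `__builtin_prefetch` emits a nonfaulting prefetch instruction; the prefetch distance of 16 elements is an arbitrary illustrative choice, and a real compiler would tune it to the memory latency:

```c
/* Sketch of software prefetching using GCC/Clang's __builtin_prefetch,
 * a nonfaulting prefetch: if the address would fault, it is simply dropped.
 * The prefetch distance of 16 elements is an illustrative assumption. */
double sum(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);  /* start pulling a future line into cache */
        s += a[i];
    }
    return s;
}
```
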
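And a sketch of the kind of data-layout transformation a compiler (or programmer) can apply to cut data misses: blocking (tiling) a matrix transpose so each tile's working set stays cache-resident. The matrix and tile sizes here are illustrative assumptions:

```c
/* Blocking (tiling) a transpose so the working set of each tile fits in
 * cache. N and TILE are illustrative assumptions. */
#define N    1024
#define TILE 32   /* chosen so a TILE x TILE tile of doubles fits in cache */

void transpose_blocked(double dst[N][N], const double src[N][N])
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            /* process one TILE x TILE tile: the few cache lines it touches
             * in dst and src are reused before being evicted */
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}
```

Without the tiling, the column-order writes to dst touch N distinct cache lines per row and tend to evict one another before being reused; the blocked version keeps only TILE of those lines live at a time.
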
method | purpose | pitfall |
--- | --- | --- |
Priority Reads | service reads before servicing writes, since the read is what the CPU is waiting on to resume processing | typically implemented with write buffers (see previous section); the read must be checked against buffered writes (or the buffer drained) so it never returns stale data |
Sub-block placement | breaks blocks up into smaller sub-blocks, reducing the transfer overhead when only a portion of a block needs to be fetched or written back | the per-sub-block valid bits must be stored and checked, which can lengthen the hit check |
Early restart | when fetching, resume processing as soon as the required word of the missed block arrives, rather than waiting for the whole block | little benefit for small block sizes, since the remaining transfer is short compared to the memory access delay and setup time already paid for the first word |
Critical word first | requests the missed word from memory first, then fills in the rest of the block | same as early restart |
Deeper cache hierarchy | adds another layer of cache below the current level, making this cache's apparent access to memory faster (see the worked example after this table) | complicated performance analysis, adds to hardware complexity and expense, not really worth it for small main memories |
Non-blocking caches | services cache-hits while handling memory transfers in background so that cache misses do not block CPU | adds a great deal of complexity since the cache management must handle multiple outstanding memory requests while still servicing hits |
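
As a rough illustration of why a deeper hierarchy pays off (the cycle counts below are made-up, illustrative numbers, not figures from this text): with a 1-cycle L1 hit, a 5% L1 miss rate, and a 100-cycle memory access, AMAT = 1 + 0.05 × 100 = 6 cycles; inserting a 10-cycle L2 that catches 80% of L1 misses gives AMAT = 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles.
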
method | purpose | pitfall |
--- | --- | --- |
small, simple cache design | simple (direct-mapped or low-associativity) caches make hit detection easy because there are few possible locations for a block, and small caches limit the number of blocks that must be checked; see the sketch after this table | obvious performance tradeoffs in terms of cache size and conflict-induced misses |
virtually addressed caches | index and tag the cache with virtual addresses, removing address translation (the TLB lookup) from the hit path | aliases/synonyms (two virtual addresses mapping to the same physical block), possible cache flushes on context switch unless tags carry a process identifier, and I/O uses physical addresses |
pipelined cache hardware | parallelize tag searches on associative caches, reducing the time necessary to search for a block | complicates tag memory design |
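
A small sketch, in software, of why direct-mapped hit detection is cheap: the address selects exactly one line, so a hit needs one index computation and one tag compare. The line count and block size below are illustrative assumptions, and real hardware does this with bit slicing and a comparator rather than C code:

```c
/* Direct-mapped hit check: one index, one tag compare. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES   256   /* assumed: 256 lines        */
#define BLOCK_BYTES 64    /* assumed: 64-byte blocks   */

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_LINES];

bool is_hit(uint32_t addr)
{
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t index = block % NUM_LINES;   /* the only line this block can occupy */
    uint32_t tag   = block / NUM_LINES;   /* identifies which block is resident  */
    return cache[index].valid && cache[index].tag == tag;
}
```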