Fast Hit times via Way Prediction
• How can we combine the fast hit time of a direct-mapped cache with the lower conflict misses of a 2-way set-associative (SA) cache?
• Way prediction: keep extra bits in the cache to predict the "way," or block within the set, of the next cache access.
-The multiplexor is set early to select the predicted block, so only 1 tag comparison is performed that clock cycle, in parallel with reading the cache data
-Miss on the predicted way ⇒ first check the other blocks for matches in the next clock cycle
• Accuracy ≈ 85%
• Drawback: pipelining the CPU is hard if a hit can take either 1 or 2 cycles
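The way-prediction behavior above can be sketched as a toy model. This is a minimal illustration, not a description of any real cache: the class `WayPredictedCache`, its one-entry-per-set predictor, and the "fill the non-predicted way" replacement choice are all assumptions made for the example. The helper `average_hit_cycles` only weights the 1-cycle and 2-cycle hit cases by the prediction accuracy.

```python
from dataclasses import dataclass, field


@dataclass
class WayPredictedCache:
    """Toy 2-way set-associative cache with one predicted way per set (hypothetical)."""
    num_sets: int
    tags: list = field(init=False)       # tags[set][way]; None means invalid
    predicted: list = field(init=False)  # predicted way for each set

    def __post_init__(self):
        self.tags = [[None, None] for _ in range(self.num_sets)]
        self.predicted = [0] * self.num_sets

    def access(self, block_addr: int) -> str:
        """Return 'fast_hit', 'slow_hit', or 'miss' for a block address."""
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        guess = self.predicted[index]
        other = 1 - guess
        if self.tags[index][guess] == tag:
            return "fast_hit"              # predicted way matched: 1 cycle
        if self.tags[index][other] == tag:
            self.predicted[index] = other  # retrain the predictor
            return "slow_hit"              # found in the other way: 2 cycles
        # Miss in both ways. Assumed policy: fill the non-predicted way
        # and predict that way on the next access to this set.
        self.tags[index][other] = tag
        self.predicted[index] = other
        return "miss"


def average_hit_cycles(accuracy: float, fast: int = 1, slow: int = 2) -> float:
    """Expected hit time when a fraction `accuracy` of hits use the predicted way."""
    return accuracy * fast + (1.0 - accuracy) * slow
```

With the 85% accuracy quoted above, `average_hit_cycles(0.85)` gives about 1.15 cycles per hit, between the direct-mapped-like 1 cycle and the 2 cycles of a mispredicted way.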
6: Increasing Cache Bandwidth via Multiple Banks
• Rather than treat the cache as a single monolithic block, divide it into independent banks that can support simultaneous accesses
-E.g., the T1 ("Niagara") L2 has 4 banks
• Banking works best when accesses naturally spread themselves across banks ⇒ the mapping of addresses to banks affects the behavior of the memory system
• A simple mapping that works well is "sequential interleaving"
-Spread block addresses sequentially across banks
-E.g., if there are 4 banks, bank 0 has all blocks whose address modulo 4 is 0; bank 1 has all blocks whose address modulo 4 is 1; …
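Sequential interleaving as described above reduces to a modulo on the block address. The sketch below is illustrative only; the function names `bank_of` and `bank_histogram` are made up for this example. It also shows why the mapping matters: a stride equal to the number of banks sends every access to the same bank.

```python
from collections import Counter


def bank_of(block_addr: int, num_banks: int = 4) -> int:
    """Sequential interleaving: the bank is the block address modulo the bank count."""
    return block_addr % num_banks


def bank_histogram(block_addrs, num_banks: int = 4) -> list:
    """Count how many of the given block addresses land in each bank."""
    counts = Counter(bank_of(addr, num_banks) for addr in block_addrs)
    return [counts.get(bank, 0) for bank in range(num_banks)]


# Consecutive blocks spread evenly across the 4 banks.
print(bank_histogram(range(16)))        # [4, 4, 4, 4]
# A stride of 4 blocks hits bank 0 every time, so banking gives no parallelism.
print(bank_histogram(range(0, 64, 4)))  # [16, 0, 0, 0]
```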
Compiler Optimization vs. Memory Hierarchy Search
• Compiler tries to figure out memory hierarchy optimizations
• New approach: "auto-tuners" first run variations of the program on the computer to find the best combinations of optimizations (blocking, padding, …) and algorithms, then produce C code to be compiled for that computer
• An "auto-tuner" is targeted to a particular numerical method
Quest for DRAM Performance
Fast Page mode
-Add timing signals that allow repeated accesses to the row buffer without another row access time
-Such a buffer comes naturally, as each array will buffer 1024 to 2048 bits for each access
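The benefit of reusing the row buffer can be seen in a rough timing model. This is a sketch under stated assumptions: the function `total_access_time` is hypothetical, and the row and column times are arbitrary units chosen for illustration, not figures for any real DRAM part.

```python
def total_access_time(rows, row_time: int, col_time: int) -> int:
    """Total time for a sequence of accesses, given the row each one touches.

    A row access is paid only when the requested row differs from the one
    already held in the row buffer; every access pays the column time.
    """
    open_row = None
    total = 0
    for row in rows:
        if row != open_row:
            total += row_time  # load a new row into the row buffer
            open_row = row
        total += col_time      # read from the row buffer
    return total


# Four accesses to the same row pay the row access time once.
print(total_access_time([5, 5, 5, 5], row_time=30, col_time=10))  # 70
# Alternating rows pay the row access time on every access.
print(total_access_time([5, 6, 5, 6], row_time=30, col_time=10))  # 160
```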
Synchronous DRAM (SDRAM)
-Add a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller
Double Data Rate (DDR SDRAM)
-Transfer data on both the rising edge and the falling edge of the DRAM clock signal ⇒ doubles the peak data rate
-DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz
-DDR3 drops to 1.5 volts and offers higher clock rates: up to 800 MHz
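The doubling of the peak data rate can be worked through numerically. In this sketch the function names are made up for the example, and the 8-byte (64-bit) data bus is an assumption (a common DIMM width) that the text above does not state.

```python
def peak_transfers_per_us(clock_mhz: float) -> float:
    """Peak transfers per microsecond (MT/s): two transfers per clock cycle."""
    return 2 * clock_mhz


def peak_bandwidth_mb_s(clock_mhz: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth in MB/s, assuming a bus `bus_bytes` wide (8 bytes assumed)."""
    return peak_transfers_per_us(clock_mhz) * bus_bytes


# DDR2 at its 400 MHz ceiling: 800 MT/s, 6400 MB/s on an assumed 8-byte bus.
print(peak_bandwidth_mb_s(400))
# DDR3 at its 800 MHz ceiling: 1600 MT/s, 12800 MB/s on the same assumed bus.
print(peak_bandwidth_mb_s(800))
```

These are peak figures only; sustained bandwidth is lower because row accesses and other overheads interrupt the back-to-back transfers.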
Requirements of a Virtual Machine Monitor
• The VMM must run at a higher privilege level than the guest VMs, which generally run in user mode
Xen changes for paravirtualization
