As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy. However, current hardware cache management policies do not always adapt optimally to the application's behavior: e.g., caches may be polluted by data structures whose locality cannot be captured by the caches, and producer-consumer communication incurs multiple round trips of coherence messages per cache line transferred. We propose load and store instructions that carry hints regarding into which cache(s) the accessed data should be placed. Our instructions allow software to convey locality information to the hardware, while incurring minimal hardware cost and not affecting correctness. Our instructions provide a 1.07× speedup and a 1.24× energy efficiency boost, on average, according to simulations on a 64-core system with private L1 and L2 caches. With a large shared L3 cache added, the benefits increase, providing a 1.33× energy reduction on average.