Leveraging Compiler Alias Analysis To Free Accelerators from Load-Store Queues

Vedula, Naveen

unknown

Leveraging Compiler Alias Analysis To Free Accelerators from Load-Store Queues

Authors: Naveen Vedula
Publication date: 6 December 2016
Publisher

Abstract

Hardware accelerators are an energy efficient alternative to general purpose processors for specific program regions. They have relied on the compiler to extract instruction level parallelism but may waste significant energy in memory disambiguation and discovering memory level parallelism (MLP). Currently, accelerators either i) Define the problem away, and rely on massively parallel programming models [1, 48] to extract MLP. ii) Reuse the Out of Order (OoO) processor [7, 28], and rely on power hungry load-store queues (LSQs) for memory disambiguation, or iii) Serialize – some accelerators [47] focus on program regions where MLP is not important and simply serialize memory operations. We present NACHOS, a compiler assisted energy efficient approach to memory disambiguation, which completely eliminates the need for an LSQ. NACHOS classifies memory operations pairwise into those that don’t alias (i.e., independent memory operations), must alias (i.e., ordering is required between memory operations), and may alias (i.e., compiler is unsure). To enforce program order between must alias memory operations, the compiler inserts ordering edges that are enforced as def-use data dependencies. When the compiler is unsure (i.e., may alias) about a pair of memory operations, the hardware checks if they are independent. We demonstrate that compiler alias analysis with additional refinement can achieve high accuracy for hardware accelerated regions. In our workload suite comprising of SPEC2k, SPEC2k6, and PARSEC workloads; Across 15 applications NACHOS imposes no energy overhead over the function units (i.e., compiler resolves all dependencies), and in another 12 applications NACHOS consumes =17% of function unit energy (max: 53% in povray). Overall NACHOS achieves performance similar to an optimized LSQ and adds an overhead equal to 2.3X of compute energy

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

Simon Fraser University Institutional Repository

oai:summit.sfu.ca:16882

Last time updated on 14/04/2017