A Formal Approach to Memory Access Optimization: Data Layout, Reorganization, and Near-Data Processing

Abstract

The memory system is a major bottleneck in achieving high performance and energy efficiency for various processing platforms. This thesis aims to improve memory performance and energy efficiency of data intensive applications through a two-pronged approach which combines a formalrepresentation framework and a hardware substrate that can efficiently reorganize data in memory. The proposed formal framework enables representing and systematically manipulating data layout formats, address mapping schemes, and memory access patterns through permutations to exploit the locality and parallelism in memory. Driven by the implications from the formal framework, this thesis presents the HAMLeT architecture for highly-concurrent, energy-efficient and low-overheaddata reorganization performed completely in memory. Although data reorganization simply relocatesdata in memory, it is costly on conventional systems mainly due to inefficient access patterns, limited data reuse, and roundtrip data traversal throughout the memory hierarchy. HAMLeT pursues a near-data processing approach exploiting the 3D-stacked DRAM technology. Integrated inthe logic layer, interfaced directly to the local controllers, it takes advantage of the internal fine-grainparallelism, high bandwidth and locality which are inaccessible otherwise. Its parallel streaming architecturecan extract high throughput from stringent power, area, and thermal budgets. The thesis evaluates the efficient data reorganization capability provided by HAMLeT through several fundamental use cases. First, it demonstrates software-transparent data reorganization performedin memory to improve the memory access. A proposed hardware monitoring determines inefficient memory usage and issues a data reorganization to adapt an optimized data layout and address mapping for the observed memory access patterns. This mechanism is performed transparently and does not require any changes to the user software—HAMLeT handles the remappingand its side effects completely in hardware. Second, HAMLeT provides an efficient substrate to explicitly reorganize data in memory. This gives an ability to offload and accelerate common data reorganization routines observed in high-performance computing libraries (e.g., matrix transpose, scatter/gather, permutation, pack/unpack, etc.). Third, explicitly performed data reorganization enablesconsidering the data layout and address mapping as a part of the algorithm design space. Exposing these memory characteristics to the algorithm design space creates opportunities for algorithm/ architecture co-design. Co-optimized computation flow, memory accesses, and data layout lead to new algorithms that are conventionally avoided. <br

    Similar works

    Full text

    thumbnail-image

    Available Versions