
    Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity

    Main memory is a critical resource in modern computer systems, and demand for it keeps growing. Increasing numbers of on-chip cores and specialized accelerators raise potential processing throughput but also call for higher data rates and greater memory capacity. Emerging data-intensive applications further increase memory traffic and footprint. Memory bandwidth, however, is pin-limited and power-constrained, and is therefore difficult to scale, while memory capacity is limited by cost and energy considerations.

    This thesis proposes a variety of memory compression techniques to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. To reduce transferred data volumes, lossy compression is applied, which can reach more aggressive compression ratios than lossless methods. A reduction in off-chip memory traffic lowers memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.

    The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20%, while introducing at most 1.2% error in the application output.

    The second part proposes Memory Squeeze (MemSZ), which introduces a parallelized implementation of the more advanced Squeeze (SZ) compression method. MemSZ also improves on the error-limiting capability of AVR by tracking the error accumulated over a value's lifetime. An alternate memory compression architecture is proposed as well, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy, and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.

    The third part describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data and falls back to lossless compression if an error threshold is exceeded (see the sketch following this abstract). In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves on the performance of MemSZ by 9% and on its energy consumption by 3%.

    The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack reduces the traffic overhead of memory compaction compared to prior systems, thus retaining the bandwidth benefits of compression, and it adapts to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance over current state-of-the-art designs by 36% while reducing system energy consumption by 12%.
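    The hybrid fallback at the heart of L2C can be made concrete with a short sketch. The C fragment below is a minimal illustration under assumptions made here, not the thesis's implementation: the function name compress_block, the uniform quantization step, and the verbatim copy standing in for a real lossless compressor are all hypothetical. It captures the essential control flow: attempt a lossy encoding, bound the worst-case reconstruction error, and revert to the exact path when the error budget would be exceeded.

```c
#include <math.h>    /* roundf, fabsf */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy */

/* Mode flag stored alongside each block (hypothetical encoding). */
typedef enum { MODE_LOSSY, MODE_LOSSLESS } comp_mode_t;

/*
 * Lossy-first compression with a lossless fallback, sketching the
 * L2C idea. Each value is quantized to a uniform step; if the
 * worst-case reconstruction error exceeds `threshold`, the block is
 * stored exactly instead (the memcpy stands in for a real lossless
 * compressor).
 */
comp_mode_t compress_block(const float *in, size_t n, float step,
                           float threshold, float *out)
{
    float max_err = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float q = step * roundf(in[i] / step);   /* lossy candidate */
        float err = fabsf(q - in[i]);
        if (err > max_err)
            max_err = err;
        out[i] = q;
    }
    if (max_err > threshold) {           /* error budget exceeded: */
        memcpy(out, in, n * sizeof *in); /* keep the exact data */
        return MODE_LOSSLESS;
    }
    return MODE_LOSSY;
}
```

    In the designs described above, the error accounting is richer than this per-block check: MemSZ, for instance, tracks error accumulated over a value's lifetime, so repeated lossy round-trips cannot silently compound beyond the threshold.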

    Yet Another Compressed Cache: a Low Cost Yet Effective Compressed Cache

    Cache memories play a critical role in bridging the latency, bandwidth, and energy gaps between cores and off-chip memory. However, caches frequently consume a significant fraction of a multicore chip’s area, and thus account for a significant fraction of its cost. Compression has the potential to improve the effective capacity of a cache, providing the performance and energy benefits of a larger cache while using less area. The design of a compressed cache must address two important issues: i) a low-latency, low-overhead compression algorithm that can represent a fixed-size cache block using fewer bits, and ii) a cache organization that can efficiently store the resulting variable-size compressed blocks. This paper focuses on the latter issue. We propose YACC (Yet Another Compressed Cache), a new compressed cache design that uses super-blocks to reduce tag overheads and variable-size blocks to reduce internal fragmentation, while eliminating two major sources of complexity in previous work: decoupled tag-data mapping and address skewing. YACC’s cache layout is similar to that of conventional caches, eliminating the back-pointers used to maintain a decoupled tag-data mapping and the extra decoders used to implement skewed associativity. An additional advantage of YACC is that it enables modern replacement mechanisms, such as RRIP. For our benchmark set, YACC performs comparably to the recently proposed Skewed Compressed Cache (SCC) [Sardashti et al. 2014], but with a simpler, more area-efficient design that avoids the complexity and overheads of skewing. Compared to a conventional uncompressed 8MB LLC, YACC improves performance by 8% on average and up to 26%, and reduces total energy by 6% on average and up to 20%. An 8MB YACC achieves approximately the same performance and energy improvements as a 16MB conventional cache at a much smaller silicon footprint, requiring only 1.6% more area than an 8MB conventional cache.
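    YACC's central organizational idea, one shared tag covering a super-block of adjacent cache blocks while the data array keeps a conventional layout, can be sketched as a data structure. The C definitions below are illustrative assumptions (the field names, the 2-bit compression-factor encoding, and keeping RRIP state per super-block are choices made for this sketch, not the paper's exact layout); they show how a single tag amortizes its cost over four neighboring blocks.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE    64  /* bytes per cache block (assumed) */
#define BLOCKS_PER_SB 4   /* adjacent blocks sharing one tag */

/* Per-block state packed into the shared super-block tag. */
typedef struct {
    unsigned valid : 1;
    unsigned dirty : 1;
    unsigned cf    : 2;   /* compression factor: how much of a data
                             entry the compressed block occupies
                             (illustrative 2-bit encoding) */
} blk_state_t;

/* One tag entry covers an aligned group of BLOCKS_PER_SB blocks. */
typedef struct {
    uint64_t    sb_tag;              /* address bits above the
                                        super-block offset */
    blk_state_t blk[BLOCKS_PER_SB];  /* per-block metadata */
    unsigned    rrpv : 2;            /* RRIP re-reference value,
                                        kept per super-block here */
} sb_tag_t;

/* Hit check: match the super-block tag, then index the block's
   state with the low address bits. Set indexing is omitted for
   brevity; a real cache would split sb into set index and tag. */
static bool sb_hit(const sb_tag_t *t, uint64_t addr)
{
    uint64_t sb  = addr / (BLOCK_SIZE * BLOCKS_PER_SB);
    unsigned blk = (unsigned)((addr / BLOCK_SIZE) % BLOCKS_PER_SB);
    return t->sb_tag == sb && t->blk[blk].valid;
}
```

    Because the tag array keeps this conventional shape, no back-pointers or skewing decoders are required, and a replacement policy such as RRIP can operate on the tags directly, which is the simplification the abstract credits YACC with over SCC.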