
    L2C: Combining Lossy and Lossless Compression on Memory and I/O

    In this paper we introduce L2C, a hybrid lossy/lossless compression scheme applicable to both the memory subsystem and the I/O traffic of a processor chip. L2C combines general-purpose lossless compression with state-of-the-art lossy compression to achieve compression ratios of up to 16:1 and improve the utilization of the chip's bandwidth resources. Compressing memory traffic yields lower memory access time, improving system performance and energy efficiency. Compressing I/O traffic offers several benefits for resource-constrained systems, including more efficient storage and networking. We evaluate L2C as a memory compressor in simulation with a set of approximation-tolerant applications. L2C improves baseline execution time by an average of 50% and total system energy consumption by 16%. Compared to the current state-of-the-art lossy and lossless memory compression approaches, L2C improves execution time by 9% and 26%, respectively, and reduces system energy costs by 3% and 5%, respectively. I/O compression efficacy is evaluated using a set of real-life datasets. L2C achieves compression ratios of up to 10.4:1 for a single dataset and about 4:1 on average, while introducing no more than 0.4% error.
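
    The abstract does not spell out the block format or the error model, so the following Python sketch only illustrates the general hybrid idea: approximation-tolerant float data takes a lossy quantization step before lossless compression, everything else takes a purely lossless path. The block layout, the 16-bit fixed-point format, and the function names are assumptions made for the example, not the actual L2C design.

```python
# Toy hybrid lossy/lossless compressor: quantize approximation-tolerant
# float blocks (lossy), then apply zlib; non-tolerant blocks go through
# zlib only. Purely illustrative; not the L2C hardware scheme.
import struct
import zlib

def compress_block(data: bytes, tolerant: bool = False, scale: int = 256):
    """Return (tag, payload); floats may be quantized when 'tolerant'."""
    if tolerant:
        values = struct.unpack(f"<{len(data) // 4}f", data)
        # Lossy step: map each float to a 16-bit fixed-point value.
        quantized = [max(-32768, min(32767, round(v * scale))) for v in values]
        payload = struct.pack(f"<{len(quantized)}h", *quantized)
        return b"L", zlib.compress(payload)      # lossy, then lossless on top
    return b"Z", zlib.compress(data)             # purely lossless path

def decompress_block(tag: bytes, payload: bytes, scale: int = 256) -> bytes:
    raw = zlib.decompress(payload)
    if tag == b"L":
        quantized = struct.unpack(f"<{len(raw) // 2}h", raw)
        return struct.pack(f"<{len(quantized)}f", *(q / scale for q in quantized))
    return raw
```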

    FlatPack: Flexible Compaction of Compressed Memory

    The capacity and bandwidth of main memory are increasingly important factors in computer system performance. Memory compression and compaction have been combined to increase effective capacity and reduce costly page faults. However, existing systems typically maintain compaction at the expense of bandwidth. One major cause of extra traffic in such systems is page overflows, which occur when data compressibility degrades and compressed pages must be reorganized. This paper introduces FlatPack, a novel approach to memory compaction that mitigates this overhead by reorganizing compressed data dynamically with less data movement. Reorganization is carried out by an addition to the memory controller, without intervention from software. FlatPack maintains memory capacity competitive with current state-of-the-art memory compression designs while reducing mean memory traffic by up to 67%. This yields average improvements in performance and total system energy consumption over existing memory compression solutions of 31-46% and 11-25%, respectively. In total, FlatPack improves on baseline performance and energy consumption by 108% and 40%, respectively, in a single-core system, and by 83% and 23%, respectively, in a multi-core system.
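
    The page-overflow problem described above can be pictured with a small model: compressed blocks live at byte offsets inside a fixed-size physical page, rewrites that grow leave holes, and compaction slides live blocks together instead of spilling the whole page. This is a toy sketch under assumed names and a 4 KB page; it does not reproduce FlatPack's actual in-controller metadata or layout.

```python
# Toy compressed page: blocks stored at byte offsets inside one physical
# page; compaction closes holes left by rewrites before declaring overflow.
PAGE_SIZE = 4096

class CompressedPage:
    def __init__(self):
        self.slots = {}               # block index -> (offset, length)
        self.next_free = 0            # bump allocator; rewrites leave holes

    def live_bytes(self):
        return sum(length for _, length in self.slots.values())

    def write(self, index, length):
        if index in self.slots:
            del self.slots[index]     # the old copy becomes a hole
        if self.next_free + length > PAGE_SIZE:
            if self.live_bytes() + length <= PAGE_SIZE:
                self.compact()        # enough total room, page is just fragmented
            else:
                raise MemoryError("page overflow: data no longer fits")
        self.slots[index] = (self.next_free, length)
        self.next_free += length

    def compact(self):
        # Slide live blocks toward offset 0, closing holes left by rewrites.
        offset = 0
        for index, (_, length) in sorted(self.slots.items(), key=lambda kv: kv[1][0]):
            self.slots[index] = (offset, length)
            offset += length
        self.next_free = offset
```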

    Design Considerations of Value-aware Caches

    On-chip cache memories are instrumental in tackling several performance and energy issues facing contemporary and future microprocessor chip architectures. First, they are key to bridging the growing speed gap between memory and processors. Second, as the bandwidth into the chip is not keeping pace with the growth in processing performance, on-chip caches are essential for keeping bandwidth demands within limits. Finally, since off-chip memory accesses consume a substantial amount of energy, larger on-chip caches can reduce the energy wasted on off-chip accesses. Hence, techniques to improve on-chip cache utilization are important. This thesis shows that value replication, where the same value is stored in multiple memory locations, is an important opportunity for improving the utilization of cache/memory capacity. The thesis establishes through experimentation that many applications exhibit high value locality, and that when this is exploited by storing each unique memory value exactly once, compression factors beyond 16X can be achieved. The proposed cache compression techniques build on this opportunity by encoding replicated values. While past cache compression techniques manage to code frequent values densely, they trade a high compression ratio for low decompression latency, thus missing opportunities to utilize on-chip cache capacity more effectively. The thesis further analyses design considerations when realising a practical value-aware cache that accommodates statistical-based compression and presents, for the first time, a detailed design-space exploration of statistical-based cache compression. It is shown that more aggressive, statistical-based compression approaches, such as Huffman coding, which have been excluded in the past due to the processing overhead of compression and decompression, are prime candidates for cache and memory compression. The thesis finds that, even though more processing-intensive decompression affects the cache-hit time of last-level caches, modern out-of-order cores can typically hide the decompression latency successfully. Moreover, the impact of statistics acquisition for generating new codewords is also low, because value locality varies little over time, so new encodings rarely need to be generated and the task can be off-loaded to software routines. Interestingly, the high compression ratio obtained by statistical-based cache compression is shown to improve effective cache capacity by close to three times, which for cache-intensive workloads results in significant performance gains (20% on average) and substantial energy savings (the saved energy may be as much as 10 times larger than the total energy overheads) by reducing off-chip memory use.
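
    As a rough illustration of the value-locality measurement the thesis builds on, the sketch below counts how often fixed-size words repeat in a memory snapshot and reports the upper-bound gain from storing each distinct value exactly once. The file format, word size, and function name are assumptions for the example.

```python
# Estimate value replication in a raw memory dump: total words divided by
# distinct words gives a (loose) upper bound on the compression factor
# achievable by storing each unique value once. Illustrative only.
from collections import Counter

def value_locality(dump_path: str, word_size: int = 8) -> float:
    counts = Counter()
    with open(dump_path, "rb") as f:
        while True:
            word = f.read(word_size)
            if len(word) < word_size:
                break
            counts[word] += 1
    total = sum(counts.values())
    unique = len(counts)
    # Ignores the codeword/pointer needed to reference each shared copy,
    # so this is an upper bound, not an achievable ratio.
    return total / unique if unique else 0.0
```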

    Statistical Compression Cache Designs

    On-chip caches are essential as they bridge the growing speed gap between off-chip memory and processors. To this end, processing cores are sacrificed for more cache space in the chip's real estate, possibly affecting cache access time and power dissipation. An alternative way to increase the effective cache capacity without enlarging the cache is cache compression. However, the required compression and decompression processes add complexity and latency; in particular, decompression lies on the critical memory-access path. Prior work focuses on methods that target lower decompression latency by sacrificing important gains in compressibility. This thesis instead focuses on cache designs that exploit more advanced compression methods, i.e., statistical compression. The thesis first contributes an abstract value-aware cache model, which shows that applications often exhibit value locality and establishes that, ideally, by storing each appearing value exactly once, important compression opportunities open up. Motivated by this, the thesis proposes SC^2, a Huffman-based statistical compression cache design. The thesis tackles the problem of statistics acquisition by building a sampling mechanism in hardware. It finds that value locality is rather stable over long time periods, hence code generation can be offloaded to software. It then builds support for compression and decompression in hardware, deals with practical issues such as cache space management, and makes a detailed exploration of statistical compression in the last-level cache. Unfortunately, this approach cannot be straightforwardly applied to data types that contain semantically well-defined data fields. Among such types, the thesis focuses on the common double-precision floating-point data and explores a different avenue to extract value locality by considering the different fields (sign, exponent and mantissa) in isolation. Contrary to prior observations, it is shown that the mantissa exhibits significant value locality if it is further partitioned. A novel statistical compression method tailored for cache compression, called FP-H, is then proposed. Finally, the thesis observes that none of the compressed cache designs, including the state of the art, is always better than the others. Hence the thesis establishes HyComp, a practical cache design that, for the first time, adopts hybrid compression, where one of several data-type-specific compression methods is selected through heuristics. HyComp offers robust compressibility across applications that manipulate diverse data types, without affecting decompression and with only a slight impact on compression latency.
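
    The FP-H observation above, that double-precision values show strong value locality once the mantissa is partitioned, can be checked with a small experiment like the one below. The field widths (26+26 mantissa bits) and the function name are assumptions for the example, not necessarily the partition FP-H uses.

```python
# Split IEEE-754 doubles into sign, exponent, and two mantissa halves and
# report per-field value locality (occurrences per distinct value).
# Illustrative only; field widths are an assumption.
import struct
from collections import Counter

def field_locality(values):
    fields = {"sign": Counter(), "exponent": Counter(),
              "mantissa_hi": Counter(), "mantissa_lo": Counter()}
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        fields["sign"][bits >> 63] += 1
        fields["exponent"][(bits >> 52) & 0x7FF] += 1
        fields["mantissa_hi"][(bits >> 26) & 0x3FFFFFF] += 1   # upper 26 bits
        fields["mantissa_lo"][bits & 0x3FFFFFF] += 1           # lower 26 bits
    n = len(values)
    return {name: n / len(c) for name, c in fields.items()}
```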

    A Case for a Value-Aware Cache

    Replication of values causes poor utilization of on-chip cache memory resources. This paper addresses the question: how much cache capacity can theoretically and practically be saved if value replication is eliminated? We introduce the concept of value-aware caches and show that a value-aware cache sixteen times smaller than a conventional cache can yield the same miss rate. We then make a case for a value-aware cache design using Huffman-based compression. Since the value set is rather stable across the execution of an application, one can afford to reconstruct the coding tree in software. The decompression latency is kept short by our proposed novel pipelined Huffman decoder, which uses canonical codewords. While the (loose) upper-bound compression factor is 5.2X, we show that, by eliminating cache-block alignment restrictions, it is possible to achieve a compression factor of 3.4X for practical designs.
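
    The canonical-codeword property the decoder relies on is simple to state in code: once code lengths are fixed, codes of equal length are consecutive integers, so a decoder only needs the first code of each length. The sketch below builds such a code table; the symbol set is made up for the example, and the paper's pipelined hardware decoder is of course not reproduced.

```python
# Build canonical Huffman codewords from per-symbol code lengths.
# Codes of the same length are consecutive, which is what lets a decoder
# identify a codeword's length with simple comparisons.
def canonical_codes(lengths):
    """lengths: dict symbol -> code length in bits; returns symbol -> (code, length)."""
    ordered = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    codes, code, prev_len = {}, 0, 0
    for symbol, length in ordered:
        code <<= (length - prev_len)   # widen the code when the length grows
        codes[symbol] = (code, length)
        code += 1
        prev_len = length
    return codes

# Example: lengths {a:1, b:2, c:3, d:3} yield binary codes 0, 10, 110, 111.
print(canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3}))
```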

    SC2: A statistical compression cache scheme

    Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache compression techniques manage to code frequent values densely, they trade a high compression ratio for low decompression latency, thus missing opportunities to utilize capacity more effectively. This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression. We show that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead of (de)compression, are suitable techniques for caches and memory. Based on our key observation that value locality varies little over time and across applications, we first demonstrate that the overhead of statistics acquisition for code generation is low, because new encodings are rarely needed, making it possible to off-load it to software routines. We then show that the high compression ratio obtained by Huffman coding makes it possible to realize the performance benefits of 4X larger last-level caches with about 50% lower power consumption than such larger caches.
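
    The division of work the paper describes, cheap statistics acquisition plus infrequent code generation in software, can be sketched as follows. The sampling rate, word representation, and trace format are assumptions for the example; only the split between sampling and offline Huffman construction mirrors the approach.

```python
# Sample value frequencies from a stream of cache-fill words (the cheap,
# always-on part) and build Huffman code lengths from the samples (the
# rare, software part). Illustrative sketch only.
import heapq
import random
from collections import Counter

def sample_values(trace, rate=0.01):
    """trace: iterable of word values observed on cache fills."""
    counts = Counter()
    for value in trace:
        if random.random() < rate:          # sparse sampling keeps overhead low
            counts[value] += 1
    return counts

def huffman_lengths(counts):
    """Return symbol -> code length; rerun only when value statistics drift."""
    heap = [(freq, i, [sym]) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in counts}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1               # each merge deepens that subtree by one
        heapq.heappush(heap, (f1 + f2, next_id, s1 + s2))
        next_id += 1
    return lengths
```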