41,171 research outputs found
Jenga: Harnessing Heterogeneous Memories through Reconfigurable Cache Hierarchies
Conventional memory systems are organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, because working sets settle at the smallest (and fastest) level they fit in. However, rigid hierarchies also cause significant overheads, because each level adds latency and energy even when it does not capture the working set. In emerging systems with heterogeneous memory technologies such as stacked DRAM, these overheads often limit performance and efficiency. We propose Jenga, a reconfigurable cache hierarchy that avoids these pathologies and approaches the performance of a hierarchy optimized for each application. Jenga monitors application behavior and dynamically builds virtual cache hierarchies out of heterogeneous, distributed cache banks. Jenga uses simple hardware support and a novel software runtime to configure virtual cache hierarchies. On a 36-core CMP with a 1 GB stacked-DRAM cache, Jenga outperforms a combination of state-of-the-art techniques by 10% on average and by up to 36%, and does so while saving energy, improving system-wide energy-delay product by 29% on average and by up to 96%
Advanced Lease Caching
Since the dawn of computing, CPU performance has continually grown, buoyed by Moore\u27s Law. Execution speed for parallelizable programs in particular has massively increased with the now widespread employment of GPUs, TPUs, and FPGAs, capable of preforming hundreds of computations simultaneously, for data processing. A major bottleneck for further performance increases, which has impeded speedup of sequential programming in particular, is the processor memory performance gap. One of the approaches to address this block is improving cache management algorithms. Caching is transparent to software, but traditional caching algorithms forgo hardware-software collaboration. Previous work introduced the idea of assigning leases to cache blocks as a form of collaborative cache eviction policy and introduced two lease-caching algorithms, Compiler Lease of cAche Memory (CLAM) and Phased Reference Leasing (PRL), evaluating them over 7 benchmarks from the Polybench benchmark suite. This work evaluates CLAM and PRL over all thirty benchmarks of the Polybench suite for multiple dataset sizes. Additionally, to address the flaws CLAM and PRL, two new lease-caching algorithms have been developed: Scoped Hooked Eviction Lease (SHEL) and Cross-Scope Eviction Lease (C-SHEL). These algorithms are evaluated not just for a single-level cache, typically found in embedded systems, but also for a multi-level cache as exists in more high-performance systems including multi-core CPUs. The test system uses a RISCV architecture to run benchmarks. All four lease caching algorithms outperform the baseline Pseudo Least Recently Used (PLRU) policy at both levels of the cache hierarchy. Further, SHEL and C-SHEL display significant performance increases over PRL for certain benchmarks, demonstrating the value of scoped leasing in addressing complex reuse interval (RI) behavior
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
Energy Saving Techniques for Phase Change Memory (PCM)
In recent years, the energy consumption of computing systems has increased
and a large fraction of this energy is consumed in main memory. Towards this,
researchers have proposed use of non-volatile memory, such as phase change
memory (PCM), which has low read latency and power; and nearly zero leakage
power. However, the write latency and power of PCM are very high and this,
along with limited write endurance of PCM present significant challenges in
enabling wide-spread adoption of PCM. To address this, several
architecture-level techniques have been proposed. In this report, we review
several techniques to manage power consumption of PCM. We also classify these
techniques based on their characteristics to provide insights into them. The
aim of this work is encourage researchers to propose even better techniques for
improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
- …