41,171 research outputs found

    Jenga: Harnessing Heterogeneous Memories through Reconfigurable Cache Hierarchies

    Conventional memory systems are organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, because working sets settle at the smallest (and fastest) level they fit in. However, rigid hierarchies also cause significant overheads, because each level adds latency and energy even when it does not capture the working set. In emerging systems with heterogeneous memory technologies such as stacked DRAM, these overheads often limit performance and efficiency. We propose Jenga, a reconfigurable cache hierarchy that avoids these pathologies and approaches the performance of a hierarchy optimized for each application. Jenga monitors application behavior and dynamically builds virtual cache hierarchies out of heterogeneous, distributed cache banks. Jenga uses simple hardware support and a novel software runtime to configure virtual cache hierarchies. On a 36-core CMP with a 1 GB stacked-DRAM cache, Jenga outperforms a combination of state-of-the-art techniques by 10% on average and by up to 36%, and does so while saving energy, improving system-wide energy-delay product by 29% on average and by up to 96%.
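
The overhead argument in this abstract can be illustrated with a toy latency model. This is only a sketch and not Jenga's actual mechanism: the function name `access_latency`, the level capacities, and the cycle counts are all hypothetical values chosen for illustration. It assumes every level that is looked up adds its latency and that an access is served by the first level large enough to hold the working set.

```python
# Toy model, not Jenga itself: each level that is looked up adds its latency,
# and an access is served by the first level whose capacity covers the
# working set. All capacities and latencies are made-up illustrative values.

def access_latency(levels, working_set_kb):
    """Return the cycles spent walking `levels` (fastest first) until one fits."""
    total = 0
    for capacity_kb, latency_cycles in levels:
        total += latency_cycles          # this level is probed, so it costs time
        if working_set_kb <= capacity_kb:
            return total                 # working set settles at this level
    raise ValueError("working set does not fit in any level")

KB_PER_MB = 1024

# Hypothetical rigid hierarchy: L1, L2, L3, stacked-DRAM cache.
rigid = [(32, 4), (256, 12), (8 * KB_PER_MB, 40), (1024 * KB_PER_MB, 100)]

# Hypothetical two-level virtual hierarchy that skips the middle levels.
virtual = [(32, 4), (1024 * KB_PER_MB, 100)]

working_set = 100 * KB_PER_MB            # too large for every SRAM level above
print(access_latency(rigid, working_set))    # 156
print(access_latency(virtual, working_set))  # 104
```

In this model a small working set costs the same in both configurations, while a large one pays for every intermediate level it misses in, which is the pathology the abstract describes.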

    Advanced Lease Caching

    Since the dawn of computing, CPU performance has continually grown, buoyed by Moore's Law. Execution speed for parallelizable programs in particular has massively increased with the now widespread employment of GPUs, TPUs, and FPGAs, capable of performing hundreds of computations simultaneously, for data processing. A major bottleneck for further performance increases, which has impeded speedup of sequential programs in particular, is the processor-memory performance gap. One approach to addressing this bottleneck is improving cache management algorithms. Caching is transparent to software, but traditional caching algorithms forgo hardware-software collaboration. Previous work introduced the idea of assigning leases to cache blocks as a form of collaborative cache eviction policy and introduced two lease-caching algorithms, Compiler Lease of cAche Memory (CLAM) and Phased Reference Leasing (PRL), evaluating them over seven benchmarks from the Polybench benchmark suite. This work evaluates CLAM and PRL over all thirty benchmarks of the Polybench suite for multiple dataset sizes. Additionally, to address the flaws of CLAM and PRL, two new lease-caching algorithms have been developed: Scoped Hooked Eviction Lease (SHEL) and Cross-Scope Eviction Lease (C-SHEL). These algorithms are evaluated not just for a single-level cache, typically found in embedded systems, but also for a multi-level cache as exists in higher-performance systems, including multi-core CPUs. The test system uses a RISC-V architecture to run benchmarks. All four lease-caching algorithms outperform the baseline Pseudo Least Recently Used (PLRU) policy at both levels of the cache hierarchy. Further, SHEL and C-SHEL display significant performance increases over PRL for certain benchmarks, demonstrating the value of scoped leasing in addressing complex reuse interval (RI) behavior.
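
The general idea of leasing cache blocks can be sketched as follows. This is a generic illustration, not CLAM, PRL, SHEL, or C-SHEL: the function `simulate_lease_cache`, the per-reference lease table, and the example trace are hypothetical. It assumes time is counted in accesses, each access renews the block's lease, and cache capacity is unbounded, so only lease expiry causes a miss.

```python
# Generic lease-cache sketch (not any of the algorithms named above).
# Assumptions: logical time advances by one per access, a lease of length L
# granted at time t keeps the block through time t + L, and capacity is
# unbounded so a block is lost only when its lease expires.

def simulate_lease_cache(trace, leases, default_lease=1):
    """trace: sequence of (reference_id, block) pairs.
    leases: dict mapping reference_id to a lease length in accesses.
    Returns (hits, misses)."""
    expiry = {}                      # block -> last time step its lease covers
    hits = misses = 0
    for now, (ref, block) in enumerate(trace):
        if block in expiry and expiry[block] >= now:
            hits += 1                # reuse interval fit inside the lease
        else:
            misses += 1              # never cached, or lease already expired
        expiry[block] = now + leases.get(ref, default_lease)  # renew lease
    return hits, misses

# Hypothetical trace: reference r1 touches block A, r2 touches block B.
trace = [("r1", "A"), ("r2", "B"), ("r1", "A"), ("r2", "B")]
print(simulate_lease_cache(trace, {"r1": 2, "r2": 1}))  # (1, 3)
```

Both blocks here are reused after two accesses, so the lease of 2 on `r1` yields a hit while the lease of 1 on `r2` expires too early; choosing leases to match reuse intervals is the problem the abstract's algorithms address.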

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need for power management and classify the techniques along several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers gain insight into how power management techniques work and design even more efficient high-performance embedded systems of tomorrow.

    Energy Saving Techniques for Phase Change Memory (PCM)

    In recent years, the energy consumption of computing systems has increased, and a large fraction of this energy is consumed in main memory. In response, researchers have proposed the use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power and nearly zero leakage power. However, the write latency and power of PCM are very high, and these, along with the limited write endurance of PCM, present significant challenges to its widespread adoption. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is to encourage researchers to propose even better techniques for improving energy efficiency of PCM-based main memory. Comment: Survey, phase change RAM (PCRAM

    Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

    Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements with significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit record write within a transaction. We do so by ensuring that we can determine the status of all committed transactions during recovery by storing necessary metadata information statically with blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction IDs and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads. Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed System
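
Recovery without a commit record can be pictured with a small sketch. The abstract does not specify the metadata layout, so everything here is an assumption made for illustration: each persisted block is tagged with a transaction ID and the total number of blocks in that transaction, and a transaction counts as committed only if all of its blocks reached persistent memory. The function name `committed_transactions` is hypothetical.

```python
# Illustrative sketch only; the tag layout is an assumption, not LOC's design.
# Each persisted block carries (tx_id, tx_block_count). Recovery treats a
# transaction as committed when every one of its blocks is found in memory,
# so no separate commit record has to be written.

from collections import Counter

def committed_transactions(persisted_blocks):
    """persisted_blocks: iterable of (tx_id, tx_block_count) tags read back
    from persistent memory after a crash. Returns the set of committed IDs."""
    blocks = list(persisted_blocks)
    found = Counter(tx_id for tx_id, _ in blocks)       # blocks seen per tx
    expected = {tx_id: count for tx_id, count in blocks}  # blocks required
    return {tx_id for tx_id, count in expected.items() if found[tx_id] == count}

# Hypothetical crash image: tx 1 wrote both of its 2 blocks,
# tx 2 wrote only 2 of its 3 blocks before the crash.
image = [(1, 2), (1, 2), (2, 3), (2, 3)]
print(committed_transactions(image))  # {1}
```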