402 research outputs found

    Secure, Reliable, and Energy-efficient Phase Change Main Memory

    Get PDF
    Recent trends in supercomputing, shared cloud computing, and “big data” applications are placing ever-greater demands on computing systems and their memories. Such applications can readily saturate memory bandwidth, and often operate on a working set which exceeds the capacity of modern DRAM packages. Unfortunately, this demand is not matched by DRAM technology development. As Moore’s Law slows and Dennard Scaling stops, further density improvements in DRAM and the underlying semiconductor devices are difficult [1]. In anticipation of this limitation, researchers have pursued emerging memory technologies that promise higher density than conventional DRAM devices. One such technology in phase-change memory (PCM) is especially desirable due to its increased density relative to DRAM. However, this nascent memory has outstanding challenges to overcome before it is viable as a DRAM replacement. PCM devices have limited write endurance, and can consume more energy than their DRAM counterparts, necessitating careful control of how and how often they are written. A second challenge is the non-volatile nature of PCM devices; many applications rely on the volatility of DRAM to protect security critical applications and operating system address space between accesses and power cycles. An obvious solution is to encrypt the memory, but the effective randomization of data is at odds with techniques which reduce writes to the underlying memory. This body of work presents three contributions for addressing all challenges simultaneously under the assumption that encryption is required. Using an encryption and encoding technique called CASTLE & TOWERs, PCM can be employed as main memory with up to 30× improvement in device lifetime while opportunistically reducing dynamic energy. A second technique called MACE marries encoding and traditional error-correction schemes providing up to 2.6× improvement in device lifetime alongside a whole-lifetime energy evaluation framework to guide system design. Finally, an architecture called WINDU is presented which supports the application of encoding for an emerging encryption standard with an eye on energy efficiency. Together, these techniques advance the state-of-the-art, and offer a significant step toward the adoption of PCM as a main memory

    DESIGN AND TEST OF DIGITAL CIRCUITS AND SYSTEMS USING CMOS AND EMERGING RESISTIVE DEVICES

    Get PDF
    The memristor is an emerging nano-device. Low power operation, high density, scalability, non-volatility, and compatibility with CMOS Technology have made it a promising technology for memory, Boolean implementation, computing, and logic systems. This dissertation focuses on testing and design of such applications. In particular, we investigate on testing of memristor-based memories, design of memristive implementation of Boolean functions, and reliability and design of neuromorphic computing such as neural network. In addition, we show how to modify threshold logic gates to implement more functions. Although memristor is a promising emerging technology but is prone to defects due to uncertainties in nanoscale fabrication. Fast March tests are proposed in Chapter 2 that benefit from fast write operations. The test application time is reduced significantly while simultaneously reducing the average test energy per cell. Experimental evaluation in 45 nm technology show a speed-up of approximately 70% with a decrease in energy by approximately 40%. DfT schemes are proposed to implement the new test methods. In Chapter 3, an Integer Linear Programming based framework to identify current-mode threshold logic functions is presented. It is shown that threshold logic functions can be implemented in CMOS-based current mode logic with reduced transistor count when the input weights are not restricted to be integers. Experimental results show that many more functions can be implemented with predetermined hardware overhead, and the hardware requirement of a large percentage of existing threshold functions is reduced when comparing to the traditional CMOS-based threshold logic implementation. In Chapter 4, a new method to implement threshold logic functions using memristors is presented. This method benefits from the high range of memristor’s resistivity which is used to define different weight values, and reduces significantly the transistor count. The proposed approach implements many more functions as threshold logic gates when comparing to existing implementations. Experimental results in 45 nm technology show that the proposed memristive approach implements threshold logic gates with less area and power consumption. Finally, Chapter 5 focuses on current-based designs for neural networks. CMOS aging impacts the total synaptic current and this impacts the accuracy. Chapter 5 introduces an enhanced memristive crossbar array (MCA) based analog neural network architecture to improve reliability due to the aging effect. A built-in current-based calibration circuit is introduced to restore the total synaptic current. The calibration circuit is a current sensor that receives the ideal reference current for non-aged column and restores the reduced sensed current at each column to the ideal value. Experimental results show that the proposed approach restores the currents with less than 1% precision, and the area overhead is negligible

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system

    Achieving Reliable and Sustainable Next-Generation Memories

    Get PDF
    Conventional memory technology scaling has introduced reliability challenges due to dysfunctional, improperly formed cells and crosstalk from increased cell proximity. Furthermore, as the manufacturing effort becomes increasingly complex due to these deeply scaled technologies, holistic sustainability is negatively impacted. The development of new memory technologies can help overcome the capacitor scaling limitations of DRAM. However, these technologies have their own reliability concerns, such as limited write endurance in the case of Phase Change Memories (PCM). Moreover, emerging system requirements, such as in-memory encryption to protect sensitive or private data and operation in harsh environments create additional challenges that must be addressed in the context of reliability and sustainability. This dissertation provides new multifactor and ultimately unified solutions to address many of these concerns in the same system. In particular, my contributions toward mitigating these issues are as follows. I present GreenChip and GreenAsic, which together provide the first tools to holistically evaluate new computer architecture, chip, and memory design concepts for sustainability. These tools provide detailed estimates of manufacturing and operational-phase metrics for different computing workloads and deployment scenarios. Using GreenChip, I examined existing DRAM reliability techniques in the context of their holistic sustainability impact, including my own technique to mitigate bitline crosstalk. For PCM, I provided a new reliability technique with no additional storage overhead that substantially increases the lifetime of an encrypted memory system. To provide bit-level error correction, I developed compact linked-list and Bloom-filter-based bit-level fault map structures, that provide unprecedented levels of error tabulation, combined with my own novel error correction and lifetime extension approaches based on these maps for less area than traditional ECC. In particular, FaME, can correct N faults using N bits when utilizing a bit-level fault map. For operation in harsh environments, I created a triple modular redundancy (TMR) pointer-based fault map, HOTH, which specifically protects cells shown to be weak to radiation. Finally, to combine the analyses of holistic sustainability and memory lifetime, I created the LARS technique, which adjusts the GreenChip indifference analysis to account for the additional sustainability benefit provided by increased reliability and lifetime

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems
    corecore