23 research outputs found

    Improving Performance and Endurance for Crossbar Resistive Memory

    Get PDF
    Resistive Memory (ReRAM) has emerged as a promising non-volatile memory technology that may replace a significant portion of DRAM in future computer systems. When adopting crossbar architecture, ReRAM cell can achieve the smallest theoretical size in fabrication, ideally for constructing dense memory with large capacity. However, crossbar cell structure suffers from severe performance and endurance degradations, which come from large voltage drops on long wires. In this dissertation, I first study the correlation between the ReRAM cell switching latency and the number of cells in low resistant state (LRS) along bitlines, and propose to dynamically speed up write operations based on bitline data patterns. By leveraging the intrinsic in-memory processing capability of ReRAM crossbars, a low overhead runtime profiler that effectively tracks the data patterns in different bitlines is proposed. To achieve further write latency reduction, data compression and row address dependent memory data layout are employed to reduce the numbers of LRS cells on bitlines. Moreover, two optimization techniques are presented to mitigate energy overhead brought by bitline data patterns tracking. Second, I propose XWL, a novel table-based wear leveling scheme for ReRAM crossbars and study the correlation between write endurance and voltage stress in ReRAM crossbars. By estimating and tracking the effective write stress to different rows at runtime, XWL chooses the ones that are stressed the most to mitigate. Additionally, two extended scenarios are further examined for the performance and endurance issues in neural network accelerators as well as 3D vertical ReRAM (3D-VRAM) arrays. For the ReRAM crossbar-based accelerators, by exploiting the wearing out mechanism of ReRAM cell, a novel comprehensive framework, ReNEW, is proposed to enhance the lifetime of the ReRAM crossbar-based accelerators, particularly for neural network training. To reduce the write latency in 3D-VRAM arrays, a collection of techniques, including an in-memory data encoding scheme, a data pattern estimator for assessing cell resistance distributions, and a write time reduction scheme that opportunistically reduces RESET latency with runtime data patterns, are devised

    Architectural Techniques for Multi-Level Cell Phase Change Memory Based Main Memory

    Get PDF
    Phase change memory (PCM) recently has emerged as a promising technology to meet the fast growing demand for large capacity main memory in modern computing systems. Multi-level cell (MLC) PCM storing multiple bits in a single cell offers high density with low per-byte fabrication cost. However, PCM suffers from long write latency, short cell endurance, limited write throughput and high peak power, which makes it challenging to be integrated in the memory hierarchy. To address the long write latency, I propose write truncation to reduce the number of write iterations with the assistance of an extra error correction code (ECC). I also propose form switch (FS) to reduce the storage overhead of the ECC. By storing highly compressible lines in single level cell (SLC) form, FS improves read latency as well. To attack the short cell endurance and large peak power, I propose elastic RESET (ER) to construct triple-level cell PCM. By reducing RESET energy, ER significantly reduces peak power and prolongs PCM lifetime. To improve the write concurrency, I propose fine-grained write power budgeting (FPB) observing a global power budget and regulates power across write iterations according to the step-down power demand of each iteration. A global charge pump is also integrated onto a DIMM to boost power for hot PCM chips while staying within the global power budget. To further reduce the peak power, I propose intra-write RESET scheduling distributing cell RESET initializations in the whole write operation duration, so that the on-chip charge pump size can also be reduced

    IMPROVING THE PERFORMANCE OF HYBRID MAIN MEMORY THROUGH SYSTEM AWARE MANAGEMENT OF HETEROGENEOUS RESOURCES

    Get PDF
    Modern computer systems feature memory hierarchies which typically include DRAM as the main memory and HDD as the secondary storage. DRAM and HDD have been extensively used for the past several decades because of their high performance and low cost per bit at their level of hierarchy. Unfortunately, DRAM is facing serious scaling and power consumption problems, while HDD has suffered from stagnant performance improvement and poor energy efficiency. After all, computer system architects have an implicit consensus that there is no hope to improve future system’s performance and power consumption unless something fundamentally changes. To address the looming problems with DRAM and HDD, emerging Non-Volatile RAMs (NVRAMs) such as Phase Change Memory (PCM) or Spin-Transfer-Toque Magnetoresistive RAM (STT-MRAM) have been actively explored as new media of future memory hierarchy. However, since these NVRAMs have quite different characteristics from DRAM and HDD, integrating NVRAMs into conventional memory hierarchy requires significant architectural re-considerations and changes, imposing additional and complicated design trade-offs on the memory hierarchy design. This work assumes a future system in which both main memory and secondary storage include NVRAMs and are placed on the same memory bus. In this system organization, this dissertation work has addressed a problem facing the efficient exploitation of NVRAMs and DRAM integrated into a future platform’s memory hierarchy. Especially, this dissertation has investigated the system performance and lifetime improvement endowed by a novel system architecture called Memorage which co-manages all available physical NVRAM resources for main memory and storage at a system-level. Also, the work has studied the impact of a model-guided, hardware-driven page swap in a hybrid main memory on the application performance. Together, the two ideas enable a future system to ameliorate high system performance degradation under heavy memory pressure and to avoid an inefficient use of DRAM capacity due to injudicious page swap decisions. In summary, this research has not only demonstrated how emerging NVRAMs can be effectively employed and integrated in order to enhance the performance and endurance of a future system, but also helped system architects understand important design trade-offs for emerging NVRAMs based memory and storage systems

    Towards Successful Application of Phase Change Memories: Addressing Challenges from Write Operations

    Get PDF
    The emerging Phase Change Memory (PCM) technology is drawing increasing attention due to its advantages in non-volatility, byte-addressability and scalability. It is regarded as a promising candidate for future main memory. However, PCM's write operation has some limitations that pose challenges to its application in memory. The disadvantages include long write latency, high write power and limited write endurance. In this thesis, I present my effort towards successful application of PCM memory. My research consists of several optimizing techniques at both the circuit and architecture level. First, at the circuit level, I propose Differential Write to remove unnecessary bit changes in PCM writes. This is not only beneficial to endurance but also to the energy and latency of writes. Second, I propose two memory scheduling enhancements (AWP and RAWP) for a non-blocking bank design. My memory scheduling enhancements can exploit intra-bank parallelism provided by non-blocking bank design, and achieve significant throughput improvement. Third, I propose Bit Level Power Budgeting (BPB), a fine-grained power budgeting technique that leverages the information from Differential Write to achieve even higher memory throughput under the same power budget. Fourth, I propose techniques to improve the QoS tuning ability of high-priority applications when running on PCM memory. In summary, the techniques I propose effectively address the challenges of PCM's write operations. In addition, I present the experimental infrastructure in this work and my visions of potential future research topics, which could be helpful to other researchers in the area

    Low Energy Solutions for Multi- and Triple-Level Cell Non-Volatile Memories

    Get PDF
    Due to the high refresh power and scalability issues of DRAM, non-volatile memories (NVM) such as phase change memory (PCM) and resistive RAM (RRAM) are being actively investigated as viable replacements of DRAM. However, although these NVMs are more scalable than DRAM, they have shortcomings such as higher write energy and lower endurance. Further, the increased capacity of multi- and triple-level cells (MLC/TLC) in these NVM technologies comes at the cost of even higher write energies and lower endurance attributed to the MLC/TLC program-and-verify (P&V) techniques. This dissertation makes the following contributions to address the high write energy associated with MLC/TLC NVMs. First, we describe MFNW, a Flip-N-Write encoding that effectively reduces the write energy and improves the endurance of MLC NVMs. MFNW encodes an MLC/TLC word into a number of codewords and selects the one resulting in lowest write energy. Second, we present another encoding solution that is based on perfect knowledge frequent value encoding (FVE). This encoding technique leverages machine learning to cluster a set of general-purpose applications according to their frequency profiles and generates a dedicated offline FVE for every cluster to maximize energy reduction across a broad spectrum of applications. Whereas the proposed encodings are used as an add-on layer on top of the MLC/TLC P&V solutions, the third contribution is a low latency, low energy P&V (L3EP) approach for MLC/TLC PCM. The primary motivation of L^3EP is to fix the problem from its origin by crafting a higher speed programming algorithm. A reduction in write latency implies a reduction in write energy as well as an improvement in cell endurance. Directions for future research include the integration and evaluation of a software-based hybrid encoding mechanism for MLC/TLC NVMs; this is a page-level encoding that employs a DRAM cache for coding/decoding purposes. The main challenges include how the cache block replacement algorithm can easily access the page-level auxiliary cells to encode the cache block correctly. In summary, this work presents multiple solutions to address major challenges of MLC/TLC NVMs, including write latency, write energy, and cell endurance

    A (Nearly) Free Lunch:Extending NAND Flash Lifetime by Exploiting Neglected Physical Properties

    Get PDF
    NAND flash is a key storage technology in modern computing systems. Without it, many devices would probably not exist today or would at least not benefit from as many features. The very large success of this technology motivated massive efforts to scale it down in order to increase its density further. However, NAND flash is currently facing physical limitations that prevent it reaching smaller cell sizes without severely reducing its storage reliability and lifetime. Accordingly, in the present thesis we aim at relieving some constraints from device manufacturing by addressing flash irregularities at a higher level. For example, we acknowledge the fact that process variation plus other factors render some regions of a flash device more sensitive than others. This difference usually leads to sensitive regions exhausting their lifetime early, which then causes the device to become unusable, while the rest of the device is still healthy, yet not exploitable. Consequently, we propose to postpone this exhaustion point with new strategies that require minimal resources to be implemented and effectively extend flash devices lifetime. Sometimes, our strategies involve unconventional methods to access the flash that are not supported by specification document and, therefore, should not be used lightly. Hence, we also present thorough characterization experiments on actual NAND flash chips to validate these methods and model their effect on a flash device. Finally, we evaluate the performance of our methods by implementing a trace-driven flash device simulator and execute a large set of realistic disk traces. Overall, we exploit properties that are either neglected or not understood to propose methods that are nearly free to implement and systematically extend NAND flash lifetime. We are convinced that future NAND flash architectures will regularly bring radical physical changes, which will inevitably come together with a new set of physical properties to investigate and to exploit

    Improving Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) as Next-Generation Memories: A Circuit Perspective

    Get PDF
    In the memory hierarchy of computer systems, the traditional semiconductor memories Static RAM (SRAM) and Dynamic RAM (DRAM) have already served for several decades as cache and main memory. With technology scaling, they face increasingly intractable challenges like power, density, reliability and scalability. As a result, they become less appealing in the multi/many-core era with ever increasing size and memory-intensity of working sets. Recently, there is an increasing interest in using emerging non-volatile memory technologies in replacement of SRAM and DRAM, due to their advantages like non-volatility, high device density, near-zero cell leakage and resilience to soft errors. Among several new memory technologies, Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) are most promising candidates in building main memory and cache, respectively. However, both of them possess unique limitations that preventing them from being effectively adopted. In this dissertation, I present my circuit design work on tackling the limitations of PCM and STT-MRAM. At bit level, both PCM and STT-MRAM suffer from excessive write energy, and PCM has very limited write endurance. For PCM, I implement Differential Write to remove large number of unnecessary bit-writes that do not alter the stored data. It is then extended to STT-MRAM as Early Write Termination, with specific optimizations to eliminate the overhead of pre-write read. At array level, PCM enjoys high density but could not provide competitive throughput due to its long write latency and limited number of read/write circuits. I propose a Pseudo-Multi-Port Bank design to exploit intra-bank parallelism by recycling and reusing shared peripheral circuits between accesses in a time-multiplexed manner. On the other hand, although STT-MRAM features satisfactory throughput, its conventional array architecture is constrained on density and scalability by the pitch of the per-column bitline pair. I propose a Common-Source-Line Array architecture which uses a shared source-line along the row, essentially leaving only one bitline per column. For these techniques, I provide circuit level analyses as well as architecture/system level and/or process/device level discussions. In addition, relevant background and work are thoroughly surveyed and potential future research topics are discussed, offering insights and prospects of these next-generation memories

    Letter from the Special Issue Editor

    Get PDF
    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies
    corecore