65 research outputs found

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Get PDF
    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both the industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM based main memory subsystems cannot sufficiently scale in the future to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems presents a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM based main memory subsystems that merely looking to re-architect DRAM subsystems cannot improve this trade-off, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, to consequently be able to significantly reduce the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well known approach of through-silicon vias (TSVs)-based 3D stacking of DRAM tiers. This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cellarray organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing ‘0’s in PCM cells. Compared to traditional operations that write \u270\u27s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4× higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. In addition, we also discuss in this thesis possible future research directions that are aimed at extending the impacts of our proposed ideas so that they can transform the performance of main memory subsystems of the future

    Developing Variation Aware Simulation Tools, Models, and Designs for STT-RAM

    Get PDF
    DEVELOPING VARIATION AWARE SIMULATION TOOLS, MODELS, AND DESIGNS FOR STT-RAM Enes Eken, PhD University of Pittsburgh, 2017 In recent years, we have been witnessing the rise of spin-transfer torque random access memory (STT-RAM) technology. There are a couple of reasons which explain why STT-RAM has attracted a great deal of attention. Although conventional memory technologies like SRAM, DRAM and Flash memories are commonly used in the modern computer industry, they have major shortcomings, such as high leakage current, high power consumption and volatility. Although these drawbacks could have been overlooked in the past, they have become major concerns. Its characteristics, including low-power consumption, fast read-write access time and non-volatility make STT-RAM a promising candidate to solve the problems of other memory technologies. However, like all other memory technologies, STT-RAM has some problems such as long switching time and large programming energy of Magnetic Tunneling Junction (MTJ) which are waiting to be solved. In order to solve these long switching time and large programming energy problems, Spin-Hall Effect (SHE) assisted STT-RAM structure (SHE-RAM) has been recently invented. In this work, I propose two possible SHE-RAM designs from the aspects of two different write access operations, namely, High Density SHE-RAM and Disturbance Free SHE-RAM, respectively. In addition to the SHE-RAM designs, I will also propose a simulation tool for STT-RAMs. As an early-stage modeling tool, NVSim has been widely adopted for simulations of emerging nonvolatile memory technologies in computer architecture research, including STT-RAM, ReRAM, PCM, etc. I will introduce a new member of NVSim family – NVSim-VXs, which enables statistical simulation of STT-RAM for write performance, errors, and energy consumption

    낸드 플래시 기반 저장장치의 수명 향상을 위한 계층 교차 최적화 기법

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. 김지홍.Replacing HDDs with NAND flash-based storage devices (SSDs) has been one of the major challenges in modern computing systems especially in regards to better performance and higher mobility. Although uninterrupted semiconductor process scaling and multi-leveling techniques lower the price of SSDs to the comparable level of HDDs, the decreasing lifetime of NAND flash memory, as a side effect of recent advanced device technologies, is emerging as one of the major barriers to the wide adoption of SSDs in high-performance computing systems. In this dissertation, we propose new cross-layer optimization techniques to extend the lifetime (in particular, endurance) of NAND flash memory. Our techniques are motivated by our key observation that erasing a NAND block with a lower voltage or at a slower speed can significantly improve NAND endurance. However, using a lower voltage in erase operations causes adverse side effects on other NAND characteristics such as write performance and retention capability. The main goal of the proposed techniques is to improve NAND endurance without affecting the other NAND requirements. We first present Dynamic Erase Voltage and Time Scaling (DeVTS), a unified framework to enable a system software to exploit the tradeoff relationship between the endurance and erase voltages/times of NAND flash memory. DeVTS includes erase voltage/time scaling and write capability tuning, each of which brings a different impact on the endurance, performance, and retention capabilities of NAND flash memory. Second, we propose a lifetime improvement technique which takes advantage of idle times between write requests when erasing a NAND block with a slower speed or when writing data to a NAND block erased with a lower voltage. We have implemented a DeVTS-enabled FTL, called dvsFTL, which optimally adjusts the erase voltage/time and write performance of NAND devices in an automatic fashion. Our experimental results show that dvsFTL can improve NAND endurance by 62%, on average, over DeVTS-unaware FTL with a negligible decrease in the overall write performance. Third, we suggest a comprehensive lifetime improvement technique which exploits variations of the retention requirements as well as the performance requirement of SSDs when writing data to a NAND block erased with a lower voltage. We have implemented dvsFTL+, an extended version of dvsFTL, which fully utilizes DeVTS by accurately predicting the write performance and retention requirements during run times. Our experimental results show that dvsFTL+ can further improve NAND endurance by more than 50% over dvsFTL while preserving all the NAND requirements. Lastly, we present a reliability management technique which prevents retention failure problems when aggressive retention-capability tuning techniques are employed in real environments. Our measurement results show that the proposed technique can recover corrupted data from retention failures up to 23 times faster over existing data recovery techniques. Furthermore, it can successfully recover severely retention-failed data, such as ones experienced 8 times longer retention times than the retention-time specification, that were not recoverable with the existing technique. Based on the evaluation studies for the developed lifetime improvement techniques, we verified that the cross-layer optimization approach has a significant impact on extending the lifetime of NAND flash-based storage devices. We expect that our proposed techniques can positively contribute to not only the wide adoption of NAND flash memory in datacenter environments but also the gradual acceleration of using flash as main memory.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Dissertation Goals 3 1.3 Contributions 4 1.4 Dissertation Structure 5 Chapter 2 Background 7 2.1 Threshold Voltage Window of NAND Flash Memory 7 2.2 NAND Program Operation 10 2.3 Related Work 11 2.3.1 System-Level SSD Lifetime Improvement Techniques 12 2.3.2 Device-Level Endurance-Enhancing Technique 15 2.3.3 Cross-Layer Optimization Techniques Exploiting NAND Tradeoffs 17 Chapter 3 Dynamic Erase Voltage and Time Scaling 20 3.1 Erase Voltage and Time Scaling 22 3.1.1 Motivation 22 3.1.2 Erase Voltage Scaling 23 3.1.3 Erase Time Scaling 26 3.2 Write Capability Tuning 28 3.2.1 Write Performance Tuning 29 3.2.2 Retention Capability Tuning 30 3.2.3 Disturbance Resistance Tuning 33 3.3 NAND Endurance Model 34 Chapter 4 Lifetime Improvement Technique Using Write-Performance Tuning 39 4.1 Design and Implementation of dvsFTL 40 4.1.1 Overview 40 4.1.2 Write-Speed Mode Selection 41 4.1.3 Erase Voltage Mode Selection 44 4.1.4 Erase Speed Mode Selection 46 4.1.5 DeVTS-wPT Aware FTL Modules 47 4.2 Experimental Results 50 4.2.1 Experimental Settings 50 4.2.2 Workload Characteristics 53 4.2.3 Endurance Gain Analysis 54 4.2.4 Overall Write Throughput Analysis 56 4.2.5 Detailed Analysis 58 Chapter 5 Lifetime Improvement Technique Using Retention-Capability Tuning 60 5.1 Design and Implementation of dvsFTL+ 62 5.1.1 Overview 62 5.1.2 Retention Requirement Prediction 64 5.1.3 Maximization of Endurance Benefit 66 5.1.4 Minimization of Reclaim Overhead 68 5.2 Experimental Results 69 5.2.1 Experimental Settings 69 5.2.2 Workload Characteristics 70 5.2.3 Endurance Gain Analysis 72 5.2.4 NAND Requirements Analysis 73 5.2.5 Detailed Analysis of Retention-Time Predictor 76 5.2.6 Detailed Analysis of Endurance Gain 83 Chapter 6 Reliability Management Technique for NAND Flash Memory 87 6.1 Overview 89 6.2 Motivation 91 6.2.1 Limitations of the Existing Retention-Error Management Policy 91 6.2.2 Limitations of the Existing Retention-Failure Recovery Technique 92 6.3 Retention Error Recovery Technique 95 6.3.1 Charge Movement Model 95 6.3.2 A Selective Error-Correction Procedure 99 6.3.3 Implementation 100 6.4 Experimental Results 103 Chapter 7 Conclusions 108 7.1 Summary and Conclusions 108 7.2 Future Work 110 7.2.1 Lifetime Improvement Technique Exploiting The Other NAND Tradeoffs 110 7.2.2 Development of Extended Techniques for DRAM-Flash Hybrid Main Memory Systems 111 7.2.3 Development of Specialized SSDs 112 Bibliography 114 초 록 122Docto

    Improving Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) as Next-Generation Memories: A Circuit Perspective

    Get PDF
    In the memory hierarchy of computer systems, the traditional semiconductor memories Static RAM (SRAM) and Dynamic RAM (DRAM) have already served for several decades as cache and main memory. With technology scaling, they face increasingly intractable challenges like power, density, reliability and scalability. As a result, they become less appealing in the multi/many-core era with ever increasing size and memory-intensity of working sets. Recently, there is an increasing interest in using emerging non-volatile memory technologies in replacement of SRAM and DRAM, due to their advantages like non-volatility, high device density, near-zero cell leakage and resilience to soft errors. Among several new memory technologies, Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) are most promising candidates in building main memory and cache, respectively. However, both of them possess unique limitations that preventing them from being effectively adopted. In this dissertation, I present my circuit design work on tackling the limitations of PCM and STT-MRAM. At bit level, both PCM and STT-MRAM suffer from excessive write energy, and PCM has very limited write endurance. For PCM, I implement Differential Write to remove large number of unnecessary bit-writes that do not alter the stored data. It is then extended to STT-MRAM as Early Write Termination, with specific optimizations to eliminate the overhead of pre-write read. At array level, PCM enjoys high density but could not provide competitive throughput due to its long write latency and limited number of read/write circuits. I propose a Pseudo-Multi-Port Bank design to exploit intra-bank parallelism by recycling and reusing shared peripheral circuits between accesses in a time-multiplexed manner. On the other hand, although STT-MRAM features satisfactory throughput, its conventional array architecture is constrained on density and scalability by the pitch of the per-column bitline pair. I propose a Common-Source-Line Array architecture which uses a shared source-line along the row, essentially leaving only one bitline per column. For these techniques, I provide circuit level analyses as well as architecture/system level and/or process/device level discussions. In addition, relevant background and work are thoroughly surveyed and potential future research topics are discussed, offering insights and prospects of these next-generation memories

    Architectural Techniques to Enable Reliable and Scalable Memory Systems

    Get PDF
    High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads.Comment: PhD thesis, Georgia Institute of Technology (May 2017

    ARCHITECTING EMERGING MEMORY TECHNOLOGIES FOR ENERGY-EFFICIENT COMPUTING IN MODERN PROCESSORS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore