180 research outputs found

    Enabling Fine-Grain Restricted Coset Coding Through Word-Level Compression for PCM

    Full text link
    Phase change memory (PCM) has recently emerged as a promising technology to meet the fast growing demand for large capacity memory in computer systems, replacing DRAM that is impeded by physical limitations. Multi-level cell (MLC) PCM offers high density with low per-byte fabrication cost. However, despite many advantages, such as scalability and low leakage, the energy for programming intermediate states is considerably larger than programing single-level cell PCM. In this paper, we study encoding techniques to reduce write energy for MLC PCM when the encoding granularity is lowered below the typical cache line size. We observe that encoding data blocks at small granularity to reduce write energy actually increases the write energy because of the auxiliary encoding bits. We mitigate this adverse effect by 1) designing suitable codeword mappings that use fewer auxiliary bits and 2) proposing a new Word-Level Compression (WLC) which compresses more than 91% of the memory lines and provides enough room to store the auxiliary data using a novel restricted coset encoding applied at small data block granularities. Experimental results show that the proposed encoding at 16-bit data granularity reduces the write energy by 39%, on average, versus the leading encoding approach for write energy reduction. Furthermore, it improves endurance by 20% and is more reliable than the leading approach. Hardware synthesis evaluation shows that the proposed encoding can be implemented on-chip with only a nominal area overhead.Comment: 12 page

    ์ƒ๋ณ€ํ™” ๋ฉ”๋ชจ๋ฆฌ ์‹œ์Šคํ…œ์˜ ๊ฐ„์„ญ ์˜ค๋ฅ˜ ์™„ํ™” ๋ฐ RMW ์„ฑ๋Šฅ ํ–ฅ์ƒ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2021.8. ์ดํ˜์žฌ.Phase-change memory (PCM) announces the beginning of the new era of memory systems, owing to attractive characteristics. Many memory product manufacturers (e.g., Intel, SK Hynix, and Samsung) are developing related products. PCM can be applied to various circumstances; it is not simply limited to an extra-scale database. For example, PCM has a low standby power due to its non-volatility; hence, computation-intensive applications or mobile applications (i.e., long memory idle time) are suitable to run on PCM-based computing systems. Despite these fascinating features of PCM, PCM is still far from the general commercial market due to low reliability and long latency problems. In particular, low reliability is a painful problem for PCM in past decades. As the semiconductor process technology rapidly scales down over the years, DRAM reaches 10 nm class process technology. In addition, it is reported that the write disturbance error (WDE) would be a serious issue for PCM if it scales down below 54 nm class process technology. Therefore, addressing the problem of WDEs becomes essential to make PCM competitive to DRAM. To overcome this problem, this dissertation proposes a novel approach that can restore meta-stable cells on demand by levering two-level SRAM-based tables, thereby significantly reducing the number WDEs. Furthermore, a novel randomized approach is proposed to implement a replacement policy that originally requires hundreds of read ports on SRAM. The second problem of PCM is a long-latency compared to that of DRAM. In particular, PCM tries to enhance its throughput by adopting a larger transaction unit; however, the different unit size from the general-purpose processor cache line further degrades the system performance due to the introduction of a read-modify-write (RMW) module. Since there has never been any research related to RMW in a PCM-based memory system, this dissertation proposes a novel architecture to enhance the overall system performance and reliability of a PCM-based memory system having an RMW module. The proposed architecture enhances data re-usability without introducing extra storage resources. Furthermore, a novel operation that merges commands regardless of command types is proposed to enhance performance notably. Another problem is the absence of a full simulation platform for PCM. While the announced features of the PCM-related product (i.e., Intel Optane) are scarce due to confidential issues, all priceless information can be integrated to develop an architecture simulator that resembles the available product. To this end, this dissertation tries to scrape up all available features of modules in a PCM controller and implement a dedicated simulator for future research purposes.์ƒ๋ณ€ํ™” ๋ฉ”๋ชจ๋ฆฌ๋Š”(PCM) ๋งค๋ ฅ์ ์ธ ํŠน์„ฑ์„ ํ†ตํ•ด ๋ฉ”๋ชจ๋ฆฌ ์‹œ์Šคํ…œ์˜ ์ƒˆ๋กœ์šด ์‹œ๋Œ€์˜ ์‹œ์ž‘์„ ์•Œ๋ ธ๋‹ค. ๋งŽ์€ ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ จ ์ œํ’ˆ ์ œ์กฐ์—…์ฒด(์˜ˆ : ์ธํ…”, SK ํ•˜์ด๋‹‰์Šค, ์‚ผ์„ฑ)๊ฐ€ ๊ด€๋ จ ์ œํ’ˆ ๊ฐœ๋ฐœ์— ๋ฐ•์ฐจ๋ฅผ ๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค. PCM์€ ๋‹จ์ˆœํžˆ ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์—๋งŒ ๊ตญํ•œ๋˜์ง€ ์•Š๊ณ  ๋‹ค์–‘ํ•œ ์ƒํ™ฉ์— ์ ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, PCM์€ ๋น„ํœ˜๋ฐœ์„ฑ์œผ๋กœ ์ธํ•ด ๋Œ€๊ธฐ ์ „๋ ฅ์ด ๋‚ฎ๋‹ค. ๋”ฐ๋ผ์„œ ๊ณ„์‚ฐ ์ง‘์•ฝ์ ์ธ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋˜๋Š” ๋ชจ๋ฐ”์ผ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์€(์ฆ‰, ๊ธด ๋ฉ”๋ชจ๋ฆฌ ์œ ํœด ์‹œ๊ฐ„) PCM ๊ธฐ๋ฐ˜ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์—์„œ ์‹คํ–‰ํ•˜๊ธฐ์— ์ ํ•ฉํ•˜๋‹ค. PCM์˜ ์ด๋Ÿฌํ•œ ๋งค๋ ฅ์ ์ธ ํŠน์„ฑ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  PCM์€ ๋‚ฎ์€ ์‹ ๋ขฐ์„ฑ๊ณผ ๊ธด ๋Œ€๊ธฐ ์‹œ๊ฐ„์œผ๋กœ ์ธํ•ด ์—ฌ์ „ํžˆ ์ผ๋ฐ˜ ์‚ฐ์—… ์‹œ์žฅ์—์„œ๋Š” DRAM๊ณผ ๋‹ค์†Œ ๊ฒฉ์ฐจ๊ฐ€ ์žˆ๋‹ค. ํŠนํžˆ ๋‚ฎ์€ ์‹ ๋ขฐ์„ฑ์€ ์ง€๋‚œ ์ˆ˜์‹ญ ๋…„ ๋™์•ˆ PCM ๊ธฐ์ˆ ์˜ ๋ฐœ์ „์„ ์ €ํ•ดํ•˜๋Š” ๋ฌธ์ œ๋‹ค. ๋ฐ˜๋„์ฒด ๊ณต์ • ๊ธฐ์ˆ ์ด ์ˆ˜๋…„์— ๊ฑธ์ณ ๋น ๋ฅด๊ฒŒ ์ถ•์†Œ๋จ์— ๋”ฐ๋ผ DRAM์€ 10nm ๊ธ‰ ๊ณต์ • ๊ธฐ์ˆ ์— ๋„๋‹ฌํ•˜์˜€๋‹ค. ์ด์–ด์„œ, ์“ฐ๊ธฐ ๋ฐฉํ•ด ์˜ค๋ฅ˜ (WDE)๊ฐ€ 54nm ๋“ฑ๊ธ‰ ํ”„๋กœ์„ธ์Šค ๊ธฐ์ˆ  ์•„๋ž˜๋กœ ์ถ•์†Œ๋˜๋ฉด PCM์— ์‹ฌ๊ฐํ•œ ๋ฌธ์ œ๊ฐ€ ๋  ๊ฒƒ์œผ๋กœ ๋ณด๊ณ ๋˜์—ˆ๋‹ค. ๋”ฐ๋ผ์„œ, WDE ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๊ฒƒ์€ PCM์ด DRAM๊ณผ ๋™๋“ฑํ•œ ๊ฒฝ์Ÿ๋ ฅ์„ ๊ฐ–์ถ”๋„๋ก ํ•˜๋Š” ๋ฐ ์žˆ์–ด ํ•„์ˆ˜์ ์ด๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ด ๋…ผ๋ฌธ์—์„œ๋Š” 2-๋ ˆ๋ฒจ SRAM ๊ธฐ๋ฐ˜ ํ…Œ์ด๋ธ”์„ ํ™œ์šฉํ•˜์—ฌ WDE ์ˆ˜๋ฅผ ํฌ๊ฒŒ ์ค„์—ฌ ํ•„์š”์— ๋”ฐ๋ผ ์ค€ ์•ˆ์ • ์…€์„ ๋ณต์›ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ์ ‘๊ทผ ๋ฐฉ์‹์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ, ์›๋ž˜ SRAM์—์„œ ์ˆ˜๋ฐฑ ๊ฐœ์˜ ์ฝ๊ธฐ ํฌํŠธ๊ฐ€ ํ•„์š”ํ•œ ๋Œ€์ฒด ์ •์ฑ…์„ ๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ์ƒˆ๋กœ์šด ๋žœ๋ค ๊ธฐ๋ฐ˜์˜ ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. PCM์˜ ๋‘ ๋ฒˆ์งธ ๋ฌธ์ œ๋Š” DRAM์— ๋น„ํ•ด ์ง€์—ฐ ์‹œ๊ฐ„์ด ๊ธธ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ํŠนํžˆ PCM์€ ๋” ํฐ ํŠธ๋žœ์žญ์…˜ ๋‹จ์œ„๋ฅผ ์ฑ„ํƒํ•˜์—ฌ ๋‹จ์œ„์‹œ๊ฐ„ ๋‹น ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๋Ÿ‰ ํ–ฅ์ƒ์„ ๋„๋ชจํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฒ”์šฉ ํ”„๋กœ์„ธ์„œ ์บ์‹œ ๋ผ์ธ๊ณผ ๋‹ค๋ฅธ ์œ ๋‹› ํฌ๊ธฐ๋Š” ์ฝ๊ธฐ-์ˆ˜์ •-์“ฐ๊ธฐ (RMW) ๋ชจ๋“ˆ์˜ ๋„์ž…์œผ๋กœ ์ธํ•ด ์‹œ์Šคํ…œ ์„ฑ๋Šฅ์„ ์ €ํ•˜ํ•˜๊ฒŒ ๋œ๋‹ค. PCM ๊ธฐ๋ฐ˜ ๋ฉ”๋ชจ๋ฆฌ ์‹œ์Šคํ…œ์—์„œ RMW ๊ด€๋ จ ์—ฐ๊ตฌ๊ฐ€ ์—†์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ณธ ๋…ผ๋ฌธ์€ RMW ๋ชจ๋“ˆ์„ ํƒ‘์žฌ ํ•œ PCM ๊ธฐ๋ฐ˜ ๋ฉ”๋ชจ๋ฆฌ ์‹œ์Šคํ…œ์˜ ์ „๋ฐ˜์ ์ธ ์‹œ์Šคํ…œ ์„ฑ๋Šฅ๊ณผ ์‹ ๋ขฐ์„ฑ์„ ํ–ฅ์ƒํ•˜๊ฒŒ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ์•„ํ‚คํ…์ฒ˜๋Š” ์ถ”๊ฐ€ ์Šคํ† ๋ฆฌ์ง€ ๋ฆฌ์†Œ์Šค๋ฅผ ๋„์ž…ํ•˜์ง€ ์•Š๊ณ ๋„ ๋ฐ์ดํ„ฐ ์žฌ์‚ฌ์šฉ์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚จ๋‹ค. ๋˜ํ•œ, ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•ด ๋ช…๋ น ์œ ํ˜•๊ณผ ๊ด€๊ณ„์—†์ด ๋ช…๋ น์„ ๋ณ‘ํ•ฉํ•˜๋Š” ์ƒˆ๋กœ์šด ์ž‘์—…์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ ๋‹ค๋ฅธ ๋ฌธ์ œ๋Š” PCM์„ ์œ„ํ•œ ์™„์ „ํ•œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ”Œ๋žซํผ์ด ๋ถ€์žฌํ•˜๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. PCM ๊ด€๋ จ ์ œํ’ˆ(์˜ˆ : Intel Optane)์— ๋Œ€ํ•ด ๋ฐœํ‘œ๋œ ์ •๋ณด๋Š” ๋Œ€์™ธ๋น„ ๋ฌธ์ œ๋กœ ์ธํ•ด ๋ถ€์กฑํ•˜๋‹ค. ํ•˜์ง€๋งŒ ์•Œ๋ ค์ ธ ์žˆ๋Š” ์ •๋ณด๋ฅผ ์ ์ ˆํžˆ ์ทจํ•ฉํ•˜๋ฉด ์‹œ์ค‘ ์ œํ’ˆ๊ณผ ์œ ์‚ฌํ•œ ์•„ํ‚คํ…์ฒ˜ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๋ฅผ ๊ฐœ๋ฐœํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์€ PCM ๋ฉ”๋ชจ๋ฆฌ ์ปจํŠธ๋กค๋Ÿฌ์— ํ•„์š”ํ•œ ๋ชจ๋“  ๋ชจ๋“ˆ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ํ–ฅํ›„ ์ด์™€ ๊ด€๋ จ๋œ ์—ฐ๊ตฌ์—์„œ ์ถฉ๋ถ„ํžˆ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์ „์šฉ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๋ฅผ ๊ตฌํ˜„ํ•˜์˜€๋‹ค.1 INTRODUCTION 1 1.1 Limitation of Traditional Main Memory Systems 1 1.2 Phase-Change Memory as Main Memory 3 1.2.1 Opportunities of PCM-based System 3 1.2.2 Challenges of PCM-based System 4 1.3 Dissertation Overview 7 2 BACKGROUND AND PREVIOUS WORK 8 2.1 Phase-Change Memory 8 2.2 Mitigation Schemes for Write Disturbance Errors 10 2.2.1 Write Disturbance Errors 10 2.2.2 Verification and Correction 12 2.2.3 Lazy Correction 13 2.2.4 Data Encoding-based Schemes 14 2.2.5 Sparse-Insertion Write Cache 16 2.3 Performance Enhancement for Read-Modify-Write 17 2.3.1 Traditional Read-Modify-Write 17 2.3.2 Write Coalescing for RMW 19 2.4 Architecture Simulators for PCM 21 2.4.1 NVMain 21 2.4.2 Ramulator 22 2.4.3 DRAMsim3 22 3 IN-MODULE DISTURBANCE BARRIER 24 3.1 Motivation 25 3.2 IMDB: In Module-Disturbance Barrier 29 3.2.1 Architectural Overview 29 3.2.2 Implementation of Data Structures 30 3.2.3 Modification of Media Controller 36 3.3 Replacement Policy 38 3.3.1 Replacement Policy for IMDB 38 3.3.2 Approximate Lowest Number Estimator 40 3.4 Putting All Together: Case Studies 43 3.5 Evaluation 45 3.5.1 Configuration 45 3.5.2 Architectural Exploration 47 3.5.3 Effectiveness of the Replacement Policy 48 3.5.4 Sensitivity to Main Table Configuration 49 3.5.5 Sensitivity to Barrier Buffer Size 51 3.5.6 Sensitivity to AppLE Group Size 52 3.5.7 Comparison with Other Studies 54 3.6 Discussion 59 3.7 Summary 63 4 INTEGRATION OF AN RMW MODULE IN A PCM-BASED SYSTEM 64 4.1 Motivation 65 4.2 Utilization of DRAM Cache for RMW 67 4.2.1 Architectural Design 67 4.2.2 Algorithm 70 4.3 Typeless Command Merging 73 4.3.1 Architectural Design 73 4.3.2 Algorithm 74 4.4 An Alternative Implementation: SRC-RMW 78 4.4.1 Implementation of SRC-RMW 78 4.4.2 Design Constraint 80 4.5 Case Study 82 4.6 Evaluation 85 4.6.1 Configuration 85 4.6.2 Speedup 88 4.6.3 Read Reliability 91 4.6.4 Energy Consumption: Selecting a Proper Page Size 93 4.6.5 Comparison with Other Studies 95 4.7 Discussion 97 4.8 Summary 99 5 AN ALL-INCLUSIVE SIMULATOR FOR A PCM CONTROLLER 100 5.1 Motivation 101 5.2 PCMCsim: PCM Controller Simulator 103 5.2.1 Architectural Overview 103 5.2.2 Underlying Classes of PCMCsim 104 5.2.3 Implementation of Contention Behavior 108 5.2.4 Modules of PCMCsim 109 5.3 Evaluation 116 5.3.1 Correctness of the Simulator 116 5.3.2 Comparison with Other Simulators 117 5.4 Summary 119 6 Conclusion 120 Abstract (In Korean) 141 Acknowledgment 143๋ฐ•

    Architectural Techniques for Disturbance Mitigation in Future Memory Systems

    Get PDF
    With the recent advancements of CMOS technology, scaling down the feature size has improved memory capacity, power, performance and cost. However, such dramatic progress in memory technology has increasingly made the precise control of the manufacturing process below 22nm more difficult. In spite of all these virtues, the technology scaling road map predicts significant process variation from cell-to-cell. It also predicts electromagnetic disturbances among memory cells that easily deviate their circuit characterizations from design goals and pose threats to the reliability, energy efficiency and security. This dissertation proposes simple, energy-efficient and low-overhead techniques that combat the challenges resulting from technology scaling in future memory systems. Specifically, this dissertation investigates solutions tuned to particular types of disturbance challenges, such as inter-cell or intra-cell disturbance, that are energy efficient while guaranteeing memory reliability. The contribution of this dissertation will be threefold. First, it uses a deterministic counter-based approach to target the root of inter-cell disturbances in Dynamic random access memory (DRAM) and provide further benefits to overall energy consumption while deterministically mitigating inter-cell disturbances. Second, it uses Markov chains to reason about the reliability of Spin-Transfer Torque Magnetic Random-Access Memory (STT-RAM) that suffers from intra-cell disturbances and then investigates on-demand refresh policies to recover from the persistent effect of such disturbances. Third, It leverages an encoding technique integrated with a novel word level compression scheme to reduce the vulnerability of cells to inter-cell write disturbances in Phase Change Memory (PCM). However, mitigating inter-cell write disturbances and also minimizing the write energy may increase the number of updated PCM cells and result in degraded endurance. Hence, It uses multi-objective optimization to balance the write energy and endurance in PCM cells while mitigating intercell disturbances. The work in this dissertation provides important insights into how to tackle the critical reliability challenges that high-density memory systems confront in deep scaled technology nodes. It advocates for various memory technologies to guarantee reliability of future memory systems while incurring nominal costs in terms of energy, area and performance

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Get PDF
    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both the industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM based main memory subsystems cannot sufficiently scale in the future to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems presents a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM based main memory subsystems that merely looking to re-architect DRAM subsystems cannot improve this trade-off, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, to consequently be able to significantly reduce the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well known approach of through-silicon vias (TSVs)-based 3D stacking of DRAM tiers. This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cellarray organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing โ€˜0โ€™s in PCM cells. Compared to traditional operations that write \u270\u27s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4ร— higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. In addition, we also discuss in this thesis possible future research directions that are aimed at extending the impacts of our proposed ideas so that they can transform the performance of main memory subsystems of the future

    ์ƒ๋ณ€ํ™” ๋ฉ”๋ชจ๋ฆฌ ์‹œ์Šคํ…œ์˜ ๊ฐ„์„ญ ์˜ค๋ฅ˜ ์™„ํ™” ๋ฐ RMW ์„ฑ๋Šฅ ํ–ฅ์ƒ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2021. 8. ์ดํ˜์žฌ.Phase-change memory (PCM) announces the beginning of the new era of memory systems, owing to attractive characteristics. Many memory product manufacturers (e.g., Intel, SK Hynix, and Samsung) are developing related products. PCM can be applied to various circumstances; it is not simply limited to an extra-scale database. For example, PCM has a low standby power due to its non-volatility; hence, computation-intensive applications or mobile applications (i.e., long memory idle time) are suitable to run on PCM-based computing systems. The second problem of PCM is a long-latency compared to that of DRAM. In particular, PCM tries to enhance its throughput by adopting a larger transaction unit; however, the different unit size from the general-purpose processor cache line further degrades the system performance due to the introduction of a read-modify-write (RMW) module. Since there has never been any research related to RMW in a PCM-based memory system, this dissertation proposes a novel architecture to enhance the overall system performance and reliability of a PCM-based memory system having an RMW module. The proposed architecture enhances data re-usability without introducing extra storage resources. Furthermore, a novel operation that merges commands regardless of command types is proposed to enhance performance notably.Despite these fascinating features of PCM, PCM is still far from the general commercial market due to low reliability and long latency problems. In particular, low reliability is a painful problem for PCM in past decades. As the semiconductor process technology rapidly scales down over the years, DRAM reaches 10 nm class process technology. In addition, it is reported that the write disturbance error (WDE) would be a serious issue for PCM if it scales down below 54 nm class process technology. Therefore, addressing the problem of WDEs becomes essential to make PCM competitive to DRAM. To overcome this problem, this dissertation proposes a novel approach that can restore meta-stable cells on demand by levering two-level SRAM-based tables, thereby significantly reducing the number WDEs. Furthermore, a novel randomized approach is proposed to implement a replacement policy that originally requires hundreds of read ports on SRAM.Another problem is the absence of a full simulation platform for PCM. While the announced features of the PCM-related product (i.e., Intel Optane) are scarce due to confidential issues, all priceless information can be integrated to develop an architecture simulator that resembles the available product. To this end, this dissertation tries to scrape up all available features of modules in a PCM controller and implement a dedicated simulator for future research purposes

    Achieving Reliable and Sustainable Next-Generation Memories

    Get PDF
    Conventional memory technology scaling has introduced reliability challenges due to dysfunctional, improperly formed cells and crosstalk from increased cell proximity. Furthermore, as the manufacturing effort becomes increasingly complex due to these deeply scaled technologies, holistic sustainability is negatively impacted. The development of new memory technologies can help overcome the capacitor scaling limitations of DRAM. However, these technologies have their own reliability concerns, such as limited write endurance in the case of Phase Change Memories (PCM). Moreover, emerging system requirements, such as in-memory encryption to protect sensitive or private data and operation in harsh environments create additional challenges that must be addressed in the context of reliability and sustainability. This dissertation provides new multifactor and ultimately unified solutions to address many of these concerns in the same system. In particular, my contributions toward mitigating these issues are as follows. I present GreenChip and GreenAsic, which together provide the first tools to holistically evaluate new computer architecture, chip, and memory design concepts for sustainability. These tools provide detailed estimates of manufacturing and operational-phase metrics for different computing workloads and deployment scenarios. Using GreenChip, I examined existing DRAM reliability techniques in the context of their holistic sustainability impact, including my own technique to mitigate bitline crosstalk. For PCM, I provided a new reliability technique with no additional storage overhead that substantially increases the lifetime of an encrypted memory system. To provide bit-level error correction, I developed compact linked-list and Bloom-filter-based bit-level fault map structures, that provide unprecedented levels of error tabulation, combined with my own novel error correction and lifetime extension approaches based on these maps for less area than traditional ECC. In particular, FaME, can correct N faults using N bits when utilizing a bit-level fault map. For operation in harsh environments, I created a triple modular redundancy (TMR) pointer-based fault map, HOTH, which specifically protects cells shown to be weak to radiation. Finally, to combine the analyses of holistic sustainability and memory lifetime, I created the LARS technique, which adjusts the GreenChip indifference analysis to account for the additional sustainability benefit provided by increased reliability and lifetime

    Design of Resistive Synaptic Devices and Array Architectures for Neuromorphic Computing

    Get PDF
    abstract: Over the past few decades, the silicon complementary-metal-oxide-semiconductor (CMOS) technology has been greatly scaled down to achieve higher performance, density and lower power consumption. As the device dimension is approaching its fundamental physical limit, there is an increasing demand for exploration of emerging devices with distinct operating principles from conventional CMOS. In recent years, many efforts have been devoted in the research of next-generation emerging non-volatile memory (eNVM) technologies, such as resistive random access memory (RRAM) and phase change memory (PCM), to replace conventional digital memories (e.g. SRAM) for implementation of synapses in large-scale neuromorphic computing systems. Essentially being compact and โ€œanalogโ€, these eNVM devices in a crossbar array can compute vector-matrix multiplication in parallel, significantly speeding up the machine/deep learning algorithms. However, non-ideal eNVM device and array properties may hamper the learning accuracy. To quantify their impact, the sparse coding algorithm was used as a starting point, where the strategies to remedy the accuracy loss were proposed, and the circuit-level design trade-offs were also analyzed. At architecture level, the parallel โ€œpseudo-crossbarโ€ array to prevent the write disturbance issue was presented. The peripheral circuits to support various parallel array architectures were also designed. One key component is the read circuit that employs the principle of integrate-and-fire neuron model to convert the analog column current to digital output. However, the read circuit is not area-efficient, which was proposed to be replaced with a compact two-terminal oscillation neuron device that exhibits metal-insulator-transition phenomenon. To facilitate the design exploration, a circuit-level macro simulator โ€œNeuroSimโ€ was developed in C++ to estimate the area, latency, energy and leakage power of various neuromorphic architectures. NeuroSim provides a wide variety of design options at the circuit/device level. NeuroSim can be used alone or as a supporting module to provide circuit-level performance estimation in neural network algorithms. A 2-layer multilayer perceptron (MLP) simulator with integration of NeuroSim was demonstrated to evaluate both the learning accuracy and circuit-level performance metrics for the online learning and offline classification, as well as to study the impact of eNVM reliability issues such as data retention and write endurance on the learning performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
    • โ€ฆ
    corecore