139 research outputs found

    STT-RAM์„ ์ด์šฉํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์ ์ธ ์บ์‹œ ์„ค๊ณ„ ๊ธฐ์ˆ 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2019. 2. ์ตœ๊ธฐ์˜.์ง€๋‚œ ์ˆ˜์‹ญ ๋…„๊ฐ„ '๋ฉ”๋ชจ๋ฆฌ ๋ฒฝ' ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์˜จ ์นฉ ์บ์‹œ์˜ ํฌ๊ธฐ๋Š” ๊พธ์ค€ํžˆ ์ฆ๊ฐ€ํ•ด์™”๋‹ค. ํ•˜์ง€๋งŒ ์ง€๊ธˆ๊นŒ์ง€ ์บ์‹œ์— ์ฃผ๋กœ ์‚ฌ์šฉ๋˜์–ด ์˜จ ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ์ˆ ์ธ SRAM์€ ๋‚ฎ์€ ์ง‘์ ๋„์™€ ๋†’์€ ๋Œ€๊ธฐ ์ „๋ ฅ ์†Œ๋ชจ๋กœ ์ธํ•ด ํฐ ์บ์‹œ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๋ฐ์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค. ์ด๋Ÿฌํ•œ SRAM์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด ๋” ๋†’์€ ์ง‘์ ๋„์™€ ๋‚ฎ์€ ๋Œ€๊ธฐ ์ „๋ ฅ์„ ์†Œ๋ชจํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ์ˆ ์ธ STT-RAM์œผ๋กœ SRAM์„ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ํ•˜์ง€๋งŒ STT-RAM์€ ๋ฐ์ดํ„ฐ๋ฅผ ์“ธ ๋•Œ ๋งŽ์€ ์—๋„ˆ์ง€์™€ ์‹œ๊ฐ„์„ ์†Œ๋น„ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹จ์ˆœํžˆ SRAM์„ STT-RAM์œผ๋กœ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ์€ ์˜คํžˆ๋ ค ์บ์‹œ ์—๋„ˆ์ง€ ์†Œ๋น„๋ฅผ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” STT-RAM์„ ์ด์šฉํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์ ์ธ ์บ์‹œ ์„ค๊ณ„ ๊ธฐ์ˆ ๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ, ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์—์„œ STT-RAM์„ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ๋Š” ๊ณ„์ธต ๊ฐ„์— ์ค‘๋ณต๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๊ธฐ ๋•Œ๋ฌธ์— ํฌํ•จ์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์™€ ๋น„๊ตํ•˜์—ฌ ๋” ํฐ ์œ ํšจ ์šฉ๋Ÿ‰์„ ๊ฐ–์ง€๋งŒ, ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์—์„œ๋Š” ์ƒ์œ„ ๋ ˆ๋ฒจ ์บ์‹œ์—์„œ ๋‚ด๋ณด๋‚ด์ง„ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํ•˜์œ„ ๋ ˆ๋ฒจ ์บ์‹œ์— ์จ์•ผ ํ•˜๋ฏ€๋กœ ๋” ๋งŽ์€ ์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์“ฐ๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์˜ ํŠน์„ฑ์€ ์“ฐ๊ธฐ ํŠน์„ฑ์ด ๋‹จ์ ์ธ STT-RAM์„ ํ•จ๊ป˜ ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์„ ์–ด๋ ต๊ฒŒ ํ•œ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์žฌ์‚ฌ์šฉ ๊ฑฐ๋ฆฌ ์˜ˆ์ธก์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” SRAM/STT-RAM ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ๋ฅผ ์„ค๊ณ„ํ•˜์˜€๋‹ค. ๋‘ ๋ฒˆ์งธ, ๋น„ํœ˜๋ฐœ์„ฑ STT-RAM์„ ์ด์šฉํ•ด ์บ์‹œ๋ฅผ ์„ค๊ณ„ํ•  ๋•Œ ๊ณ ๋ คํ•ด์•ผ ํ•  ์ ๋“ค์— ๋Œ€ํ•ด ๋ถ„์„ํ•˜์˜€๋‹ค. STT-RAM์˜ ๋น„ํšจ์œจ์ ์ธ ์“ฐ๊ธฐ ๋™์ž‘์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ํ•ด๊ฒฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๊ทธ์ค‘ ํ•œ ๊ฐ€์ง€๋Š” STT-RAM ์†Œ์ž๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ์œ ์ง€ํ•˜๋Š” ์‹œ๊ฐ„์„ ์ค„์—ฌ (ํœ˜๋ฐœ์„ฑ STT-RAM) ์“ฐ๊ธฐ ํŠน์„ฑ์„ ํ–ฅ์ƒํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. STT-RAM์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์žƒ๋Š” ๊ฒƒ์€ ํ™•๋ฅ ์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์•ˆ์ •์ ์œผ๋กœ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์˜ค๋ฅ˜ ์ •์ • ๋ถ€ํ˜ธ(ECC)๋ฅผ ์ด์šฉํ•ด ์ฃผ๊ธฐ์ ์œผ๋กœ ์˜ค๋ฅ˜๋ฅผ ์ •์ •ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” STT-RAM ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ ํœ˜๋ฐœ์„ฑ STT-RAM ์„ค๊ณ„ ์š”์†Œ๋“ค์— ๋Œ€ํ•ด ๋ถ„์„ํ•˜์˜€๊ณ  ์‹คํ—˜์„ ํ†ตํ•ด ํ•ด๋‹น ์„ค๊ณ„ ์š”์†Œ๋“ค์ด ์บ์‹œ ์—๋„ˆ์ง€์™€ ์„ฑ๋Šฅ์— ์ฃผ๋Š” ์˜ํ–ฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์—์„œ์˜ ๋ถ„์‚ฐ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ๋ฅผ ์„ค๊ณ„ํ•˜์˜€๋‹ค. ๋‹จ์ˆœํžˆ ๊ธฐ์กด์˜ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ์™€ ๋ถ„์‚ฐ์บ์‹œ๋ฅผ ๊ฒฐํ•ฉํ•˜๋ฉด ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ์˜ ํšจ์œจ์„ฑ์— ํฐ ์˜ํ–ฅ์„ ์ฃผ๋Š” SRAM ํ™œ์šฉ๋„๊ฐ€ ๋‚ฎ์•„์ง„๋‹ค. ๋”ฐ๋ผ์„œ ๊ธฐ์กด์˜ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ์—์„œ์˜ ์—๋„ˆ์ง€ ๊ฐ์†Œ๋ฅผ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์—†๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ถ„์‚ฐ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ์—์„œ SRAM ํ™œ์šฉ๋„๋ฅผ ๋†’์ผ ์ˆ˜ ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ์ตœ์ ํ™” ๊ธฐ์ˆ ์ธ ๋ฑ…ํฌ-๋‚ด๋ถ€ ์ตœ์ ํ™”์™€ ๋ฑ…ํฌ๊ฐ„ ์ตœ์ ํ™” ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ฑ…ํฌ-๋‚ด๋ถ€ ์ตœ์ ํ™”๋Š” highly-associative ์บ์‹œ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฑ…ํฌ ๋‚ด๋ถ€์—์„œ ์“ฐ๊ธฐ ๋™์ž‘์ด ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ์‹œํ‚ค๋Š” ๊ฒƒ์ด๊ณ  ๋ฑ…ํฌ๊ฐ„ ์ตœ์ ํ™”๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ์บ์‹œ ๋ฑ…ํฌ์— ์“ฐ๊ธฐ ๋™์ž‘์ด ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ๋ฅด๊ฒŒ ๋ถ„์‚ฐ์‹œํ‚ค๋Š” ์ตœ์ ํ™” ๋ฐฉ๋ฒ•์ด๋‹ค.Over the last decade, the capacity of on-chip cache is continuously increased to mitigate the memory wall problem. However, SRAM, which is a dominant memory technology for caches, is not suitable for such a large cache because of its low density and large static power. One way to mitigate these downsides of the SRAM cache is replacing SRAM with a more efficient memory technology. Spin-Transfer Torque RAM (STT-RAM), one of the emerging memory technology, is a promising candidate for the alternative of SRAM. As a substitute of SRAM, STT-RAM can compensate drawbacks of SRAM with its non-volatility and small cell size. However, STT-RAM has poor write characteristics such as high write energy and long write latency and thus simply replacing SRAM to STT-RAM increases cache energy. To overcome those poor write characteristics of STT-RAM, this dissertation explores three different design techniques for energy-efficient cache using STT-RAM. The first part of the dissertation focuses on combining STT-RAM with exclusive cache hierarchy. Exclusive caches are known to provide higher effective cache capacity than inclusive caches by removing duplicated copies of cache blocks across hierarchies. However, in exclusive cache hierarchies, every block evicted from the upper-level cache is written back to the last-level cache regardless of its dirtiness thereby incurring extra write overhead. This makes it challenging to use STT-RAM for exclusive last-level caches due to its high write energy and long write latency. To mitigate this problem, we design an SRAM/STT-RAM hybrid cache architecture based on reuse distance prediction. The second part of the dissertation explores trade-offs in the design of volatile STT-RAM cache. Due to the inefficient write operation of STT-RAM, various solutions have been proposed to tackle this inefficiency. One of the proposed solutions is redesigning STT-RAM cell for better write characteristics at the cost of shortened retention time (i.e., volatile STT-RAM). Since the retention failure of STT-RAM has a stochastic property, an extra overhead of periodic scrubbing with error correcting code (ECC) is required to tolerate the failure. With an analysis based on analytic STT-RAM model, we have conducted extensive experiments on various volatile STT-RAM cache design parameters including scrubbing period, ECC strength, and target failure rate. The experimental results show the impact of the parameter variations on last-level cache energy and performance and provide a guideline for designing a volatile STT-RAM with ECC and scrubbing. The last part of the dissertation proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache architecture for manycore systems running multiple applications. It is based on the observation that a naive application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages highly-associative cache design, achieving more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks.Abstract i Contents iii List of Figures vii List of Tables xi Chapter 1 Introduction 1 1.1 Exclusive Last-Level Hybrid Cache 2 1.2 Designing Volatile STT-RAM Cache 4 1.3 Distributed Hybrid Cache 5 Chapter 2 Background 9 2.1 STT-RAM 9 2.1.1 Thermal Stability 10 2.1.2 Read and Write Operation of STT-RAM 11 2.1.3 Failures of STT-RAM 11 2.1.4 Volatile STT-RAM 13 2.1.5 Related Work 14 2.2 Exclusive Last-Level Hybrid Cache 18 2.2.1 Cache Hierarchies 18 2.2.2 Related Work 19 2.3 Distributed Hybrid Cache 21 2.3.1 Prediction Hybrid Cache 21 2.3.2 Distributed Cache Partitioning 22 2.3.3 Related Work 23 Chapter 3 Exclusive Last-Level Hybrid Cache 27 3.1 Motivation 27 3.1.1 Exclusive Cache Hierarchy 27 3.1.2 Reuse Distance 29 3.2 Architecture 30 3.2.1 Reuse Distance Predictor 30 3.2.2 Hybrid Cache Architecture 32 3.3 Evaluation 34 3.3.1 Methodology 34 3.3.2 LLC Energy Consumption 35 3.3.3 Main Memory Energy Consumption 38 3.3.4 Performance 39 3.3.5 Area Overhead 39 3.4 Summary 39 Chapter 4 Designing Volatile STT-RAM Cache 41 4.1 Analysis 41 4.1.1 Retention Failure of a Volatile STT-RAM Cell 41 4.1.2 Memory Array Design 43 4.2 Evaluation 45 4.2.1 Methodology 45 4.2.2 Last-Level Cache Energy 46 4.2.3 Performance 51 4.3 Summary 52 Chapter 5 Distributed Hybrid Cache 55 5.1 Motivation 55 5.2 Architecture 58 5.2.1 Intra-Bank Optimization 59 5.2.2 Inter-Bank Optimization 63 5.2.3 Other Optimizations 67 5.3 Evaluation Methodology 69 5.4 Evaluation Results 73 5.4.1 Energy Consumption and Performance 73 5.4.2 Analysis of Intra-bank Optimization 76 5.4.3 Analysis of Inter-bank Optimization 78 5.4.4 Impact of Inter-Bank Optimization on Network Energy 79 5.4.5 Sensitivity Analysis 80 5.4.6 Implementation Overhead 81 5.5 Summary 82 Chapter 6 Conculsion 85 Bibliography 88 ์ดˆ๋ก 101Docto

    Efficient Placement and Migration Policies for an STT-RAM based Hybrid L1 Cache for Intermittently Powered Systems

    Full text link
    The number of battery-powered devices is rapidly increasing due to the widespread use of IoT-enabled nodes in various fields. Energy harvesters, which help to power embedded devices, are a feasible alternative to replacing battery-powered devices. In a capacitor, the energy harvester stores enough energy to power up the embedded device and compute the task. This type of computation is referred to as intermittent computing. Energy harvesters are unable to supply continuous power to embedded devices. All registers and cache in conventional processors are volatile. We require a Non-Volatile Memory (NVM)-based Non-Volatile Processor (NVP) that can store registers and cache contents during a power failure. NVM-based caches reduce system performance and consume more energy than SRAM-based caches. This paper proposes Efficient Placement and Migration policies for hybrid cache architecture that uses SRAM and STT-RAM at the first level cache. The proposed architecture includes cache block placement and migration policies to reduce the number of writes to STT-RAM. During a power failure, the backup strategy identifies and migrates the critical blocks from SRAM to STT-RAM. When compared to the baseline architecture, the proposed architecture reduces STT-RAM writes from 63.35% to 35.93%, resulting in a 32.85% performance gain and a 23.42% reduction in energy consumption. Our backup strategy reduces backup time by 34.46% when compared to the baseline

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

    Full text link
    In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set capacity and associativity to use efficiently the full potential of MLC STTRAM. We exploit the asymmetric nature of the MLC storage scheme to build cache lines featuring heterogeneous performances, that is, half of the cache lines are read-friendly, while the other is write-friendly. Furthermore, we propose to opportunistically deactivate ways in underutilized sets to convert MLC to Single-Level Cell (SLC) mode, which features overall better performance and lifetime. Our ultimate goal is to build a cache architecture that combines the capacity advantages of MLC and performance/energy advantages of SLC. Our experiments show an improvement of 43% in total numbers of conflict misses, 27% in memory access latency, 12% in system performance, and 26% in LLC access energy, with a slight degradation in cache lifetime (about 7%) compared to an SLC cache
    • โ€ฆ
    corecore