10,847 research outputs found

    Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

    Get PDF
    The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research

    Cache Hierarchy Inspired Compression: a Novel Architecture for Data Streams

    Get PDF
    We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance based replacement policy and unweighted voting for making classification decisions

    STT-RAM์„ ์ด์šฉํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์ ์ธ ์บ์‹œ ์„ค๊ณ„ ๊ธฐ์ˆ 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2019. 2. ์ตœ๊ธฐ์˜.์ง€๋‚œ ์ˆ˜์‹ญ ๋…„๊ฐ„ '๋ฉ”๋ชจ๋ฆฌ ๋ฒฝ' ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์˜จ ์นฉ ์บ์‹œ์˜ ํฌ๊ธฐ๋Š” ๊พธ์ค€ํžˆ ์ฆ๊ฐ€ํ•ด์™”๋‹ค. ํ•˜์ง€๋งŒ ์ง€๊ธˆ๊นŒ์ง€ ์บ์‹œ์— ์ฃผ๋กœ ์‚ฌ์šฉ๋˜์–ด ์˜จ ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ์ˆ ์ธ SRAM์€ ๋‚ฎ์€ ์ง‘์ ๋„์™€ ๋†’์€ ๋Œ€๊ธฐ ์ „๋ ฅ ์†Œ๋ชจ๋กœ ์ธํ•ด ํฐ ์บ์‹œ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๋ฐ์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค. ์ด๋Ÿฌํ•œ SRAM์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด ๋” ๋†’์€ ์ง‘์ ๋„์™€ ๋‚ฎ์€ ๋Œ€๊ธฐ ์ „๋ ฅ์„ ์†Œ๋ชจํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ์ˆ ์ธ STT-RAM์œผ๋กœ SRAM์„ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ํ•˜์ง€๋งŒ STT-RAM์€ ๋ฐ์ดํ„ฐ๋ฅผ ์“ธ ๋•Œ ๋งŽ์€ ์—๋„ˆ์ง€์™€ ์‹œ๊ฐ„์„ ์†Œ๋น„ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹จ์ˆœํžˆ SRAM์„ STT-RAM์œผ๋กœ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ์€ ์˜คํžˆ๋ ค ์บ์‹œ ์—๋„ˆ์ง€ ์†Œ๋น„๋ฅผ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” STT-RAM์„ ์ด์šฉํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์ ์ธ ์บ์‹œ ์„ค๊ณ„ ๊ธฐ์ˆ ๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ, ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์—์„œ STT-RAM์„ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ๋Š” ๊ณ„์ธต ๊ฐ„์— ์ค‘๋ณต๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๊ธฐ ๋•Œ๋ฌธ์— ํฌํ•จ์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์™€ ๋น„๊ตํ•˜์—ฌ ๋” ํฐ ์œ ํšจ ์šฉ๋Ÿ‰์„ ๊ฐ–์ง€๋งŒ, ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์—์„œ๋Š” ์ƒ์œ„ ๋ ˆ๋ฒจ ์บ์‹œ์—์„œ ๋‚ด๋ณด๋‚ด์ง„ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํ•˜์œ„ ๋ ˆ๋ฒจ ์บ์‹œ์— ์จ์•ผ ํ•˜๋ฏ€๋กœ ๋” ๋งŽ์€ ์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์“ฐ๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฐํƒ€์  ์บ์‹œ ๊ณ„์ธต ๊ตฌ์กฐ์˜ ํŠน์„ฑ์€ ์“ฐ๊ธฐ ํŠน์„ฑ์ด ๋‹จ์ ์ธ STT-RAM์„ ํ•จ๊ป˜ ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์„ ์–ด๋ ต๊ฒŒ ํ•œ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์žฌ์‚ฌ์šฉ ๊ฑฐ๋ฆฌ ์˜ˆ์ธก์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” SRAM/STT-RAM ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ๋ฅผ ์„ค๊ณ„ํ•˜์˜€๋‹ค. ๋‘ ๋ฒˆ์งธ, ๋น„ํœ˜๋ฐœ์„ฑ STT-RAM์„ ์ด์šฉํ•ด ์บ์‹œ๋ฅผ ์„ค๊ณ„ํ•  ๋•Œ ๊ณ ๋ คํ•ด์•ผ ํ•  ์ ๋“ค์— ๋Œ€ํ•ด ๋ถ„์„ํ•˜์˜€๋‹ค. STT-RAM์˜ ๋น„ํšจ์œจ์ ์ธ ์“ฐ๊ธฐ ๋™์ž‘์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ํ•ด๊ฒฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๊ทธ์ค‘ ํ•œ ๊ฐ€์ง€๋Š” STT-RAM ์†Œ์ž๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ์œ ์ง€ํ•˜๋Š” ์‹œ๊ฐ„์„ ์ค„์—ฌ (ํœ˜๋ฐœ์„ฑ STT-RAM) ์“ฐ๊ธฐ ํŠน์„ฑ์„ ํ–ฅ์ƒํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. STT-RAM์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์žƒ๋Š” ๊ฒƒ์€ ํ™•๋ฅ ์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์•ˆ์ •์ ์œผ๋กœ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์˜ค๋ฅ˜ ์ •์ • ๋ถ€ํ˜ธ(ECC)๋ฅผ ์ด์šฉํ•ด ์ฃผ๊ธฐ์ ์œผ๋กœ ์˜ค๋ฅ˜๋ฅผ ์ •์ •ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” STT-RAM ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ ํœ˜๋ฐœ์„ฑ STT-RAM ์„ค๊ณ„ ์š”์†Œ๋“ค์— ๋Œ€ํ•ด ๋ถ„์„ํ•˜์˜€๊ณ  ์‹คํ—˜์„ ํ†ตํ•ด ํ•ด๋‹น ์„ค๊ณ„ ์š”์†Œ๋“ค์ด ์บ์‹œ ์—๋„ˆ์ง€์™€ ์„ฑ๋Šฅ์— ์ฃผ๋Š” ์˜ํ–ฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์—์„œ์˜ ๋ถ„์‚ฐ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ๋ฅผ ์„ค๊ณ„ํ•˜์˜€๋‹ค. ๋‹จ์ˆœํžˆ ๊ธฐ์กด์˜ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ์™€ ๋ถ„์‚ฐ์บ์‹œ๋ฅผ ๊ฒฐํ•ฉํ•˜๋ฉด ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ์˜ ํšจ์œจ์„ฑ์— ํฐ ์˜ํ–ฅ์„ ์ฃผ๋Š” SRAM ํ™œ์šฉ๋„๊ฐ€ ๋‚ฎ์•„์ง„๋‹ค. ๋”ฐ๋ผ์„œ ๊ธฐ์กด์˜ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ์—์„œ์˜ ์—๋„ˆ์ง€ ๊ฐ์†Œ๋ฅผ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์—†๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ถ„์‚ฐ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์บ์‹œ ๊ตฌ์กฐ์—์„œ SRAM ํ™œ์šฉ๋„๋ฅผ ๋†’์ผ ์ˆ˜ ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ์ตœ์ ํ™” ๊ธฐ์ˆ ์ธ ๋ฑ…ํฌ-๋‚ด๋ถ€ ์ตœ์ ํ™”์™€ ๋ฑ…ํฌ๊ฐ„ ์ตœ์ ํ™” ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ฑ…ํฌ-๋‚ด๋ถ€ ์ตœ์ ํ™”๋Š” highly-associative ์บ์‹œ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฑ…ํฌ ๋‚ด๋ถ€์—์„œ ์“ฐ๊ธฐ ๋™์ž‘์ด ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ์‹œํ‚ค๋Š” ๊ฒƒ์ด๊ณ  ๋ฑ…ํฌ๊ฐ„ ์ตœ์ ํ™”๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ์บ์‹œ ๋ฑ…ํฌ์— ์“ฐ๊ธฐ ๋™์ž‘์ด ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ๋ฅด๊ฒŒ ๋ถ„์‚ฐ์‹œํ‚ค๋Š” ์ตœ์ ํ™” ๋ฐฉ๋ฒ•์ด๋‹ค.Over the last decade, the capacity of on-chip cache is continuously increased to mitigate the memory wall problem. However, SRAM, which is a dominant memory technology for caches, is not suitable for such a large cache because of its low density and large static power. One way to mitigate these downsides of the SRAM cache is replacing SRAM with a more efficient memory technology. Spin-Transfer Torque RAM (STT-RAM), one of the emerging memory technology, is a promising candidate for the alternative of SRAM. As a substitute of SRAM, STT-RAM can compensate drawbacks of SRAM with its non-volatility and small cell size. However, STT-RAM has poor write characteristics such as high write energy and long write latency and thus simply replacing SRAM to STT-RAM increases cache energy. To overcome those poor write characteristics of STT-RAM, this dissertation explores three different design techniques for energy-efficient cache using STT-RAM. The first part of the dissertation focuses on combining STT-RAM with exclusive cache hierarchy. Exclusive caches are known to provide higher effective cache capacity than inclusive caches by removing duplicated copies of cache blocks across hierarchies. However, in exclusive cache hierarchies, every block evicted from the upper-level cache is written back to the last-level cache regardless of its dirtiness thereby incurring extra write overhead. This makes it challenging to use STT-RAM for exclusive last-level caches due to its high write energy and long write latency. To mitigate this problem, we design an SRAM/STT-RAM hybrid cache architecture based on reuse distance prediction. The second part of the dissertation explores trade-offs in the design of volatile STT-RAM cache. Due to the inefficient write operation of STT-RAM, various solutions have been proposed to tackle this inefficiency. One of the proposed solutions is redesigning STT-RAM cell for better write characteristics at the cost of shortened retention time (i.e., volatile STT-RAM). Since the retention failure of STT-RAM has a stochastic property, an extra overhead of periodic scrubbing with error correcting code (ECC) is required to tolerate the failure. With an analysis based on analytic STT-RAM model, we have conducted extensive experiments on various volatile STT-RAM cache design parameters including scrubbing period, ECC strength, and target failure rate. The experimental results show the impact of the parameter variations on last-level cache energy and performance and provide a guideline for designing a volatile STT-RAM with ECC and scrubbing. The last part of the dissertation proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache architecture for manycore systems running multiple applications. It is based on the observation that a naive application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages highly-associative cache design, achieving more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks.Abstract i Contents iii List of Figures vii List of Tables xi Chapter 1 Introduction 1 1.1 Exclusive Last-Level Hybrid Cache 2 1.2 Designing Volatile STT-RAM Cache 4 1.3 Distributed Hybrid Cache 5 Chapter 2 Background 9 2.1 STT-RAM 9 2.1.1 Thermal Stability 10 2.1.2 Read and Write Operation of STT-RAM 11 2.1.3 Failures of STT-RAM 11 2.1.4 Volatile STT-RAM 13 2.1.5 Related Work 14 2.2 Exclusive Last-Level Hybrid Cache 18 2.2.1 Cache Hierarchies 18 2.2.2 Related Work 19 2.3 Distributed Hybrid Cache 21 2.3.1 Prediction Hybrid Cache 21 2.3.2 Distributed Cache Partitioning 22 2.3.3 Related Work 23 Chapter 3 Exclusive Last-Level Hybrid Cache 27 3.1 Motivation 27 3.1.1 Exclusive Cache Hierarchy 27 3.1.2 Reuse Distance 29 3.2 Architecture 30 3.2.1 Reuse Distance Predictor 30 3.2.2 Hybrid Cache Architecture 32 3.3 Evaluation 34 3.3.1 Methodology 34 3.3.2 LLC Energy Consumption 35 3.3.3 Main Memory Energy Consumption 38 3.3.4 Performance 39 3.3.5 Area Overhead 39 3.4 Summary 39 Chapter 4 Designing Volatile STT-RAM Cache 41 4.1 Analysis 41 4.1.1 Retention Failure of a Volatile STT-RAM Cell 41 4.1.2 Memory Array Design 43 4.2 Evaluation 45 4.2.1 Methodology 45 4.2.2 Last-Level Cache Energy 46 4.2.3 Performance 51 4.3 Summary 52 Chapter 5 Distributed Hybrid Cache 55 5.1 Motivation 55 5.2 Architecture 58 5.2.1 Intra-Bank Optimization 59 5.2.2 Inter-Bank Optimization 63 5.2.3 Other Optimizations 67 5.3 Evaluation Methodology 69 5.4 Evaluation Results 73 5.4.1 Energy Consumption and Performance 73 5.4.2 Analysis of Intra-bank Optimization 76 5.4.3 Analysis of Inter-bank Optimization 78 5.4.4 Impact of Inter-Bank Optimization on Network Energy 79 5.4.5 Sensitivity Analysis 80 5.4.6 Implementation Overhead 81 5.5 Summary 82 Chapter 6 Conculsion 85 Bibliography 88 ์ดˆ๋ก 101Docto

    Energy Saving Techniques for Phase Change Memory (PCM)

    Full text link
    In recent years, the energy consumption of computing systems has increased and a large fraction of this energy is consumed in main memory. Towards this, researchers have proposed use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power; and nearly zero leakage power. However, the write latency and power of PCM are very high and this, along with limited write endurance of PCM present significant challenges in enabling wide-spread adoption of PCM. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is encourage researchers to propose even better techniques for improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM
    • โ€ฆ
    corecore