Search CORE

193 research outputs found

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

Author: Arjomand Mohammad
Das Chita R.
Jadidi Amin
Kandemir Mahmut T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/04/2017
Field of study

In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set capacity and associativity to use efficiently the full potential of MLC STTRAM. We exploit the asymmetric nature of the MLC storage scheme to build cache lines featuring heterogeneous performances, that is, half of the cache lines are read-friendly, while the other is write-friendly. Furthermore, we propose to opportunistically deactivate ways in underutilized sets to convert MLC to Single-Level Cell (SLC) mode, which features overall better performance and lifetime. Our ultimate goal is to build a cache architecture that combines the capacity advantages of MLC and performance/energy advantages of SLC. Our experiments show an improvement of 43% in total numbers of conflict misses, 27% in memory access latency, 12% in system performance, and 26% in LLC access energy, with a slight degradation in cache lifetime (about 7%) compared to an SLC cache

arXiv.org e-Print Archive

Crossref

Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

Author: Najafabadi Navid Khoshavi
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2017
Field of study

While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

STT-RAM을 이용한 에너지 효율적인 캐시 설계 기술

Author: 김남형
Publication venue: 서울대학교 대학원
Publication date: 01/02/2019
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2019. 2. 최기영.지난 수십 년간 '메모리 벽' 문제를 해결하기 위해 온 칩 캐시의 크기는 꾸준히 증가해왔다. 하지만 지금까지 캐시에 주로 사용되어 온 메모리 기술인 SRAM은 낮은 집적도와 높은 대기 전력 소모로 인해 큰 캐시를 구성하는 데에는 적합하지 않다. 이러한 SRAM의 단점을 보완하기 위해 더 높은 집적도와 낮은 대기 전력을 소모하는 새로운 메모리 기술인 STT-RAM으로 SRAM을 대체하는 것이 제안되었다. 하지만 STT-RAM은 데이터를 쓸 때 많은 에너지와 시간을 소비하기 때문에 단순히 SRAM을 STT-RAM으로 대체하는 것은 오히려 캐시 에너지 소비를 증가시킨다. 이러한 문제를 해결하기 위해 본 논문에서는 STT-RAM을 이용한 에너지 효율적인 캐시 설계 기술들을 제안한다. 첫 번째, 배타적 캐시 계층 구조에서 STT-RAM을 활용하는 방법을 제안하였다. 배타적 캐시 계층 구조는 계층 간에 중복된 데이터가 없기 때문에 포함적 캐시 계층 구조와 비교하여 더 큰 유효 용량을 갖지만, 배타적 캐시 계층 구조에서는 상위 레벨 캐시에서 내보내진 모든 데이터를 하위 레벨 캐시에 써야 하므로 더 많은 양의 데이터를 쓰게 된다. 이러한 배타적 캐시 계층 구조의 특성은 쓰기 특성이 단점인 STT-RAM을 함께 활용하는 것을 어렵게 한다. 이를 해결하기 위해 본 논문에서는 재사용 거리 예측을 기반으로 하는 SRAM/STT-RAM 하이브리드 캐시 구조를 설계하였다. 두 번째, 비휘발성 STT-RAM을 이용해 캐시를 설계할 때 고려해야 할 점들에 대해 분석하였다. STT-RAM의 비효율적인 쓰기 동작을 줄이기 위해 다양한 해결법들이 제안되었다. 그중 한 가지는 STT-RAM 소자가 데이터를 유지하는 시간을 줄여 (휘발성 STT-RAM) 쓰기 특성을 향상하는 방법이다. STT-RAM에 저장된 데이터를 잃는 것은 확률적으로 발생하기 때문에 저장된 데이터를 안정적으로 유지하기 위해서는 오류 정정 부호(ECC)를 이용해 주기적으로 오류를 정정해주어야 한다. 본 논문에서는 STT-RAM 모델을 이용하여 휘발성 STT-RAM 설계 요소들에 대해 분석하였고 실험을 통해 해당 설계 요소들이 캐시 에너지와 성능에 주는 영향을 보여주었다. 마지막으로, 매니코어 시스템에서의 분산 하이브리드 캐시 구조를 설계하였다. 단순히 기존의 하이브리드 캐시와 분산캐시를 결합하면 하이브리드 캐시의 효율성에 큰 영향을 주는 SRAM 활용도가 낮아진다. 따라서 기존의 하이브리드 캐시 구조에서의 에너지 감소를 기대할 수 없다. 본 논문에서는 분산 하이브리드 캐시 구조에서 SRAM 활용도를 높일 수 있는 두 가지 최적화 기술인 뱅크-내부 최적화와 뱅크간 최적화 기술을 제안하였다. 뱅크-내부 최적화는 highly-associative 캐시를 활용하여 뱅크 내부에서 쓰기 동작이 많은 데이터를 분산시키는 것이고 뱅크간 최적화는 서로 다른 캐시 뱅크에 쓰기 동작이 많은 데이터를 고르게 분산시키는 최적화 방법이다.Over the last decade, the capacity of on-chip cache is continuously increased to mitigate the memory wall problem. However, SRAM, which is a dominant memory technology for caches, is not suitable for such a large cache because of its low density and large static power. One way to mitigate these downsides of the SRAM cache is replacing SRAM with a more efficient memory technology. Spin-Transfer Torque RAM (STT-RAM), one of the emerging memory technology, is a promising candidate for the alternative of SRAM. As a substitute of SRAM, STT-RAM can compensate drawbacks of SRAM with its non-volatility and small cell size. However, STT-RAM has poor write characteristics such as high write energy and long write latency and thus simply replacing SRAM to STT-RAM increases cache energy. To overcome those poor write characteristics of STT-RAM, this dissertation explores three different design techniques for energy-efficient cache using STT-RAM. The first part of the dissertation focuses on combining STT-RAM with exclusive cache hierarchy. Exclusive caches are known to provide higher effective cache capacity than inclusive caches by removing duplicated copies of cache blocks across hierarchies. However, in exclusive cache hierarchies, every block evicted from the upper-level cache is written back to the last-level cache regardless of its dirtiness thereby incurring extra write overhead. This makes it challenging to use STT-RAM for exclusive last-level caches due to its high write energy and long write latency. To mitigate this problem, we design an SRAM/STT-RAM hybrid cache architecture based on reuse distance prediction. The second part of the dissertation explores trade-offs in the design of volatile STT-RAM cache. Due to the inefficient write operation of STT-RAM, various solutions have been proposed to tackle this inefficiency. One of the proposed solutions is redesigning STT-RAM cell for better write characteristics at the cost of shortened retention time (i.e., volatile STT-RAM). Since the retention failure of STT-RAM has a stochastic property, an extra overhead of periodic scrubbing with error correcting code (ECC) is required to tolerate the failure. With an analysis based on analytic STT-RAM model, we have conducted extensive experiments on various volatile STT-RAM cache design parameters including scrubbing period, ECC strength, and target failure rate. The experimental results show the impact of the parameter variations on last-level cache energy and performance and provide a guideline for designing a volatile STT-RAM with ECC and scrubbing. The last part of the dissertation proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache architecture for manycore systems running multiple applications. It is based on the observation that a naive application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages highly-associative cache design, achieving more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks.Abstract i Contents iii List of Figures vii List of Tables xi Chapter 1 Introduction 1 1.1 Exclusive Last-Level Hybrid Cache 2 1.2 Designing Volatile STT-RAM Cache 4 1.3 Distributed Hybrid Cache 5 Chapter 2 Background 9 2.1 STT-RAM 9 2.1.1 Thermal Stability 10 2.1.2 Read and Write Operation of STT-RAM 11 2.1.3 Failures of STT-RAM 11 2.1.4 Volatile STT-RAM 13 2.1.5 Related Work 14 2.2 Exclusive Last-Level Hybrid Cache 18 2.2.1 Cache Hierarchies 18 2.2.2 Related Work 19 2.3 Distributed Hybrid Cache 21 2.3.1 Prediction Hybrid Cache 21 2.3.2 Distributed Cache Partitioning 22 2.3.3 Related Work 23 Chapter 3 Exclusive Last-Level Hybrid Cache 27 3.1 Motivation 27 3.1.1 Exclusive Cache Hierarchy 27 3.1.2 Reuse Distance 29 3.2 Architecture 30 3.2.1 Reuse Distance Predictor 30 3.2.2 Hybrid Cache Architecture 32 3.3 Evaluation 34 3.3.1 Methodology 34 3.3.2 LLC Energy Consumption 35 3.3.3 Main Memory Energy Consumption 38 3.3.4 Performance 39 3.3.5 Area Overhead 39 3.4 Summary 39 Chapter 4 Designing Volatile STT-RAM Cache 41 4.1 Analysis 41 4.1.1 Retention Failure of a Volatile STT-RAM Cell 41 4.1.2 Memory Array Design 43 4.2 Evaluation 45 4.2.1 Methodology 45 4.2.2 Last-Level Cache Energy 46 4.2.3 Performance 51 4.3 Summary 52 Chapter 5 Distributed Hybrid Cache 55 5.1 Motivation 55 5.2 Architecture 58 5.2.1 Intra-Bank Optimization 59 5.2.2 Inter-Bank Optimization 63 5.2.3 Other Optimizations 67 5.3 Evaluation Methodology 69 5.4 Evaluation Results 73 5.4.1 Energy Consumption and Performance 73 5.4.2 Analysis of Intra-bank Optimization 76 5.4.3 Analysis of Inter-bank Optimization 78 5.4.4 Impact of Inter-Bank Optimization on Network Energy 79 5.4.5 Sensitivity Analysis 80 5.4.6 Implementation Overhead 81 5.5 Summary 82 Chapter 6 Conculsion 85 Bibliography 88 초록 101Docto

SNU Open Repository and Archive

Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy

Author: Kandemir Mahmut
Rodríguez Gabriel
Touriño Juan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/12/2014
Field of study

[Abstract] On-chip power consumption is one of the fundamental challenges of current technology scaling. Cache memories consume a sizable part of this power, particularly due to leakage energy. STT-RAM is one of several new memory technologies that have been proposed in order to improve power while preserving performance. It features high density and low leakage, but at the expense of write energy and performance. This article explores the use of STT-RAM--based scratchpad memories that trade nonvolatility in exchange for faster and less energetically expensive accesses, making them feasible for on-chip implementation in embedded systems. A novel multiretention scratchpad partitioning is proposed, featuring multiple storage spaces with different retention, energy, and performance characteristics. A customized compiler-based allocation algorithm suitable for use with such a scratchpad organization is described. Our experiments indicate that a multiretention STT-RAM scratchpad can provide energy savings of 53% with respect to an iso-area, hardware-managed SRAM cache

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Performance impact of a slower main memory: a case study of STT-MRAM in HPC

Author: Asifuzzaman Kazi
Kwon Ohseong
Pavlovic Milan
Radojković Petar
Radulović Milan
Ryoo Kyung-Chang
Zaragoza David
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2016
Field of study

In high-performance computing (HPC), significant effort is invested in research and development of novel memory technologies. One of them is Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) --- byte-addressable, high-endurance non-volatile memory with slightly higher access time than DRAM. In this study, we conduct a preliminary assessment of HPC system performance impact with STT-MRAM main memory with recent industry estimations. Reliable timing parameters of STT-MRAM devices are unavailable, so we also perform a sensitivity analysis that correlates overall system slowdown trend with respect to average device latency. Our results demonstrate that the overall system performance of large HPC clusters is not particularly sensitive to main-memory latency. Therefore, STT-MRAM, as well as any other emerging non-volatile memories with comparable density and access time, can be a viable option for future HPC memory system design.This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union's Horizon 2020 research and innovation programme under ExaNoDe project (grant agreement No 671578).Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Enabling a reliable STT-MRAM main memory simulation

Author: Abe K.
Larry
Li Jianhua
Nebashi R.
Noguchi H.
Oh H.R.
Suresh A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2017
Field of study

STT-MRAM is a promising new memory technology with very desirable set of properties such as non-volatility, byte-addressability and high endurance. It has the potential to become the universal memory that could be incorporated to all levels of memory hierarchy. Although STT-MRAM technology got significant attention of various major memory manufacturers, to this day, academic research of STT-MRAM main memory remains marginal. This is mainly due to the unavailability of publicly available detailed timing parameters which are required to perform a cycle accurate main memory simulation. Our study presents a detailed analysis of STT-MRAM main memory timing and propose an approach to perform a reliable system level simulation of the memory technology. We seamlessly incorporate STT-MRAM timing parameters into DRAMSim2 memory simulator and use it as a part of the simulation infrastructure of the high-performance computing (HPC) systems. Our results suggests that, STT-MRAM main memory would provide performance comparable to DRAM, while opening up various opportunities for HPC system improvements. Most importantly, our study enables researchers to conduct reliable system level research on STT-MRAM main memory, and to explore the opportunities that this technology has to offer.This work was supported by BSC, Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union's Horizon 2020 research and innovation programme under ExaNoDe project (grant agreement No 671578). The authors wish to thank Terry Hulett, Duncan Bennett and Ben Cooke from Everspin Technologies Inc., for their technical support.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

Author: Sayed Nour
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

KITopen