
    Energy Modeling of Machine Learning Algorithms on General Purpose Hardware

    abstract: The Artificial Neural Network (ANN) has become a forerunner in the field of Artificial Intelligence. Innovations in ANNs have led to groundbreaking technological advances such as self-driving vehicles, medical diagnosis, speech processing, and personal assistants. These were inspired by the evolution and working of our brains. Just as our brains evolved using a combination of epigenetics and live stimulus, ANNs require training to learn patterns. This training usually requires a great deal of computation and many memory accesses. To realize these systems in real embedded hardware, many energy/power/performance issues need to be solved. The purpose of this research is to study the data-movement requirements of generic neural networks, along with the energy associated with them, and to suggest ways to improve the design. Many methods have suggested ways to optimize using a mix of computation and data-movement solutions without affecting task accuracy, but these methods lack a computation model to calculate the energy and depend on mere back-of-the-envelope calculations. We realized that there is a need for a generic quantitative analysis of memory-access energy that helps in better architectural exploration. We show that the present architectural tools are either incompatible or too slow, and that we need a better analytical method to estimate data-movement energy. We also propose a simple yet effective approach that is robust and expandable by users to support various systems. Dissertation/Thesis. Masters Thesis, Electrical Engineering, 201

    Reducing main memory access latency through SDRAM address mapping techniques and access reordering mechanisms

    As the performance gap between microprocessors and memory continues to increase, main memory accesses result in long latencies, which become a factor limiting system performance. Previous studies show that main memory access streams contain significant locality and that SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As memory accesses to unique banks are interleaved, the access latencies are partially hidden and therefore reduced. With consideration of cache conflict misses, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving the performance. The proposed burst scheduling is a novel access reordering mechanism which creates bursts by clustering accesses directed to the same rows of the same banks. Subject to a threshold, reads are allowed to preempt writes, and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize SDRAM data bus utilization. Consequently, burst scheduling reduces the row conflict rate, increasing and exploiting the available row locality. Using revised SimpleScalar and M5 simulators, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces the execution time by 14% on average over traditional page interleaving address mapping. Burst scheduling also achieves a 15% reduction in execution time over conventional bank in-order scheduling. Working constructively together, bit-reversal and burst scheduling successfully achieve a 19% speedup across simulated benchmarks.
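As a concrete illustration of the bit-reversal idea, the sketch below reverses a field of address bits above the block offset so that accesses differing only in high-order bits (typical cache-conflicting streams) land in different banks. The field widths here are illustrative assumptions, not the thesis's actual SDRAM configuration.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the low `width` bits of `value`."""
    out = 0
    for i in range(width):
        if value & (1 << i):
            out |= 1 << (width - 1 - i)
    return out

def map_address(addr: int, offset_bits: int = 6,
                field_bits: int = 16, bank_bits: int = 3):
    """Bit-reversal address mapping sketch: reverse the row/bank field so
    the former high-order bits select the bank (widths are assumptions)."""
    field = (addr >> offset_bits) & ((1 << field_bits) - 1)
    rev = reverse_bits(field, field_bits)
    bank = rev & ((1 << bank_bits) - 1)  # former top bits -> bank index
    row = rev >> bank_bits
    return bank, row
```

Two addresses that differ only in their upper bits, which a conventional page-interleaved mapping would send to the same bank, now fall into different banks, so their accesses can be overlapped rather than serialized on one bank.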

    ํƒ€์ž„ ์œˆ๋„์šฐ ์นด์šดํ„ฐ๋ฅผ ํ™œ์šฉํ•œ ๋กœ์šฐ ํ•ด๋จธ๋ง ๋ฐฉ์ง€ ๋ฐ ์ฃผ๊ธฐ์–ต์žฅ์น˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ

    Doctoral dissertation -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems Major), August 2020. Advisor: ์•ˆ์ •ํ˜ธ. Computer systems using DRAM are exposed to row-hammer (RH) attacks, which can flip data in a DRAM row without directly accessing it, but by frequently activating its adjacent rows. There have been a number of proposals to prevent RH, including both probabilistic and deterministic solutions. However, the probabilistic solutions provide protection with no capability to detect attacks and have a non-zero probability of missing protection. Counter-based deterministic solutions, meanwhile, either incur large area overhead or suffer from a noticeable performance drop on adversarial memory access patterns. To overcome these challenges, we propose a new counter-based RH prevention solution named Time Window Counter (TWiCe) based row refresh, which accurately detects potential RH attacks using only a small number of counters, with minimal performance impact. We first make a key observation that the number of rows that can cause RH is limited by the maximum row activation frequency and the DRAM cell retention time. We calculate the maximum number of required counter entries per DRAM bank, with which TWiCe prevents RH with a strong deterministic guarantee. TWiCe incurs no performance overhead on normal DRAM operations and less than 0.7% area and energy overheads over contemporary DRAM devices. Our evaluation shows that TWiCe causes no more than 0.006% additional DRAM row activations for adversarial memory access patterns, including RH attack scenarios. To reduce the area and energy overhead further, we propose the threshold-adjusted rank-level TWiCe. We first introduce pseudo-associative TWiCe (pa-TWiCe), which can search hundreds of TWiCe table entries energy-efficiently.
In addition, by exploiting the pa-TWiCe structure, we propose rank-level TWiCe, which further reduces the number of required entries by managing the table entries at rank level. We also adjust the thresholds of TWiCe to reduce the number of entries without increasing false-positive detections on general workloads. Finally, we propose extending TWiCe into a hot-page detector to improve main-memory performance. The TWiCe table contains the row addresses that have been frequently activated recently, and these rows are likely to be activated again due to temporal locality in memory accesses. We show how the hot-page detection in TWiCe can be combined with a DRAM page-swap methodology to reduce the DRAM latency for hot pages. Our evaluation also shows that low-latency DRAM using TWiCe achieves up to 12.2% IPC improvement over a baseline DDR4 device for a multi-threaded workload. (Korean abstract) Computer systems that use DRAM as main memory are exposed to row-hammering attacks. Row-hammering refers to a phenomenon in which the data of a particular DRAM row can be flipped without directly accessing that row, by frequently activating its adjacent rows. To prevent row-hammering, various probabilistic and deterministic prevention techniques have been studied. However, probabilistic techniques cannot detect the attack itself and have a non-zero probability of failing to protect. In addition, existing counter-based deterministic techniques either incur a large chip-area cost or cause a severe performance drop on particular memory access patterns. To solve these problems, we propose a new counter-based deterministic prevention technique named TWiCe (Time Window Counter based row refresh).
TWiCe accurately detects row-hammering attacks using only a small number of counters while minimizing the adverse impact on performance. We note that the number of DRAM rows that can cause row-hammering is bounded, because DRAM timing parameters limit the row activation frequency and DRAM cells are refreshed periodically. From this, we derive the maximum number of counters required per DRAM bank for TWiCe to guarantee firm deterministic prevention. TWiCe has no performance impact on normal DRAM operation and requires less than 0.7% chip-area and energy overhead on modern DRAM devices. In our evaluation, TWiCe incurred no more than 0.006% additional DRAM activations across various memory access patterns, including row-hammering attack scenarios. To further reduce the chip-area and energy cost of TWiCe, we propose the threshold-adjusted rank-level TWiCe. First, we propose pa-TWiCe (pseudo-associative TWiCe), which can search hundreds of TWiCe table entries energy-efficiently. Next, we propose rank-level TWiCe, which further reduces the number of required table entries by managing them at rank level. We also reduce the number of TWiCe table entries by adjusting TWiCe's threshold values without increasing false-positive detections on general workloads. Finally, we propose using TWiCe as a hot-page detector to improve the main-memory performance of computer systems.
๋ฉ”๋ชจ๋ฆฌ ์ ‘๊ทผ์˜ ์‹œ๊ฐ„์  ์ง€์—ญ์„ฑ์— ์˜ํ•ด ์ตœ๊ทผ ์ž์ฃผ activation๋œ DRAM ๋กœ์šฐ๋“ค์€ ๋‹ค์‹œ activation๋  ํ™•๋ฅ ์ด ๋†’๊ณ , TWiCe๋Š” ์ตœ๊ทผ ์ž์ฃผ activation๋œ DRAM ๋กœ์šฐ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์‚ฌ์‹ค์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ์šฐ๋ฆฌ๋Š” hot-page์— ๋Œ€ํ•œ DRAM ์ ‘๊ทผ ์ง€์—ฐ์‹œ๊ฐ„์„ ์ค„์ด๋Š” DRAM ํŽ˜์ด์ง€ ์Šค์™‘(swap) ๊ธฐ๋ฒ•๋“ค์— TWiCe๋ฅผ ์ ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์ธ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์ˆ˜ํ–‰ํ•œ ํ‰๊ฐ€์—์„œ TWiCe๋ฅผ ์‚ฌ์šฉํ•œ ์ €์ง€์—ฐ์‹œ๊ฐ„ DRAM์€ ๋ฉ€ํ‹ฐ ์“ฐ๋ ˆ๋”ฉ ์›Œํฌ๋กœ๋“œ๋“ค์—์„œ ๊ธฐ์กด DDR4 ๋””๋ฐ”์ด์Šค ๋Œ€๋น„ IPC๋ฅผ ์ตœ๋Œ€ 12.2% ์ฆ๊ฐ€์‹œ์ผฐ๋‹ค.Introduction 1 1.1 Time Window Counter Based Row Refresh to Prevent Row-hammering 2 1.2 Optimizing Time Window Counter 6 1.3 Using Time Window Counters to Improve Main Memory Performance 8 1.4 Outline 10 Background of DRAM and Row-hammering 11 2.1 DRAM Device Organization 12 2.2 Sparing DRAM Rows to Combat Reliability Challenges 13 2.3 Main Memory Subsystem Organization and Operation 14 2.4 Row-hammering (RH) 18 2.5 Previous RH Prevention Solutions 20 2.6 Limitations of the Previous RH Solutions 21 TWiCe: Time Window Counter based RH Prevention 26 3.1 TWiCe: Time Window Counter 26 3.2 Proof of RH Prevention 30 3.3 Counter Table Size 33 3.4 Architecting TWiCe 35 3.4.1 Location of TWiCe Table 35 3.4.2 Augmenting DRAM Interface with a New Adjacent Row Refresh (ARR) Command 37 3.5 Analysis 40 3.6 Evaluation 42 Optimizing TWiCe to Reduce Implementation Cost 47 4.1 Pseudo-associative TWiCe 47 4.2 Rank-level TWiCe 50 4.3 Adjusting Threshold to Reduce Table Size 55 4.4 Analysis 57 4.5 Evaluation 59 Augmenting TWiCe for Hot-page Detection 62 5.1 Necessity of Counters for Detecting Hot Pages 62 5.2 Previous Studies on Migration for Asymmetric Low-latency DRAM 64 5.3 Extending TWiCe for Dynamic Hot-page Detection 67 5.4 Additional Components and Methodology 70 5.5 Analysis and Evaluation 73 5.5.1 Overhead Analysis 73 5.5.2 Evaluation 75 Conclusion 82 6.1 Future work 84 Bibliography 85 ๊ตญ๋ฌธ์ดˆ๋ก 
94Docto
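The counter mechanism described in this abstract can be sketched roughly as follows: each tracked row carries an activation count and the number of time windows it has survived, and rows activated too rarely to ever reach the row-hammer threshold are pruned at window boundaries, which is what keeps the table small. The thresholds and the pruning rule below are simplified assumptions, not the dissertation's derived parameters.

```python
# A minimal sketch of a TWiCe-style time-window counter table
# (simplified assumptions, not the dissertation's exact scheme).
class TWiCeTable:
    def __init__(self, rh_threshold=4, prune_rate=1):
        self.rh_threshold = rh_threshold  # ACTs that trigger a neighbor refresh
        self.prune_rate = prune_rate      # min ACTs per window to stay tracked
        self.entries = {}                 # row -> [act_count, windows_alive]
        self.refreshed = []               # rows whose neighbors were refreshed

    def activate(self, row):
        count, life = self.entries.get(row, [0, 1])
        count += 1
        if count >= self.rh_threshold:
            self.refreshed.append(row)    # issue Adjacent Row Refresh (ARR)
            self.entries.pop(row, None)
        else:
            self.entries[row] = [count, life]

    def end_window(self):
        # Prune rows activated too rarely to ever reach the threshold.
        for row in list(self.entries):
            count, life = self.entries[row]
            if count < self.prune_rate * life:
                del self.entries[row]
            else:
                self.entries[row][1] = life + 1
```

Hammering one row past the threshold triggers a (simulated) adjacent-row refresh, while infrequently touched rows age out of the table; the same table of recently hot rows is what the hot-page extension reuses.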

    TRRespass: Exploiting the Many Sides of Target Row Refresh

    After a plethora of high-profile RowHammer attacks, CPU and DRAM vendors scrambled to deliver what was meant to be the definitive hardware solution against the RowHammer problem: Target Row Refresh (TRR). A common belief among practitioners is that, for the latest generation of DDR4 systems that are protected by TRR, RowHammer is no longer an issue in practice. However, in reality, very little is known about TRR. In this paper, we demystify the inner workings of TRR and debunk its security guarantees. We show that what is advertised as a single mitigation mechanism is actually a series of different solutions coalesced under the umbrella term TRR. We inspect and disclose, via a deep analysis, different existing TRR solutions and demonstrate that modern implementations operate entirely inside DRAM chips. Despite the difficulties of analyzing in-DRAM mitigations, we describe novel techniques for gaining insights into the operation of these mitigation mechanisms. These insights allow us to build TRRespass, a scalable black-box RowHammer fuzzer. TRRespass shows that even the latest generation DDR4 chips with in-DRAM TRR, immune to all known RowHammer attacks, are often still vulnerable to new TRR-aware variants of RowHammer that we develop. In particular, TRRespass finds that, on modern DDR4 modules, RowHammer is still possible when many aggressor rows are used (as many as 19 in some cases), with a method we generally refer to as Many-sided RowHammer. Overall, our analysis shows that 13 out of the 42 modules from all three major DRAM vendors are vulnerable to our TRR-aware RowHammer access patterns, and thus one can still mount existing state-of-the-art RowHammer attacks. In addition to DDR4, we also experiment with LPDDR4 chips and show that they are susceptible to RowHammer bit flips too. Our results provide concrete evidence that the pursuit of better RowHammer mitigations must continue. Comment: 16 pages, 16 figures, in proceedings IEEE S&P 202
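The "many-sided" pattern the paper reports can be illustrated purely at the level of row addresses: instead of one pair of aggressors around a victim, many aggressors are placed so that each adjacent pair sandwiches a victim row, and all are activated round-robin. This toy sketch shows only the geometry; a real attack additionally needs physically contiguous memory, cache flushing, and precise timing, none of which is modeled here.

```python
def many_sided_pattern(base_row: int, n_sided: int, gap: int = 2):
    """Row addresses for an n-sided pattern: aggressors `gap` rows apart,
    so every row between two aggressors is a potential victim."""
    return [base_row + i * gap for i in range(n_sided)]

def hammer_order(aggressors, rounds: int):
    """Round-robin activation order over all aggressor rows."""
    seq = []
    for _ in range(rounds):
        seq.extend(aggressors)
    return seq

# 19-sided pattern, matching the largest aggressor count the paper mentions.
aggressors = many_sided_pattern(base_row=1000, n_sided=19)
victims = [r + 1 for r in aggressors[:-1]]  # rows sandwiched by aggressors
```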

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspire the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM), including STT-MRAM, PCM, and RRAM. The utilization of emerging technologies in Last Level Cache (LLC) designs, which occupy a significant fraction of total die area in Chip Multi-Processors (CMPs), introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on the eDRAM Bit Upset Vulnerability Factor (BUVF) to assess the vulnerable portion of the eDRAM refresh cycle, where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broadens the study of LLC vulnerability assessment by investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifiers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using a Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize service to 1) Extensively Read-Reused Accessed (ERRA) blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit a distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to efficiently identify ERRA blocks by aggregating the LLC block addresses tagged with identical Most Significant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation broadens and applies advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 is developed, validated, and refined both theoretically and experimentally to realize a radically different approach to post-Moore's-Law computing by leveraging the features of low-rank matrices, offering data reconstruction instead of fetching data from main memory to reduce the energy/latency cost per data movement. We propose the LI2 enhancement to attain high performance delivery in the post-Moore's-Law era by equipping the contemporary micro-architecture design with a customized memory controller which orchestrates memory requests, fetching low-rank matrices to a customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of an LLC miss, using in-situ construction of the requested data when dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers for a novel lightweight reconstruction method under specific conditions, using a cross-layer hardware/algorithm approach.
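The MSB-aggregation step of the profiler can be sketched as a small table keyed by the upper address bits, so that many block addresses from the same region collapse into one counting entry. The field widths and threshold below are illustrative assumptions, not the dissertation's parameters.

```python
# A minimal sketch of a Multi-level Access History Profiler-style table:
# block addresses sharing their Most Significant Bits fold into one
# region entry whose counter flags Extensively Read-Reused Accessed
# (ERRA) regions. Field widths/threshold are illustrative assumptions.
class RegionProfiler:
    def __init__(self, block_bits=6, region_bits=12, erra_threshold=3):
        self.block_bits = block_bits          # 64 B cache blocks
        self.region_bits = region_bits        # low bits dropped when tagging
        self.erra_threshold = erra_threshold  # reuses that mark a region ERRA
        self.table = {}                       # MSB tag -> read-reuse count

    def _tag(self, addr):
        return addr >> (self.block_bits + self.region_bits)

    def record_read(self, addr):
        tag = self._tag(addr)
        self.table[tag] = self.table.get(tag, 0) + 1

    def is_erra(self, addr):
        return self.table.get(self._tag(addr), 0) >= self.erra_threshold
```

Folding addresses by their MSB tag is what keeps the profiler lightweight: one entry stands in for an entire region of blocks rather than one entry per block address.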