1,037 research outputs found

    LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads

    As emerging workloads exhibit irregular memory access patterns with poor data reuse and locality, they would benefit from a DRAM that achieves low latency without sacrificing bandwidth or energy efficiency. We propose LLM (Low Latency Memory), a co-design of the DRAM microarchitecture, the memory controller, and the LLC/DRAM interconnect that leverages embedded silicon photonics in a 2.5D/3D-integrated system-on-chip. LLM relies on Wavelength Division Multiplexing (WDM)-based photonic interconnects to reduce contention throughout the memory subsystem. LLM also increases bank-level parallelism, eliminates bus conflicts by using dedicated optical data paths, and reduces the access energy per bit with shorter global bitlines and smaller row buffers. We evaluate the design space of LLM for a variety of synthetic benchmarks and representative graph workloads on a full-system simulator (gem5). LLM exhibits low memory access latency for traffic with both regular and irregular access patterns. For irregular traffic, LLM achieves high bandwidth utilization (over 80% of peak throughput, compared to 20% for HBM2.0). For real workloads, LLM achieves 3× and 1.8× lower execution time compared to HBM2.0 and a state-of-the-art memory system with high memory-level parallelism, respectively. This study also demonstrates that, by reducing queuing on the data path, LLM achieves on average 3.4× lower memory latency variation compared to HBM2.0.
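    The claim that dedicated data paths reduce queuing can be illustrated with a toy queuing model (a minimal sketch of my own, not from the paper; the 16-bank count, Poisson arrivals, and unit service time are all assumptions). A single shared bus serializes every transfer, while per-bank paths serialize only same-bank transfers, so irregular traffic spread randomly across banks waits far less:

```python
import random

def mean_wait(num_banks, dedicated_paths, num_requests=200_000,
              arrival_rate=0.8, service_time=1.0):
    """Toy queuing sketch: requests arrive as a Poisson stream and each
    targets a random bank. A shared bus serializes every transfer;
    dedicated per-bank paths only serialize same-bank transfers."""
    random.seed(0)
    free_at = [0.0] * (num_banks if dedicated_paths else 1)
    t = total_wait = 0.0
    for _ in range(num_requests):
        t += random.expovariate(arrival_rate)      # next arrival time
        path = random.randrange(num_banks) if dedicated_paths else 0
        start = max(t, free_at[path])              # wait if path is busy
        total_wait += start - t
        free_at[path] = start + service_time
    return total_wait / num_requests

print("shared bus    :", mean_wait(16, dedicated_paths=False))  # ~2.0
print("per-bank paths:", mean_wait(16, dedicated_paths=True))   # ~0.03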

    3D ์ ์ธต DRAM์„ ์œ„ํ•œ ์‹ค์šฉ์ ์ธ Partial Row Activation ๋ฐ ๋”ฅ ๋Ÿฌ๋‹ ์›Œํฌ๋กœ๋“œ์—์˜ ์ ์šฉ

    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: 이재욱 (Jae W. Lee).
    GPUs are widely used to run deep learning applications. Today's high-end GPUs adopt 3D stacked DRAM technologies like High-Bandwidth Memory (HBM) to provide massive bandwidth, which consumes a great deal of power. Thousands of concurrent threads on a GPU cause frequent row buffer conflicts, wasting a significant amount of DRAM energy. To reduce this waste, we propose a practical partial row activation scheme for 3D stacked DRAM. Exploiting the latency tolerance of deep learning workloads with abundant memory-level parallelism, we trade DRAM latency for energy savings. The proposed design demonstrates substantial savings in DRAM activation energy with minimal performance degradation for both deep learning and conventional GPU workloads. This benefit comes at a very low area cost and requires only minimal adjustments to the DRAM timing parameters of the standard HBM2 interface.
    Contents: Introduction; Background and Motivation (Deep Learning Workloads, DRAM Access Patterns on GPU, Partial Row Activation, Performance/Area Trade-off in Partial Activation, Latency Tolerance of Deep Learning Workloads on GPU); Practical Partial Row Activation (Overview, Bank Structure, Delayed Activation); Evaluation (Methodology, Energy Improvement, Performance Degradation, Area Overhead); Conclusion.
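    A back-of-envelope model (my own assumption, not the thesis's evaluation) shows why partial activation pays off when row buffer conflicts are frequent: if activation energy scales with the fraction of the row opened, the savings approach the partitioning factor when only a few column accesses hit each opened row, and shrink as row locality grows:

```python
def activation_energy(accesses_per_row, segments, e_full_activate=1.0):
    """Hypothetical cost model: activating 1/segments of a row costs
    e_full_activate/segments, but each access landing in a segment that
    is not yet open triggers another partial activation. With k column
    accesses spread uniformly over the row, the expected number of
    distinct segments touched gives the expected activation cost."""
    k, s = accesses_per_row, segments
    expected_segments = s * (1 - (1 - 1 / s) ** k)
    return expected_segments * (e_full_activate / s)

for k in (1, 2, 4, 16):
    full = activation_energy(k, segments=1)
    part = activation_energy(k, segments=4)
    print(f"{k:>2} accesses/row: full={full:.2f}  partial(1/4)={part:.2f}")
```

    Under these assumptions, one access per activation (the frequent-conflict case the abstract describes) cuts activation energy by roughly 4× with quarter-row activation, while 16 accesses per row nearly erase the advantage.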