At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

Chen, Peng; Domke, Jens; Drozd, Aleksandr; Gerofi, Balazs; Kodama, Yuetsu; Matsuoka, Satoshi; Mittal, Sparsh; Pericàs, Miquel; Podobas, Artur; Vatai, Emil; Wahib, Mohamed; Zhang, Lingqi

Authors: Peng Chen
Jens Domke
Aleksandr Drozd
Balazs Gerofi
Yuetsu Kodama
Satoshi Matsuoka
Sparsh Mittal
Miquel Pericàs
Artur Podobas
Emil Vatai
Mohamed Wahib
Lingqi Zhang
Publication date: 5 April 2022

Abstract

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of LARC, a processor fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal where HPC CPU performance could be circa 2028, and conclude an average boost of 9.77x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2204.02235

Last time updated on 26/04/2022