Over the last three decades, innovations in the memory subsystem were
primarily targeted at overcoming the data movement bottleneck. In this paper,
we focus on a specific market trend in memory technology: 3D-stacked memory and
caches. We investigate the impact of extending the on-chip memory capabilities
in future HPC-focused processors, particularly through 3D-stacked SRAM. First, we
propose a method oblivious to the memory subsystem to gauge the upper bound on
performance improvements when data movement costs are eliminated. Then, using
the gem5 simulator, we model two variants of LARC, a processor fabricated in
1.5 nm and enriched with high-capacity 3D-stacked cache. Through extensive
experiments involving a broad set of proxy-applications and benchmarks, we aim
to reveal where HPC CPU performance could be circa 2028, and find an
average boost of 9.77x for cache-sensitive HPC applications, on a per-chip
basis. Additionally, we exhaustively document our methodological exploration to
motivate HPC centers to drive their own technological agenda through enhanced
co-design.