Evaluating the Potential of Disaggregated Memory Systems for HPC applications
Disaggregated memory is a promising approach that addresses the limitations
of traditional memory architectures by enabling memory to be decoupled from
compute nodes and shared across a data center. Cloud platforms have deployed
such systems to improve overall system memory utilization, but performance can
vary across workloads. High-performance computing (HPC) is crucial in
scientific and engineering applications, where HPC machines also face the issue
of underutilized memory. As a result, improving system memory utilization while
understanding workload performance is essential for HPC operators. Therefore,
learning the potential of a disaggregated memory system before deployment is a
critical step. This paper proposes a methodology for exploring the design space
of a disaggregated memory system. It incorporates key metrics that affect
performance on disaggregated memory systems: memory capacity, local and remote
memory access ratio, injection bandwidth, and bisection bandwidth, providing an
intuitive approach to guide machine configurations based on technology trends
and workload characteristics. We apply our methodology to analyze thirteen
diverse workloads, including AI training, data analysis, genomics, protein,
fusion, atomic nuclei, and traditional HPC bookends. Our methodology
demonstrates the ability to comprehend the potential and pitfalls of a
disaggregated memory system and provides motivation for machine configurations.
Our results show that eleven of our thirteen applications can leverage
injection bandwidth disaggregated memory without affecting performance, while
one pays a rack bisection bandwidth penalty and two pay the system-wide
bisection bandwidth penalty. We also show that intra-rack memory
disaggregation would meet the applications' memory requirements and provide
enough remote memory bandwidth.
Comment: The submission builds on the following conference paper: N. Ding, S.
Williams, H.A. Nam, et al. Methodology for Evaluating the Potential of
Disaggregated Memory Systems, 2nd International Workshop on RESource
DISaggregation in High-Performance Computing (RESDIS), November 18, 2022. It
is now submitted to the CCPE journal for review.
A Hybrid In Situ Approach for Cost Efficient Image Database Generation
The visualization of results while the simulation is running is increasingly common in extreme scale computing environments. We present a novel approach for in situ generation of image databases to achieve cost savings on supercomputers. Our approach, a hybrid between traditional inline and in transit techniques, dynamically distributes visualization tasks between simulation nodes and visualization nodes, using probing as a basis to estimate rendering cost. Our hybrid design differs from previous works in that it creates opportunities to minimize idle time from four fundamental types of inefficiency: variability, limited scalability, overhead, and rightsizing. We demonstrate our results by comparing our method against both inline and in transit methods for a variety of configurations, including two simulation codes and a scaling study that goes above 19K cores. Our findings show that our approach is superior in many configurations. As in situ visualization becomes increasingly ubiquitous, we believe our technique could lead to significant amounts of reclaimed cycles on supercomputers.