Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems
High-density byte-addressable non-volatile memory (NVM) promises to
accelerate data- and compute-intensive applications. Current NVM
technologies have lower performance than DRAM and are therefore often
paired with DRAM in a heterogeneous main memory. Byte-addressable NVM
hardware has recently become available. This work provides a timely
evaluation of representative HPC applications from the "Seven Dwarfs" on
NVM-based main memory. Our results
quantify the effectiveness of DRAM-cached-NVM for accelerating HPC applications
and enabling large problems beyond the DRAM capacity. On uncached-NVM, HPC
applications exhibit three tiers of performance sensitivity, i.e., insensitive,
scaled, and bottlenecked. We identify write throttling and concurrency control
as the top priorities in optimizing applications. We highlight that changes
in concurrency may have diverging effects on read and write accesses.
Based on these findings, we explore two optimization approaches. First, we
provide a prediction model that uses datasets from a small set of
configurations to estimate performance at various concurrency and data sizes to
avoid exhaustive search in the configuration space. Second, we demonstrate that
write-aware data placement on uncached-NVM could achieve x performance
improvement with a 60% reduction in DRAM usage.
Comment: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020)
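The write-aware placement idea above can be sketched as a simple greedy policy: under a fixed DRAM budget, place the most write-intensive data structures in DRAM and leave the rest on uncached NVM. The structure names, sizes, and write rates below are hypothetical illustrations, not the paper's actual mechanism.

```python
# Hypothetical sketch of write-aware data placement for a DRAM+NVM system.
# Structures with the highest write traffic per MB go into the DRAM budget;
# everything else lands on (slower-to-write) uncached NVM.

def place_write_aware(structures, dram_budget_mb):
    """structures: list of (name, size_mb, writes_per_sec).
    Returns (dram_names, nvm_names)."""
    # Rank by write intensity (writes per MB), most write-heavy first.
    ranked = sorted(structures, key=lambda s: s[2] / s[1], reverse=True)
    dram, nvm, used = [], [], 0
    for name, size, _writes in ranked:
        if used + size <= dram_budget_mb:
            dram.append(name)
            used += size
        else:
            nvm.append(name)
    return dram, nvm

structures = [
    ("result_buffer", 100, 50_000),   # write-heavy
    ("input_mesh",    400, 100),      # mostly read-only
    ("scratch",       200, 20_000),   # write-heavy
]
dram, nvm = place_write_aware(structures, dram_budget_mb=300)
print(dram)  # write-heavy structures that fit the DRAM budget
print(nvm)   # read-mostly data tolerates NVM's slower writes
```

With the sample inputs, the two write-heavy structures fill the 300 MB DRAM budget and the read-mostly mesh goes to NVM, which is the asymmetry the abstract's write-throttling finding motivates.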
CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach
In the landscape of High-Performance Computing (HPC), the quest for efficient
and scalable memory solutions remains paramount. The advent of Compute Express
Link (CXL) introduces a promising avenue with its potential to function as a
Persistent Memory (PMem) solution in the context of disaggregated HPC systems.
This paper presents a comprehensive exploration of CXL memory's viability as a
candidate for PMem, supported by physical experiments conducted on cutting-edge
multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not
only benchmarks the performance of CXL memory but also illustrates the seamless
transition from traditional PMem programming models to CXL, reinforcing its
practicality.
To substantiate our claims, we establish a tangible CXL prototype using an
FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP).
Performance evaluations, executed through the STREAM and STREAM-PMem
benchmarks, showcase CXL memory's ability to mirror PMem characteristics in
App-Direct and Memory Mode while achieving impressive bandwidth metrics with
Intel 4th generation Xeon (Sapphire Rapids) processors.
The results elucidate the feasibility of CXL memory as a persistent memory
solution, outperforming previously established benchmarks. In contrast to
published DCPMM results, our CXL-DDR4 memory module offers comparable bandwidth
to local DDR4 memory configurations, albeit with a moderate decrease in
performance. The modified STREAM-PMem application demonstrates the ease of
transitioning programming models from PMem to CXL, underscoring the
practicality of adopting CXL memory.
Comment: 12 pages, 9 figures
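STREAM measures sustainable memory bandwidth with four simple kernels; the Triad kernel (a[i] = b[i] + scalar * c[i]) dominates most reported numbers. A toy numpy rendition of Triad with a naive bandwidth estimate (not the actual STREAM or STREAM-PMem code, which is C/OpenMP and far more careful about timing and NUMA placement) looks like:

```python
import time
import numpy as np

# Toy version of the STREAM Triad kernel: a[i] = b[i] + scalar * c[i].
N = 10_000_000
scalar = 3.0
b = np.full(N, 1.0)
c = np.full(N, 2.0)

t0 = time.perf_counter()
a = b + scalar * c
t1 = time.perf_counter()

# Triad touches three 8-byte doubles per element (two reads + one write).
bytes_moved = 3 * N * 8
print(f"Triad bandwidth: {bytes_moved / (t1 - t0) / 1e9:.2f} GB/s")
```

Running the same kernel with the arrays bound to DRAM, PMem, or CXL-attached memory is what exposes the bandwidth differences the paper benchmarks.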
The Case for Non-Volatile RAM in Cloud HPCaaS
HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud
services. However, efforts to scale large, tightly coupled applications with
heavy interprocessor communication across multiple (and many) nodes, as on
on-premise supercomputers, remain far from satisfactory due to network
latencies. As a consequence, in such cases, HPCaaS is recommended to be used
with only one or a few instances. In this paper we argue that a new piece of
memory hardware, namely Non-Volatile RAM (NVRAM), can allow such computations
to scale by up to an order of magnitude with marginal penalty compared to
RAM. Moreover, we suggest that introducing NVRAM to HPCaaS can be
cost-effective for users and suppliers in numerous forms.
Comment: 4 pages
Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study
The number and diversity of consumer devices are growing rapidly, alongside
their target applications' memory consumption. Unfortunately, DRAM scalability
is becoming a limiting factor to the available memory capacity in consumer
devices. As a potential solution, manufacturers have introduced emerging
non-volatile memories (NVMs) into the market, which can be used to increase the
memory capacity of consumer devices by augmenting or replacing DRAM. Since
entirely replacing DRAM with NVM in consumer devices imposes large system
integration and design challenges, recent works propose extending the total
main memory space available to applications by using NVM as swap space for
DRAM. However, no prior work analyzes the implications of enabling a real
NVM-based swap space in real consumer devices.
In this work, we provide the first analysis of the impact of extending the
main memory space of consumer devices using off-the-shelf NVMs. We extensively
examine system performance and energy consumption when the NVM device is used
as swap space for DRAM main memory to effectively extend the main memory
capacity. For our analyses, we equip real web-based Chromebook computers with
the Intel Optane SSD, which is a state-of-the-art low-latency NVM-based SSD
device. We compare the performance and energy consumption of interactive
workloads running on our Chromebook with NVM-based swap space, where the Intel
Optane SSD capacity is used as swap space to extend main memory capacity,
against two state-of-the-art systems: (i) a baseline system with double the
amount of DRAM than the system with the NVM-based swap space; and (ii) a system
where the Intel Optane SSD is naively replaced with a state-of-the-art (yet
slower) off-the-shelf NAND-flash-based SSD, which we use as a swap space of
equivalent size as the NVM-based swap space.
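The performance effect of NVM-backed swap can be reasoned about with a simple average-access-time model: accesses served by DRAM pay DRAM latency, while pages faulted in from the swap device pay the much larger device latency. The latencies below are illustrative placeholders, not measurements from this study.

```python
# Back-of-the-envelope model of main-memory extension via swap.
# All latencies are illustrative, not measured values from the paper.

DRAM_NS = 100            # DRAM access (illustrative)
OPTANE_SWAP_NS = 10_000  # fault served by a low-latency NVM SSD (illustrative)
NAND_SWAP_NS = 100_000   # fault served by a NAND-flash SSD (illustrative)

def avg_access_ns(dram_hit_rate, swap_ns):
    """Average access latency given the fraction of accesses served by DRAM."""
    return dram_hit_rate * DRAM_NS + (1 - dram_hit_rate) * swap_ns

# Even when 99.9% of accesses hit DRAM, the swap device's latency
# dominates the average if the device is slow.
optane = avg_access_ns(0.999, OPTANE_SWAP_NS)
nand = avg_access_ns(0.999, NAND_SWAP_NS)
print(f"low-latency NVM swap: {optane:.1f} ns avg")
print(f"NAND-flash swap:      {nand:.1f} ns avg")
```

The model makes plain why the choice between a low-latency NVM SSD and a slower NAND-flash SSD as swap, the paper's comparison (ii), matters even at high DRAM hit rates.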
High Probability Write Patterns: An Optimization Policy for Tiered Memory Systems
Department of Computer Science and Engineering
High-capacity non-volatile memory is the new main memory. NVM provides up to 8x the memory capacity of DRAM, but can reduce bandwidth by up to 7x and increase latency by up to 2x. Used alone, NVM provides large capacity but suffers from low performance, so it is typically deployed together with DRAM. However, if the two memories are not managed properly, performance can be as bad as using NVM alone. Much optimization work targets tiered memory systems, the most studied approach to combining the two memories. We found that, before Intel Optane DC Persistent Memory (DCPMM) was commercialized, memory systems using both DRAM and NVM did not take DCPMM's performance into consideration.
We present High Probability Write Patterns (HPWP), an optimization policy for tiered memory systems that accounts for the performance of commercialized DCPMM. HPWP avoids issuing write operations to DCPMM as much as possible, exploiting the fact that DCPMM's write performance is three times worse than its read performance. In a tiered memory system equipped with DCPMM, HPWP provides up to 19% performance improvement in a key-value store compared to previous studies.
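The key observation, that DCPMM writes are roughly 3x slower than reads, can be turned into a small cost model showing why steering writes away from DCPMM pays off. The latencies and access mix below are hypothetical placeholders used only to illustrate the asymmetry HPWP exploits, not the thesis's actual policy.

```python
# Illustrative cost model for a read/write-asymmetric tiered memory.
# DCPMM writes are ~3x slower than reads (per the abstract); the
# absolute costs and access mix here are made-up placeholders.

DCPMM_READ = 1.0
DCPMM_WRITE = 3.0 * DCPMM_READ  # the 3x write penalty
DRAM_ACCESS = 0.3               # hypothetical DRAM cost, reads == writes

def workload_cost(reads, writes, writes_to_dram_frac):
    """Total cost when a fraction of writes is redirected to DRAM."""
    dram_writes = writes * writes_to_dram_frac
    dcpmm_writes = writes - dram_writes
    return (reads * DCPMM_READ
            + dcpmm_writes * DCPMM_WRITE
            + dram_writes * DRAM_ACCESS)

baseline = workload_cost(reads=700, writes=300, writes_to_dram_frac=0.0)
hpwp_like = workload_cost(reads=700, writes=300, writes_to_dram_frac=0.9)
print(f"speedup from redirecting 90% of writes: {baseline / hpwp_like:.2f}x")
```

Because writes carry a 3x penalty, redirecting even a write-minority workload's stores to DRAM yields an outsized gain, which is the intuition behind a write-pattern-driven placement policy like HPWP.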