Search CORE

10 research outputs found

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

Author: Gokhale Maya
Li Dong
Peng Ivy
Ren Jie
Wu Kai
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/02/2020
Field of study

The emergence of high-density byte-addressable non-volatile memory (NVM) is promising to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and, thus, are often paired with DRAM in a heterogeneous main memory. Recently, byte-addressable NVM hardware becomes available. This work provides a timely evaluation of representative HPC applications from the "Seven Dwarfs" on NVM-based main memory. Our results quantify the effectiveness of DRAM-cached-NVM for accelerating HPC applications and enabling large problems beyond the DRAM capacity. On uncached-NVM, HPC applications exhibit three tiers of performance sensitivity, i.e., insensitive, scaled, and bottlenecked. We identify write throttling and concurrency control as the priorities in optimizing applications. We highlight that concurrency change may have a diverging effect on read and write accesses in applications. Based on these findings, we explore two optimization approaches. First, we provide a prediction model that uses datasets from a small set of configurations to estimate performance at various concurrency and data sizes to avoid exhaustive search in the configuration space. Second, we demonstrate that write-aware data placement on uncached-NVM could achieve

2

x performance improvement with a 60% reduction in DRAM usage.Comment: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS2020

arXiv.org e-Print Archive

Crossref

Recommended from our members

Building Distributed Systems with Non-Volatile Main Memories and RDMA Networks

Author: Yang Jian
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

High-performance, byte-addressable non-volatile main memories (NVMMs) allow application developers to combine storage and memory into a single layer. These high-performance storage systems would be especially useful in large-scale data center environments where data is distributed and replicated across multiple servers.Unfortunately, existing approaches of providing remote storage access rest on the assumption that storage is slow, so the cost of the software and protocols is acceptable. Such assumption no longer holds for the fast NVMM. As a result, taking full advantage of NVMMs’ potential will require changes in system software and networking protocol. This thesis focuses on accessing remote NVMM efficiently using remote direct memory access (RDMA) network. RDMA enables a client to directly access memory on a remote machine without involving its local CPU.This thesis first presents Mojim, a system that provides replicated, reliable, and highly-available NVMM as an operating system service. Applications can access data in Mojim using normal load and store instructions while controlling when and how updates propagate to replicas using system calls. Our evaluation shows Mojim adds little overhead to the un-replicated system and provides 0.4x to 2.7x the throughput of the un-replicated system.This thesis then presents Orion, a distributed file system designed from for NVMM and RDMA networks. Traditional distributed file systems are designed for slower hard drives. These slower media incentivizes complex optimizations (e.g., queuing, striping, and batching) around disk accesses. Orion combines file system functions and network operations into a single layer. It provides low latency metadata accesses and outperforms existing distributed file systems by a large margin.Finally, an NVMM application can map files backed by an NVMM file system into its address space, and accesses them using CPU instructions. In this case, RDMA and NVMM file systems introduce duplication of effort on permissions, naming, and address translation. We introduce two changes to the existing RDMA protocol: the file memory region (FileMR) and range based address translation. By eliminating redundant translations, FileMR minimizes the number of translations done at the NIC, reducing the load on the NIC’s translation cache and resulting in application performance improvement by 1.8x - 2.0x

eScholarship - University of California

Extremely fast (a,b)-trees at all contention levels

Author: Srivastava Anubhav
Publication venue: 'University of Waterloo'
Publication date: 17/08/2021
Field of study

Many concurrent dictionary implementations are designed and evaluated with only low-contention workloads in mind. This thesis presents several concurrent linearizable (a,b)-tree implementations with the overarching goal of performing well on both low- and high-contention workloads, and especially update-heavy workloads. The OCC-ABtree uses optimistic concurrency control to achieve state-of-the-art low-contention performance. However, under high-contention, cache coherence traffic begins to affect its performance. This is addressed by replacing its test-and-compare-and-swap locks with MCS queue locks. The resulting MCS-ABtree scales well under both low- and high-contention workloads. This thesis also introduces two coalescing-based trees, the CoMCS-ABtree and the CoPub-ABtree, that achieve substantially better performance under high-contention by reordering and coalescing concurrent inserts and deletes. Comparing these algorithms against the state of the art in concurrent search trees, we find that the fastest algorithm, the CoPub-ABtree, outperforms the next fastest competitor by up to 2x. This thesis then describes persistent versions of the four trees, whose implementations use fewer sfence instructions than a leading competitor (the FPTree). The persistent trees are proved to be strictly linearizable. Experimentally, the persistent trees are only slightly slower than their volatile counterparts, suggesting that they have great use as in-memory databases that need to be able to recover after a crash

University of Waterloo's Institutional Repository