Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory
HPC applications pose high demands on I/O performance and storage capability.
Emerging non-volatile memory (NVM) technologies offer low latency, high
bandwidth, and persistence for HPC applications. However, the existing I/O
stack is designed and optimized on the assumption of disk-based storage.
To use NVM effectively, we must re-examine the existing high performance
computing (HPC) I/O subsystem and integrate NVM into it properly. With NVM as
fast storage, the previous assumption that storage (e.g., a hard drive) is
slow no longer holds. The performance problems caused by slow storage may be
mitigated, and the existing mechanisms that narrow the performance gap between
storage and CPU may become unnecessary and add large overhead. Fully
understanding the impact of introducing NVM into the HPC software stack
therefore demands a thorough performance study.
In this paper, we analyze and model the performance of I/O-intensive HPC
applications with NVM as a block device. We study the performance from three
perspectives: (1) the impact of NVM on the performance of the traditional page
cache; (2) a performance comparison between MPI individual I/O and POSIX I/O;
and (3) the impact of NVM on the performance of collective I/O. We reveal the
diminishing effect of the page cache, a minor performance difference between MPI
individual I/O and POSIX I/O, and a performance disadvantage of collective I/O on
NVM due to unnecessary data shuffling. We also model the performance of MPI
collective I/O and study the complex interaction between data shuffling,
storage performance, and I/O access patterns.
Comment: 10 pages
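The trade-off between data shuffling and storage performance can be illustrated with a toy cost model. This is a minimal sketch and not the model from the paper: it assumes serial requests, a fixed per-request latency, and a single shuffle bandwidth. The names `Storage`, `io_time`, `independent_time`, and `collective_time`, and all device numbers, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Storage:
    bandwidth: float  # sustained bytes per second (hypothetical value)
    latency: float    # fixed cost per request in seconds (hypothetical value)


def io_time(total_bytes: int, request_bytes: int, dev: Storage) -> float:
    """Time to move total_bytes using serial requests of request_bytes each."""
    requests = math.ceil(total_bytes / request_bytes)
    return requests * dev.latency + total_bytes / dev.bandwidth


def independent_time(total_bytes: int, request_bytes: int, dev: Storage) -> float:
    """Each process issues its own small requests; no data shuffling."""
    return io_time(total_bytes, request_bytes, dev)


def collective_time(total_bytes: int, aggregated_bytes: int, dev: Storage,
                    shuffle_bandwidth: float) -> float:
    """Data is first shuffled to aggregators, then written in large requests."""
    shuffle = total_bytes / shuffle_bandwidth
    return shuffle + io_time(total_bytes, aggregated_bytes, dev)


TOTAL = 1 << 30      # 1 GiB written in total
SMALL = 64 << 10     # 64 KiB requests for independent I/O
LARGE = 16 << 20     # 16 MiB requests after aggregation
SHUFFLE_BW = 1e9     # assumed interconnect bandwidth for shuffling, bytes/s

# Illustrative device parameters, not measurements from the paper.
hdd = Storage(bandwidth=150e6, latency=5e-3)
nvm = Storage(bandwidth=2e9, latency=10e-6)

for name, dev in (("hdd", hdd), ("nvm", nvm)):
    ind = independent_time(TOTAL, SMALL, dev)
    col = collective_time(TOTAL, LARGE, dev, SHUFFLE_BW)
    print(f"{name}: independent {ind:.2f} s, collective {col:.2f} s")
```

Under these assumed parameters, aggregation pays off on the high-latency device because it removes most per-request costs, while on the low-latency device the shuffle term dominates and collective I/O becomes slower, which is the qualitative effect the abstract describes.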
Selective Segment Initialization: Exploiting NVRAM to Reduce Device Startup Latency
We propose Selective Segment Initialization (SSI), which exploits NVRAM to reduce device startup latency. SSI places a kernel binary image in byte-addressable NVRAM and boots the system from this image, eliminating the need to load it from storage. SSI also eliminates the decompression and relocation of the OS kernel image in embedded Linux systems. The key technical ingredients of SSI are precisely identifying the kernel segments whose contents are updated in the course of booting and selectively reloading only these sections each time the system reboots. A fresh copy of the sections can be maintained in NVRAM, NAND flash, NOR flash, etc. SSI reduced the size of the kernel binary image loaded from storage into memory by 90% and reduced the overall device startup time by 54%. This approach can be used not only for cold boot (with NVRAM) but also for warm boot, in which the contents of DRAM persist across the system restart.
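The idea of reloading only the sections that change during boot can be sketched as follows. This is a simplified illustration and not the authors' implementation: the image is modeled as a dictionary from section name to bytes, modified sections are found by a plain byte comparison (an assumption; the abstract does not say how SSI identifies them), and the section names, sizes, and the helpers `modified_sections` and `selective_reload` are hypothetical.

```python
def modified_sections(pristine: dict, after_boot: dict) -> list:
    """Return the names of sections whose contents changed during boot."""
    return sorted(name for name in pristine if after_boot[name] != pristine[name])


def selective_reload(resident: dict, fresh: dict, dirty: list) -> int:
    """Overwrite only the dirty sections in the resident image.

    Returns the number of bytes copied from the fresh copy.
    """
    copied = 0
    for name in dirty:
        resident[name] = fresh[name]
        copied += len(fresh[name])
    return copied


# Hypothetical kernel image layout, sizes chosen only for illustration.
pristine = {
    ".text": b"\x90" * 900,
    ".rodata": b"r" * 60,
    ".data": b"\x01" * 30,
    ".bss": b"\x00" * 10,
}

# Simulate a boot that writes to .data and .bss but leaves code untouched.
after_boot = dict(pristine)
after_boot[".data"] = b"\x02" * 30
after_boot[".bss"] = b"\x07" * 10

dirty = modified_sections(pristine, after_boot)

# On the next boot, the image resident in NVRAM is restored in place.
resident = dict(after_boot)
copied = selective_reload(resident, pristine, dirty)
total = sum(len(data) for data in pristine.values())
print(f"reloaded {dirty}: {copied} of {total} bytes")
```

In this toy layout only the writable sections are copied again, so the resident image returns to its pristine state while most of the image is never reloaded.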