
    Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory

    HPC applications pose high demands on I/O performance and storage capability. Emerging non-volatile memory (NVM) technologies offer low latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack is designed and optimized on the assumption of disk-based storage. To use NVM effectively, we must re-examine the existing high performance computing (HPC) I/O sub-system to properly integrate NVM into it. With NVM as fast storage, the previous assumption of inferior storage performance (e.g., hard drives) is no longer valid. The performance problem caused by slow storage may be mitigated; the existing mechanisms to narrow the performance gap between storage and CPU may be unnecessary and result in large overhead. Thus, fully understanding the impact of introducing NVM into the HPC software stack demands a thorough performance study. In this paper, we analyze and model the performance of I/O-intensive HPC applications with NVM as a block device. We study the performance from three perspectives: (1) the impact of NVM on the performance of the traditional page cache; (2) a performance comparison between MPI individual I/O and POSIX I/O; and (3) the impact of NVM on the performance of collective I/O. We reveal the diminishing effects of the page cache, a minor performance difference between MPI individual I/O and POSIX I/O, and a performance disadvantage of collective I/O on NVM due to unnecessary data shuffling. We also model the performance of MPI collective I/O and study the complex interaction between data shuffling, storage performance, and I/O access patterns. Comment: 10 pages

    Selective Segment Initialization: Exploiting NVRAM to Reduce Device Startup Latency

    We propose Selective Segment Initialization (SSI) to exploit NVRAM to reduce device startup latency. SSI places a kernel binary image in byte-addressable NVRAM and boots the system using this image, eliminating the need to load it from storage. SSI also eliminates the process of decompressing and relocating the OS kernel image in embedded Linux systems. The key technical ingredients of SSI are precisely identifying the kernel segments whose contents are updated in the course of booting and selectively reloading only these sections each time the system reboots. The fresh copy of the sections can be maintained in NVRAM, NAND flash, NOR flash, etc. SSI reduced the size of the kernel binary image loaded from storage into memory by 90% and reduced the overall device startup time by 54%. This approach can be used not only for cold boot (with NVRAM) but also for warm boot, in which the contents of DRAM persist across the system restart.