3 research outputs found

    Optimizing parallel I/O performance of HPC applications

    Parallel I/O is an essential component of modern High Performance Computing (HPC). Obtaining good I/O performance for a broad range of applications on diverse HPC platforms is a major challenge, in part because of complex inter-dependencies between I/O middleware and hardware. The parallel file system and I/O middleware layers all offer optimization parameters that can, in theory, result in better I/O performance. Unfortunately, the right combination of parameters is highly dependent on the application, HPC platform, and problem size/concurrency. Scientific application developers do not have the time or expertise to take on the substantial burden of identifying good parameters for each problem configuration. They resort to using system defaults, a choice that frequently results in poor I/O performance. We expect this problem to be compounded on exascale-class machines, which will likely have a deeper software stack with hierarchically arranged hardware resources. We present a set of solutions to this problem, comprising an autotuning system for optimizing I/O performance, I/O performance modeling, I/O tuning, I/O kernel generation, and I/O pattern analysis. We demonstrate the value of these solutions across platforms, across applications, and at scale.
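The core idea of such an autotuning system can be sketched as a search over tunable I/O-stack parameters, timing each configuration and keeping the best. This is a minimal illustrative sketch, not the authors' implementation: the parameter names (`stripe_count`, `cb_buffer_size_mb`) and the cost function are hypothetical stand-ins for real Lustre striping settings and MPI-IO hints, which a real autotuner would apply and time against the application's actual I/O kernel.

```python
import itertools

# Hypothetical search space of tunable parameters; the real space
# (file-system striping, MPI-IO hints, HDF5 chunking) is platform-
# and application-specific.
SEARCH_SPACE = {
    "stripe_count": [4, 8, 16],
    "cb_buffer_size_mb": [16, 64, 256],
}

def measure_io_time(config):
    """Stand-in for running the application's I/O kernel with `config`
    applied and timing the write phase. Here: a toy cost model where
    more stripes and larger buffers help, with diminishing returns."""
    return 100.0 / config["stripe_count"] + 50.0 / config["cb_buffer_size_mb"]

def autotune(space):
    """Exhaustively evaluate every parameter combination and return the
    configuration with the lowest measured (here: modeled) I/O time."""
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*space.values()):
        cfg = dict(zip(space.keys(), values))
        t = measure_io_time(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best, t = autotune(SEARCH_SPACE)
print(best)
```

In practice the exhaustive search above is replaced by a model-guided or heuristic search, since timing every combination on a production machine is prohibitively expensive.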

    Optimizing I/O performance for high performance computing applications: from auto-tuning to a feedback-driven approach

    The 2014 TOP500 supercomputer list includes over 40 deployed petascale systems, and the high performance computing (HPC) community is working toward developing the first exaflop system by 2023. Scientific applications on such large-scale computers often read and write large volumes of data. With such rapid growth in computing power and data intensity, I/O continues to be a challenging factor in determining the overall performance of HPC applications. We address the problem of optimizing I/O performance for HPC applications, first by examining the I/O behavior of thousands of supercomputing applications. We analyze the high-level I/O logs of over a million jobs, representing a combined total of six years of I/O behavior across three leading high-performance computing platforms. Our analysis provides a broad portrait of the state of HPC I/O usage. We propose a simple and effective analysis and visualization procedure to help scientists who lack I/O expertise quickly locate the bottlenecks and inefficiencies in their I/O approach. We also propose several filtering criteria for system administrators to find application candidates that are consuming system I/O resources inefficiently. Overall, our analysis techniques can help both application users and platform administrators improve I/O performance and I/O system utilization. In the second part, we develop a framework that hides the complexity of the I/O stack from scientists without penalizing performance. This framework allows application developers to issue I/O calls without modification and rely on an intelligent runtime system to transparently determine and execute an I/O strategy that takes all levels of the I/O stack into account. Lastly, we develop a multi-level tracing framework that provides much more detailed feedback on an application's I/O runtime behavior. These details are needed for in-depth application performance analysis and tuning.
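A filtering criterion of the kind described for system administrators can be illustrated with a short sketch. The per-job record fields and the thresholds below are hypothetical, chosen only to show the shape of such a rule: flag jobs that spend a large fraction of their runtime in I/O while achieving low aggregate throughput.

```python
# Hypothetical per-job summary records, similar in spirit to the
# high-level counters that platform-wide I/O logging tools collect.
jobs = [
    {"id": "a", "io_bytes": 2**40, "io_seconds": 200.0, "run_seconds": 3600.0},
    {"id": "b", "io_bytes": 2**30, "io_seconds": 1800.0, "run_seconds": 3600.0},
    {"id": "c", "io_bytes": 2**35, "io_seconds": 100.0, "run_seconds": 7200.0},
]

def inefficient(job, min_mib_per_s=100.0, max_io_fraction=0.25):
    """One example filtering rule: the job spends more than
    `max_io_fraction` of its runtime in I/O, yet sustains less than
    `min_mib_per_s` of aggregate throughput (thresholds are illustrative)."""
    mib_per_s = job["io_bytes"] / (1024 ** 2) / job["io_seconds"]
    io_fraction = job["io_seconds"] / job["run_seconds"]
    return io_fraction > max_io_fraction and mib_per_s < min_mib_per_s

candidates = [j["id"] for j in jobs if inefficient(j)]
```

Here job "b" is flagged: it spends half its runtime in I/O while moving only about 1 GiB, a pattern that usually indicates small, uncoordinated requests.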

    Performance Modeling for the Panda Array I/O Library

    We present an analytical performance model for Panda, a library for synchronized i/o of large multidimensional arrays on parallel and sequential platforms, and show how the Panda developers use this model to evaluate Panda's parallel i/o performance and guide future Panda development. The model validation shows that system developers can simplify performance analysis, identify potential performance bottlenecks, and study the design trade-offs for Panda on massively parallel platforms more easily than by conducting empirical experiments. More importantly, we show that the outputs of the performance model can be used to help make optimal plans for handling application i/o requests, the first step toward our long-term goal of automatically optimizing i/o request handling in Panda. This research was supported by an ARPA Fellowship in High Performance Computing administered by the Institute for Advanced Computer Studies, University of Maryland, by NSF under PYI grant IRI 89 58582, and by N..
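The role of such an analytical model can be illustrated with a deliberately simplified sketch. This is not the Panda model, which accounts for disk, network, and memory behavior in detail; the function below only shows the general form (a fixed latency term plus a transfer term limited by aggregate server bandwidth), with all parameter values hypothetical.

```python
def model_write_time(total_bytes, clients, io_servers,
                     per_server_bw=500e6, per_request_latency=0.01):
    """Toy analytical model of a synchronized collective write:
    fixed per-request latency plus transfer time, where throughput
    is capped by the number of concurrently usable server streams.
    Bandwidth is in bytes/s, latency in seconds (illustrative values)."""
    effective_streams = min(clients, io_servers)
    aggregate_bw = effective_streams * per_server_bw
    return per_request_latency + total_bytes / aggregate_bw

# Evaluating the model for candidate server counts lets a library
# compare I/O plans without running empirical experiments.
t = model_write_time(total_bytes=10e9, clients=64, io_servers=8)
```

Even a model this crude captures the planning use case described in the abstract: given a request, the library can evaluate several candidate configurations analytically and pick the cheapest before issuing any I/O.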