3 research outputs found

    Optimizing parallel I/O performance of HPC applications

    Parallel I/O is an essential component of modern High Performance Computing (HPC). Obtaining good I/O performance for a broad range of applications on diverse HPC platforms is a major challenge, in part because of complex inter-dependencies between I/O middleware and hardware. The parallel file system and I/O middleware layers all offer optimization parameters that can, in theory, result in better I/O performance. Unfortunately, the right combination of parameters is highly dependent on the application, HPC platform, and problem size/concurrency. Scientific application developers do not have the time or expertise to take on the substantial burden of identifying good parameters for each problem configuration. They resort to using system defaults, a choice that frequently results in poor I/O performance. We expect this problem to be compounded on exascale-class machines, which will likely have a deeper software stack with hierarchically arranged hardware resources. We present a set of solutions to this problem, comprising an autotuning system for optimizing I/O performance, I/O performance modeling, I/O tuning, I/O kernel generation, and I/O pattern analysis. We demonstrate the value of these solutions across platforms, across applications, and at scale.
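The core idea of such an autotuning system can be sketched as a search over tunable I/O-stack parameters, timing each configuration and keeping the best. This is a minimal illustrative sketch, not the authors' implementation: the parameter names (`stripe_count`, `cb_buffer_size_mb`) and the cost function are hypothetical stand-ins for real Lustre striping settings and MPI-IO hints, which a real autotuner would apply and time against the application's actual I/O kernel.

```python
import itertools

# Hypothetical search space of tunable parameters; the real space
# (file-system striping, MPI-IO hints, HDF5 chunking) is platform-
# and application-specific.
SEARCH_SPACE = {
    "stripe_count": [4, 8, 16],
    "cb_buffer_size_mb": [16, 64, 256],
}

def measure_io_time(config):
    """Stand-in for running the application's I/O kernel with `config`
    applied and timing the write phase. Here: a toy cost model where
    more stripes and larger buffers help, with diminishing returns."""
    return 100.0 / config["stripe_count"] + 50.0 / config["cb_buffer_size_mb"]

def autotune(space):
    """Exhaustively evaluate every parameter combination and return the
    configuration with the lowest measured (here: modeled) I/O time."""
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*space.values()):
        cfg = dict(zip(space.keys(), values))
        t = measure_io_time(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best, t = autotune(SEARCH_SPACE)
print(best)
```

In practice the exhaustive search above is replaced by a model-guided or heuristic search, since timing every combination on a production machine is prohibitively expensive.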

    Optimizing I/O performance for high performance computing applications: from auto-tuning to a feedback-driven approach

    The 2014 TOP500 supercomputer list includes over 40 deployed petascale systems, and the high performance computing (HPC) community is working toward developing the first exaflop system by 2023. Scientific applications on such large-scale computers often read and write large volumes of data. With such rapid growth in computing power and data intensity, I/O continues to be a challenging factor in determining the overall performance of HPC applications. We address the problem of optimizing I/O performance for HPC applications, first by examining the I/O behavior of thousands of supercomputing applications. We analyze the high-level I/O logs of over a million jobs, representing a combined total of six years of I/O behavior across three leading high-performance computing platforms. Our analysis provides a broad portrait of the state of HPC I/O usage. We propose a simple and effective analysis and visualization procedure to help scientists who lack I/O expertise quickly locate the bottlenecks and inefficiencies in their I/O approach. We also propose several filtering criteria for system administrators to find application candidates that are consuming system I/O resources inefficiently. Overall, our analysis techniques can help both application users and platform administrators improve I/O performance and I/O system utilization. In the second part, we develop a framework that hides the complexity of the I/O stack from scientists without penalizing performance. This framework allows application developers to issue I/O calls without modification and rely on an intelligent runtime system to transparently determine and execute an I/O strategy that takes all levels of the I/O stack into account. Lastly, we develop a multi-level tracing framework that provides much more detailed feedback on an application's I/O runtime behavior. These details are needed for in-depth application performance analysis and tuning.
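A filtering criterion of the kind described for system administrators can be illustrated with a short sketch. The per-job record fields and the thresholds below are hypothetical, chosen only to show the shape of such a rule: flag jobs that spend a large fraction of their runtime in I/O while achieving low aggregate throughput.

```python
# Hypothetical per-job summary records, similar in spirit to the
# high-level counters that platform-wide I/O logging tools collect.
jobs = [
    {"id": "a", "io_bytes": 2**40, "io_seconds": 200.0, "run_seconds": 3600.0},
    {"id": "b", "io_bytes": 2**30, "io_seconds": 1800.0, "run_seconds": 3600.0},
    {"id": "c", "io_bytes": 2**35, "io_seconds": 100.0, "run_seconds": 7200.0},
]

def inefficient(job, min_mib_per_s=100.0, max_io_fraction=0.25):
    """One example filtering rule: the job spends more than
    `max_io_fraction` of its runtime in I/O, yet sustains less than
    `min_mib_per_s` of aggregate throughput (thresholds are illustrative)."""
    mib_per_s = job["io_bytes"] / (1024 ** 2) / job["io_seconds"]
    io_fraction = job["io_seconds"] / job["run_seconds"]
    return io_fraction > max_io_fraction and mib_per_s < min_mib_per_s

candidates = [j["id"] for j in jobs if inefficient(j)]
```

Here job "b" is flagged: it spends half its runtime in I/O while moving only about 1 GiB, a pattern that usually indicates small, uncoordinated requests.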

    Performance Modeling for the Panda Array I/O Library

    We present an analytical performance model for Panda, a library for synchronized i/o of large multidimensional arrays on parallel and sequential platforms, and show how the Panda developers use this model to evaluate Panda's parallel i/o performance and guide future Panda development. The model validation shows that system developers can simplify performance analysis, identify potential performance bottlenecks, and study the design trade-offs for Panda on massively parallel platforms more easily than by conducting empirical experiments. More importantly, we show that the outputs of the performance model can be used to help make optimal plans for handling application i/o requests, the first step toward our long-term goal of automatically optimizing i/o request handling in Panda. This research was supported by an ARPA Fellowship in High Performance Computing administered by the Institute for Advanced Computer Studies, University of Maryland, by NSF under PYI grant IRI 89 58582, and by N..
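The role of such an analytical model can be illustrated with a deliberately simplified sketch. This is not the Panda model, which accounts for disk, network, and memory behavior in detail; the function below only shows the general form (a fixed latency term plus a transfer term limited by aggregate server bandwidth), with all parameter values hypothetical.

```python
def model_write_time(total_bytes, clients, io_servers,
                     per_server_bw=500e6, per_request_latency=0.01):
    """Toy analytical model of a synchronized collective write:
    fixed per-request latency plus transfer time, where throughput
    is capped by the number of concurrently usable server streams.
    Bandwidth is in bytes/s, latency in seconds (illustrative values)."""
    effective_streams = min(clients, io_servers)
    aggregate_bw = effective_streams * per_server_bw
    return per_request_latency + total_bytes / aggregate_bw

# Evaluating the model for candidate server counts lets a library
# compare I/O plans without running empirical experiments.
t = model_write_time(total_bytes=10e9, clients=64, io_servers=8)
```

Even a model this crude captures the planning use case described in the abstract: given a request, the library can evaluate several candidate configurations analytically and pick the cheapest before issuing any I/O.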