    Performance Analysis of Scientific and Engineering Applications Using MPInside and TAU

    In this paper, we present a performance analysis of two NASA applications using the performance tools Tuning and Analysis Utilities (TAU) and SGI MPInside. MITgcmUV and OVERFLOW are two production-quality applications used extensively by scientists and engineers at NASA. MITgcmUV is a global ocean simulation model, developed by the Estimating the Circulation and Climate of the Ocean (ECCO) Consortium, for solving the fluid equations of motion using the hydrostatic approximation. OVERFLOW is a general-purpose Navier-Stokes solver for computational fluid dynamics (CFD) problems. Using these tools, we analyze the MPI functions (MPI_Sendrecv, MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Barrier, etc.) with respect to the message size of each rank, the time consumed by each function, and how the ranks communicate. MPI communication is further analyzed by studying the performance of the MPI functions used in these two applications as a function of message size and number of cores. Finally, we present the compute time, communication time, and I/O time as a function of the number of cores.
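    As a rough illustration of the kind of per-function, per-message-size measurement that tools such as TAU and MPInside collect automatically, the sketch below times MPI_Allreduce over a sweep of message sizes. It is a minimal hand-rolled example, not code from the paper; the buffer sizes and iteration count are arbitrary choices.

```c
/* Minimal sketch (not from the paper): timing MPI_Allreduce across message
 * sizes, the kind of per-function measurement that profiling tools automate.
 * Build with an MPI compiler wrapper, e.g.: mpicc -O2 allreduce_timing.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Sweep message sizes from 8 B to 1 MiB (counts of doubles). */
    for (int count = 1; count <= 131072; count *= 2) {
        double *sendbuf = malloc(count * sizeof(double));
        double *recvbuf = malloc(count * sizeof(double));
        for (int i = 0; i < count; i++) sendbuf[i] = (double)rank;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int iter = 0; iter < 100; iter++)
            MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d ranks, %8zu bytes: %.3f us per MPI_Allreduce\n",
                   nprocs, count * sizeof(double),
                   1e6 * (t1 - t0) / 100.0);
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}
```

    In practice, TAU can gather this kind of per-function data without modifying the application, for example by launching the binary through its tau_exec wrapper.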

    Methodology and Application of HPC I/O Characterization with MPIProf and IOT

    Combining the strengths of MPIProf and IOT, we devise an efficient and systematic method for I/O characterization at the per-job, per-rank, per-file, and per-call levels of HPC programs running at the NASA Advanced Supercomputing facility. This method is applied to answer four I/O questions in this paper. A total of 13 MPI programs and 15 cases, ranging from 24 to 5968 ranks, are analyzed to establish the I/O landscape from the answers to these four questions. Four of the 13 programs use MPI I/O, and the behavior of their collective writes depends on the specific implementation of the MPI library used. The SGI MPT library, the prevailing MPI library on our systems, was found to gather small writes from a large number of ranks so that larger writes are performed by a small subset of collective buffering ranks. The number of collective buffering ranks invoked by MPT depends on the Lustre stripe count and the number of nodes used for the run. We demonstrate that varying the stripe count achieves a double-digit speedup of one program's I/O. We also identify another program that concurrently opens private files on all ranks and could potentially place a heavy load on the Lustre servers. The ability to systematically characterize I/O for a large number of programs running on a supercomputer, to seek I/O optimization opportunities, and to identify programs that could cause high load and instability on the filesystems is important for pursuing exascale in a real production environment.
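    For context, the sketch below shows a collective MPI-IO write that requests a Lustre stripe count and a number of collective buffering aggregators through standard reserved info keys. It is an assumed, minimal example rather than code from any of the profiled programs, and the hint values (16 stripes, 16 aggregators) are arbitrary; whether the hints are honored depends on the MPI library, as the MPT behavior described above illustrates.

```c
/* Minimal sketch (not the paper's code): collective MPI-IO write with info
 * hints for Lustre striping and collective buffering aggregators.
 * Build with an MPI compiler wrapper, e.g.: mpicc -O2 collective_write.c */
#include <mpi.h>
#include <stdlib.h>

#define NDOUBLES 1048576   /* per-rank contribution: 8 MiB */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Request a stripe count and a number of collective buffering
     * aggregators via standard reserved info keys (values are arbitrary). */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");  /* Lustre stripe count */
    MPI_Info_set(info, "cb_nodes", "16");         /* aggregator ranks    */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    double *buf = malloc(NDOUBLES * sizeof(double));
    for (int i = 0; i < NDOUBLES; i++) buf[i] = (double)rank;

    /* Each rank writes a contiguous block at its own offset; the collective
     * call lets the library gather small pieces into larger writes issued
     * by the aggregator ranks. */
    MPI_Offset offset = (MPI_Offset)rank * NDOUBLES * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, NDOUBLES, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

    The stripe count can also be set on the output directory before the run with the Lustre command lfs setstripe -c <count>, which is another common way to apply the kind of striping adjustment demonstrated in the paper.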

    An Application-Based Performance Evaluation of NASA's Nebula Cloud Computing Platform

    The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing because of its high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work presents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks, as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula's performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall, the results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5, (ii) there is a significant virtualization-layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
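    The short-message latency finding is the kind of result typically obtained with a two-rank ping-pong microbenchmark. The sketch below is a minimal, assumed example of such a measurement and is not taken from the paper or from the benchmark suites listed above.

```c
/* Minimal sketch (not from the paper): ping-pong measurement of short-message
 * MPI latency between two ranks, the kind of test used to compare a
 * virtualized interconnect with a traditional HPC fabric.
 * Build with an MPI compiler wrapper and run with 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char msg = 0;                       /* 1-byte message */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    /* Half the round-trip time per iteration approximates one-way latency. */
    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               1e6 * (t1 - t0) / (2.0 * iters));

    MPI_Finalize();
    return 0;
}
```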