
    Electra: A Modular-Based Expansion of NASA's Supercomputing Capability

    NASA has increasingly relied on high-performance computing (HPC) resources for computational modeling, simulation, and data analysis to meet the science and engineering goals of its missions in space exploration, aeronautics, and Earth and space science. The NASA Advanced Supercomputing (NAS) Division at Ames Research Center in Silicon Valley, Calif., hosts NASA's premier supercomputing resources, integral to achieving and enhancing the success of the agency's missions. NAS provides a balanced environment, funded under the High-End Computing Capability (HECC) project, comprising world-class supercomputers, including its flagship distributed-memory cluster, Pleiades; high-speed networking; and massive data storage facilities, along with multi-disciplinary support teams for user support, code porting and optimization, and large-scale data analysis and scientific visualization. However, as scientists have increased the fidelity of their simulations and engineers have undertaken larger parameter-space studies, the requirements for supercomputing resources have grown by leaps and bounds. With the facility housing the HECC systems reaching its power and cooling capacity, NAS undertook a prototype project to investigate an alternative approach to housing supercomputers. Modular supercomputing, or container-based computing, is an innovative concept for expanding NASA's HPC capabilities. With modular supercomputing, additional containers, similar to portable storage pods, can be connected together as needed to accommodate the agency's ever-increasing demand for computing resources. In addition, taking advantage of the local weather permits the use of cooling technologies that save energy and reduce annual water usage. The first stage of NASA's Modular Supercomputing Facility (MSF) prototype, which resulted in a 1,000-square-foot module on a concrete pad with room for 16 compute racks, was completed in fall 2016, and an SGI (now HPE) computer system, named Electra, was deployed there in early 2017. Cooling is performed via an evaporative system built into the module, and preliminary experience shows a Power Usage Effectiveness (PUE) measurement of 1.03. Electra achieved over a petaflop on the LINPACK benchmark, sufficient to rank number 96 on the November 2016 TOP500 list [14]. The system consists of 1,152 InfiniBand-connected Intel Xeon Broadwell-based nodes. Its users access their files on a facility-wide file system shared by all HECC compute assets via Mellanox MetroX InfiniBand extenders, which connect the Electra fabric to Lustre routers in the primary facility over fiber-optic links about 900 feet long. The MSF prototype has exceeded expectations and is serving as a blueprint for future expansions. In the remainder of this chapter, we detail how modular data center technology can be used to expand an existing compute resource. We begin by describing NASA's requirements for supercomputing and how resources were provided prior to the integration of the Electra module-based system.
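    The quoted PUE of 1.03 means that nearly all facility power reaches the compute hardware itself. A minimal sketch of the calculation is shown below; the meter readings are hypothetical, not actual MSF measurements.

```python
# Power Usage Effectiveness: total facility power divided by IT equipment power.
# A PUE near 1.0 means almost no overhead for cooling, power distribution, etc.
# The readings below are hypothetical, for illustration only.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return PUE given total facility power and IT equipment power (same units)."""
    return total_facility_kw / it_equipment_kw

if __name__ == "__main__":
    it_load = 1000.0   # hypothetical IT load, kW
    overhead = 30.0    # hypothetical cooling and power-distribution overhead, kW
    print(f"PUE = {pue(it_load + overhead, it_load):.2f}")  # -> PUE = 1.03
```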

    Performance of the Widely-Used CFD Code OVERFLOW on the Pleiades Supercomputer

    Computational performance studies were made for NASA's widely used Computational Fluid Dynamics (CFD) code OVERFLOW on the Pleiades supercomputer. Two test cases were considered: a full launch vehicle with a grid of 286 million points and a full rotorcraft model with a grid of 614 million points. Computations using up to 8,000 cores were run on Sandy Bridge and Ivy Bridge nodes. Performance was monitored using times reported in the day files from the Portable Batch System (PBS) utility. Results for two grid topologies are presented and compared in detail. Observations and suggestions for future work are made.
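    A study of this kind typically reduces the recorded wall-clock times to speedup and parallel efficiency. The sketch below illustrates that reduction; the core counts and timings are placeholders, not measured OVERFLOW data or the actual PBS day-file format.

```python
# Strong-scaling summary: compute speedup and parallel efficiency from
# wall-clock times collected at several core counts. All numbers below
# are synthetic placeholders for illustration only.

timings = {  # cores -> wall-clock seconds per run (hypothetical)
    1000: 820.0,
    2000: 430.0,
    4000: 240.0,
    8000: 150.0,
}

base_cores = min(timings)
base_time = timings[base_cores]

for cores in sorted(timings):
    speedup = base_time / timings[cores]
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:>5} cores: speedup {speedup:5.2f}, efficiency {efficiency:5.1%}")
```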

    Evaluating the Suitability of Commercial Clouds for NASA's High Performance Computing Applications: A Trade Study

    NASA's High-End Computing Capability (HECC) Project is periodically asked whether it could be more cost effective through the use of commercial cloud resources. To answer the question, HECC's Application Performance and Productivity (APP) team undertook a performance and cost evaluation comparing three domains: two commercial cloud providers, Amazon and Penguin, and HECC's in-house resources, the Pleiades and Electra systems. In the study, the APP team used a combination of the NAS Parallel Benchmarks (NPB) and six full applications from NASA's workload on Pleiades and Electra to compare the performance of nodes based on three different generations of Intel Xeon processors: Haswell, Broadwell, and Skylake. Because of export control limitations, the most heavily used applications on Pleiades and Electra could not be used in the cloud; therefore, only one of the applications, OpenFOAM, represents work from the Aeronautics Research Mission Directorate and the Human Exploration and Operations Mission Directorate. The other five applications are from the Science Mission Directorate.

    I/O Performance Characterization of Lustre and NASA Applications on Pleiades

    In this paper we study the performance of the Lustre file system using five scientific and engineering applications representative of the NASA workload on large-scale supercomputing systems such as NASA's Pleiades. In order to facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client- and server-side metrics using SGI's Performance Co-Pilot (PCP) and generates a human-readable report on key metrics at the end of a batch job. These performance metrics are (a) the amount of data read and written, (b) the number of files opened and closed, and (c) the remote procedure call (RPC) size distribution (4 KB to 1024 KB, in powers of 2) for I/O operations. The RPC size distribution measures the efficiency of the Lustre client and can pinpoint problems such as small write sizes, disk fragmentation, etc. These extracted statistics are useful in determining the I/O pattern of the application and can assist in identifying possible improvements for users' applications. Information on the number of file operations enables a scientist to optimize the I/O performance of their applications. The amount of I/O data helps users choose the optimal stripe size and stripe count to enhance I/O performance. In this paper, we demonstrate the usefulness of this tool on Pleiades for five production-quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre to that under NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. We examine the performance impact of Lustre stripe size and stripe count, along with a performance evaluation of file-per-process and single-shared-file access patterns for the NASA workload using the parameterized IOR benchmark.
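    The RPC size distribution described above is essentially a power-of-two histogram of request sizes. The sketch below illustrates only that binning step; it does not read PCP or Lustre counters, and the request sizes are synthetic.

```python
# Bucket I/O request sizes into power-of-two bins from 4 KB to 1024 KB,
# mirroring the RPC size distribution the report describes. The sizes below
# are synthetic; the real tool gathers them from Lustre client-side counters
# via Performance Co-Pilot.
from collections import Counter

KB = 1024
BINS = [4 * KB * 2**i for i in range(9)]  # 4 KB, 8 KB, ..., 1024 KB

def bin_for(size: int) -> int:
    """Return the smallest bin that holds this request, capped at 1024 KB."""
    for b in BINS:
        if size <= b:
            return b
    return BINS[-1]

request_sizes = [3 * KB, 64 * KB, 70 * KB, 1024 * KB, 5 * KB]  # synthetic
histogram = Counter(bin_for(s) for s in request_sizes)

for b in BINS:
    print(f"<= {b // KB:4d} KB: {histogram.get(b, 0)}")
```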

    Abstracts to Be Presented at the 2015 Supercomputing Conference

    Compilation of abstracts to be presented at the 2015 Supercomputing Conference.

    NASA Advanced Computing Environment for Science and Engineering

    Vision: To reach for new heights and reveal the unknown so that what we do and learn will benefit all humankind.
    Mission: To pioneer the future in space exploration, scientific discovery, and aeronautics research.
    Aeronautics Research (ARMD): Pioneer and prove new flight technologies for safer, more secure, efficient, and environmentally friendly air transportation.
    Human Exploration and Operations (HEOMD): Focus on ISS operations, and develop new spacecraft and other capabilities for affordable, sustainable exploration beyond low Earth orbit.
    Science (SMD): Explore the Earth, solar system, and universe beyond; chart the best route for discovery; and reap the benefits of Earth and space exploration for society.
    Space Technology (STMD): Rapidly develop, demonstrate, and infuse revolutionary, high-payoff technologies through collaborative partnerships, expanding the boundaries of the aerospace enterprise.

    NASA Advanced Computing Environment for Science and Engineering

    An overview of the NASA Ames advanced computing systems. This presentation discusses modeling, simulation, analysis, and decision-making in relation to these systems.

    NASA Downscaling Project

    A team of researchers from NASA Ames Research Center, Goddard Space Flight Center, the Jet Propulsion Laboratory, and Marshall Space Flight Center, along with university partners at UCLA, conducted an investigation to explore whether downscaling coarse-resolution global climate model (GCM) predictions might provide valid insights into the regional impacts sought by decision makers. Since the computational cost of running global models at high spatial resolution for any useful climate-scale period is prohibitive, the hope for downscaling is that a coarse-resolution GCM provides sufficiently accurate synoptic-scale information for a regional climate model (RCM) to accurately develop the fine-scale features that represent the regional impacts of a changing climate. As a proxy for a prognostic climate forecast model, and so that ground truth in the form of satellite and in-situ observations could be used for evaluation, the MERRA and MERRA-2 reanalyses were used to drive the NU-WRF regional climate model and a GEOS-5 replay. This was performed at various resolutions that were factors of 2 to 10 higher than the reanalysis forcing. A number of experiments were conducted that varied resolution, model parameterizations, and intermediate-scale nudging for simulations over the continental US during 2000-2010. The results of these experiments were compared to observational datasets to evaluate the output.

    Energy-aware performance engineering in high performance computing

    Advances in processor design have delivered performance improvements for decades. As physical limits are reached, however, refinements to the same basic technologies are beginning to yield diminishing returns. Unsustainable increases in energy consumption are forcing hardware manufacturers to prioritise energy efficiency in their designs. Research suggests that software modifications will be needed to exploit the resulting improvements in current and future hardware. New tools are required to capitalise on this new class of optimisation. This thesis investigates the field of energy-aware performance engineering. It begins by examining the current state of the art, which is characterised by ad-hoc techniques and a lack of standardised metrics. Work in this thesis addresses these deficiencies and lays stable foundations for others to build on. The first contribution is a set of criteria defining the properties that energy-aware optimisation metrics should exhibit. These criteria show that current metrics cannot meaningfully assess the utility of code or correctly guide its optimisation. New metrics are proposed to address these issues, and theoretical and empirical proofs of their advantages are given. This thesis then presents the Power Optimised Software Envelope (POSE) model, which allows developers to assess whether power optimisation is worth pursuing for their applications. POSE is used to study the optimisation characteristics of codes from the Mantevo mini-application suite running on a Haswell-based cluster. The results obtained show that, of these codes, TeaLeaf has the most scope for power optimisation while PathFinder has the least. Finally, POSE modelling techniques are extended to evaluate the system-wide scope for energy-aware performance optimisation. This extension, System Summary POSE, allows developers to assess the scope a system has for energy-aware software optimisation independent of the code being run.
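    The abstract does not spell out its metrics or the POSE formulation, but a common starting point in this field is the energy-delay product family. The sketch below is a generic illustration of such a metric, not the thesis's own model, and the measurements are hypothetical.

```python
# Energy-delay style metrics commonly used in energy-aware performance work.
# This is a generic illustration, not the metrics or the POSE model defined
# in the thesis; the power and runtime figures below are hypothetical.

def energy_joules(avg_power_watts: float, runtime_s: float) -> float:
    """Energy consumed by a run, assuming a constant average power draw."""
    return avg_power_watts * runtime_s

def edp(avg_power_watts: float, runtime_s: float, weight: int = 1) -> float:
    """Energy-delay product E * t^w; w = 1 gives EDP, w = 2 gives ED2P."""
    return energy_joules(avg_power_watts, runtime_s) * runtime_s**weight

baseline = edp(avg_power_watts=350.0, runtime_s=120.0)   # hypothetical baseline run
optimised = edp(avg_power_watts=380.0, runtime_s=100.0)  # faster but hungrier variant
print(f"EDP improves by {(1 - optimised / baseline):.1%}")
```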