31 research outputs found

    05501 Abstracts Collection -- Automatic Performance Analysis

    Get PDF
    From 12.12.05 to 16.12.05, the Dagstuhl Seminar 05501 ``Automatic Performance Analysis\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles

    Full text link
    Abstract—Applications must scale well to make efficient use of today’s class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of scaling problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present post-mortem parallel analysis techniques for pinpointing and quantifying load imbalance in the context of call path profiles of parallel programs. We show how to identify load imbalance in its static and dynamic context by using only low-overhead asyn-chronous call path profiling to locate regions of code responsible for communication wait time in SPMD executions. We describe the implementation of these techniques within HPCTOOLKIT. I

    ScalAna: Automating Scaling Loss Detection with Graph Analysis

    Full text link
    Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks either base on profiling or tracing. Profiling incurs low overheads but does not capture detailed dependencies needed for root-cause analysis. Tracing collects all information at prohibitive overheads. In this work, we design ScalAna that uses static analysis techniques to achieve the best of both worlds - it enables the analyzability of traces at a cost similar to profiling. ScalAna first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate ScalAna with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by ScalAna on 2,048 processes.Comment: conferenc

    Performance analysis for parallel programs from multicore to petascale

    Get PDF
    Cutting-edge science and engineering applications require petascale computing. Petascale computing platforms are characterized by both extreme parallelism (systems of hundreds of thousands to millions of cores) and hybrid parallelism (nodes with multicore chips). Consequently, to effectively use petascale resources, applications must exploit concurrency at both the node and system level --- a difficult problem. The challenge of developing scalable petascale applications is only partially aided by existing languages and compilers. As a result, manual performance tuning is often necessary to identify and resolve poor parallel and serial efficiency. Our thesis is that it is possible to achieve unique, accurate, and actionable insight into the performance of fully optimized parallel programs by measuring them with asynchronous-sampling-based call path profiles; attributing the resulting binary-level measurements to source code structure; analyzing measurements on-the-fly and postmortem to highlight performance inefficiencies; and presenting the resulting context- sensitive metrics in three complementary views. To support this thesis, we have developed several techniques for identifying performance problems in fully optimized serial, multithreaded and petascale programs. First, we describe how to attribute very precise (instruction-level) measurements to source-level static and dynamic contexts in fully optimized applications --- all for an average run-time overhead of a few percent. We then generalize this work with the development of logical call path profiling and apply it to work-stealing-based applications. Second, we describe techniques for pinpointing and quantifying parallel inefficiencies such as parallel idleness, parallel overhead and lock contention in multithreaded executions. Third, we show how to diagnose scalability bottlenecks in petascale applications by scaling our our measurement, analysis and presentation tools to support large-scale executions. Finally, we provide a coherent framework for these techniques by sketching a unique and comprehensive performance analysis methodology. This work forms the basis of Rice University's HPCTOOLKIT performance tools

    Advanced Simulation and Computing FY09-FY10 Implementation Plan Volume 2, Rev. 1

    Full text link

    Infrastructure Plan for ASC Petascale Environments

    Full text link

    Compilation of Abstracts for SC12 Conference Proceedings

    Get PDF
    1 A Breakthrough in Rotorcraft Prediction Accuracy Using Detached Eddy Simulation; 2 Adjoint-Based Design for Complex Aerospace Configurations; 3 Simulating Hypersonic Turbulent Combustion for Future Aircraft; 4 From a Roar to a Whisper: Making Modern Aircraft Quieter; 5 Modeling of Extended Formation Flight on High-Performance Computers; 6 Supersonic Retropropulsion for Mars Entry; 7 Validating Water Spray Simulation Models for the SLS Launch Environment; 8 Simulating Moving Valves for Space Launch System Liquid Engines; 9 Innovative Simulations for Modeling the SLS Solid Rocket Booster Ignition; 10 Solid Rocket Booster Ignition Overpressure Simulations for the Space Launch System; 11 CFD Simulations to Support the Next Generation of Launch Pads; 12 Modeling and Simulation Support for NASA's Next-Generation Space Launch System; 13 Simulating Planetary Entry Environments for Space Exploration Vehicles; 14 NASA Center for Climate Simulation Highlights; 15 Ultrascale Climate Data Visualization and Analysis; 16 NASA Climate Simulations and Observations for the IPCC and Beyond; 17 Next-Generation Climate Data Services: MERRA Analytics; 18 Recent Advances in High-Resolution Global Atmospheric Modeling; 19 Causes and Consequences of Turbulence in the Earths Protective Shield; 20 NASA Earth Exchange (NEX): A Collaborative Supercomputing Platform; 21 Powering Deep Space Missions: Thermoelectric Properties of Complex Materials; 22 Meeting NASA's High-End Computing Goals Through Innovation; 23 Continuous Enhancements to the Pleiades Supercomputer for Maximum Uptime; 24 Live Demonstrations of 100-Gbps File Transfers Across LANs and WANs; 25 Untangling the Computing Landscape for Climate Simulations; 26 Simulating Galaxies and the Universe; 27 The Mysterious Origin of Stellar Masses; 28 Hot-Plasma Geysers on the Sun; 29 Turbulent Life of Kepler Stars; 30 Modeling Weather on the Sun; 31 Weather on Mars: The Meteorology of Gale Crater; 32 Enhancing Performance of NASAs High-End Computing Applications; 33 Designing Curiosity's Perfect Landing on Mars; 34 The Search Continues: Kepler's Quest for Habitable Earth-Sized Planets

    Acceleration and Verification of Virtual High-throughput Multiconformer Docking

    Get PDF
    The work in this dissertation explores the use of massive computational power available through modern supercomputers as a virtual laboratory to aid drug discovery. As of November 2013, Tianhe-2, the fastest supercomputer in the world, has a theoretical performance peak of 54,902 TFlop/s or nearly 55 thousand trillion calculations per second. The Titan supercomputer located at Oak Ridge National Laboratory has 560,640 computing cores that can work in parallel to solve scientific problems. In order to harness this computational power to assist in drug discovery, tools are developed to aid in the preparation and analysis of high-throughput virtual docking screens, a tool to predict how and how well small molecules bind to disease associated proteins and potentially serve as a novel drug candidate. Methods and software for performing large screens are developed that run on high-performance computer systems. The future potential and benefits of using these tools to study polypharmacology and revolutionizing the pharmaceutical industry are also discussed

    National Computational Infrastructure for Lattice Gauge Theory SciDAC-2 Closeout Report

    Full text link
    corecore