70 research outputs found

    07341 Abstracts Collection -- Code Instrumentation and Modeling for Parallel Performance Analysis

    From 20 to 24 August 2007, the Dagstuhl Seminar 07341 "Code Instrumentation and Modeling for Parallel Performance Analysis" was held at the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.

    Providing Insight into the Performance of Distributed Applications Through Low-Level Metrics

    The field of high-performance computing (HPC) has always dealt with the bleeding edge of computational hardware and software to achieve the maximum possible performance for a wide variety of workloads. When dealing with brand-new technologies, it can be difficult to understand how they work and why they behave the way they do. One of the more prevalent approaches to providing insight into modern hardware and software is to provide tools that allow developers to access low-level metrics about their performance. The modern HPC ecosystem supports a wide array of technologies, but in this work I focus on two particularly influential technologies: the Message Passing Interface (MPI) and Graphics Processing Units (GPUs).

    For many years, MPI has been the dominant programming paradigm in HPC. Indeed, over 90% of applications that are part of the U.S. Exascale Computing Project plan to use MPI in some fashion. The MPI Standard provides programmers with a wide variety of methods to communicate between processes, along with several other capabilities. The high-level MPI Profiling Interface has been the primary method for profiling MPI applications since the inception of the MPI Standard; more recently, the low-level MPI Tool Information Interface was introduced.

    Accelerators like GPUs have been increasingly adopted as the primary computational workhorse of modern supercomputers. GPUs provide more parallelism than traditional CPUs through a hierarchical grid of lightweight processing cores, and NVIDIA provides profiling tools for its GPUs that give access to low-level hardware metrics.

    In this work, I propose research applying low-level metrics to both the MPI and GPU paradigms, in the form of an implementation of low-level metrics for MPI and a new method for analyzing GPU load imbalance with a synthetic efficiency metric. I introduce Software-based Performance Counters (SPCs) to expose internal metrics of the Open MPI implementation, along with a new interface for exposing these counters to users and tool developers. I also analyze a modified load-imbalance formula for GPU-based applications that uses low-level hardware metrics provided through nvprof in a hierarchical approach, taking the internal load imbalance of the GPU into account.
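    The abstract does not give the modified load-imbalance formula itself. As a rough illustration only, the classic imbalance measure (maximum load over mean load, minus one) can be applied at two levels of a GPU hierarchy and combined, which captures the "hierarchical" idea the abstract describes; the weighting scheme below is an assumption, not the thesis' actual metric:

    ```python
    # Illustrative sketch only: the thesis' exact synthetic efficiency metric is
    # not given in the abstract. We apply the classic imbalance measure
    # lambda = max(load) / mean(load) - 1 at two levels of a GPU hierarchy.

    def imbalance(loads):
        """Classic load-imbalance metric: 0.0 means perfectly balanced."""
        mean = sum(loads) / len(loads)
        return max(loads) / mean - 1.0

    def hierarchical_imbalance(per_gpu_sm_loads):
        """per_gpu_sm_loads: one inner list of per-SM loads for each GPU.

        Combines the imbalance *across* GPUs (using each GPU's total load)
        with the *internal* imbalance of each GPU, weighted by its share of
        the total work, so a skewed GPU contributes in proportion to its load.
        """
        totals = [sum(sms) for sms in per_gpu_sm_loads]
        across = imbalance(totals)
        grand = sum(totals)
        internal = sum((t / grand) * imbalance(sms)
                       for t, sms in zip(totals, per_gpu_sm_loads))
        return across + internal

    # Two GPUs: one perfectly balanced internally, one internally skewed.
    print(imbalance([10, 10, 10, 10]))                   # -> 0.0
    print(hierarchical_imbalance([[10, 10], [30, 10]]))  # -> 0.666...
    ```

    A flat per-SM view would miss the distinction the second example makes: the GPU totals alone (20 vs. 40) understate the imbalance, because one GPU is also skewed internally.
    
    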

    GALPROP WebRun: an internet-based service for calculating galactic cosmic ray propagation and associated photon emissions

    GALPROP is a numerical code for calculating the galactic propagation of relativistic charged particles and the diffuse emissions produced during their propagation. The code incorporates as much realistic astrophysical input as possible, together with the latest theoretical developments, and has become a de facto standard in cosmic-ray astrophysics. We present GALPROP WebRun, a service to the scientific community enabling easy use of the freely available GALPROP code via web browsers. In addition, we introduce the latest GALPROP version 54, available through this service. Comment: Accepted for publication in Computer Physics Communications. Version 2 includes improvements suggested by the referee; metadata completed in version 3 (no changes to the manuscript).

    Improving MPI Application Communication Time with an Introspection Monitoring Library

    As the IPDPS in-person meeting was cancelled, PDSEC was held online. In this paper we describe how to improve the communication time of MPI parallel applications using a library that monitors MPI applications and allows for introspection (the program itself can query the state of the monitoring system). Building on previous work, this library can see how collective communications are decomposed into point-to-point messages. It also features monitoring sessions that allow suspending and restarting the monitoring, limiting it to specific portions of the code. Experiments show that the monitoring overhead is very small and that the proposed features allow for dynamic and efficient rank reordering, reducing the communication time of some programs by up to a factor of two.
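    The paper's actual reordering algorithm is not detailed in the abstract. As a minimal sketch of the general idea behind communication-aware rank reordering, a greedy heuristic can place ranks that exchange the most data next to each other, given the point-to-point communication matrix the monitoring library exposes (the 1-D placement model and the heuristic itself are illustrative assumptions):

    ```python
    # Hypothetical sketch, not the paper's algorithm: greedily order ranks on
    # a 1-D line of slots so that heavily communicating ranks end up adjacent.

    def greedy_reorder(comm_matrix):
        """comm_matrix[i][j]: bytes exchanged between ranks i and j (symmetric).

        Returns a permutation: position -> original rank.
        """
        n = len(comm_matrix)
        # Start from the rank involved in the heaviest single exchange.
        start = max(range(n), key=lambda i: max(comm_matrix[i]))
        order, placed = [start], {start}
        while len(order) < n:
            last = order[-1]
            # Append the unplaced rank that talks most to the current endpoint.
            nxt = max((r for r in range(n) if r not in placed),
                      key=lambda r: comm_matrix[last][r])
            order.append(nxt)
            placed.add(nxt)
        return order

    # Ranks 0<->3 and 1<->2 communicate heavily; the default order 0,1,2,3
    # separates both pairs, while the greedy order keeps each pair adjacent.
    m = [[0, 1, 1, 9],
         [1, 0, 9, 1],
         [1, 9, 0, 1],
         [9, 1, 1, 0]]
    print(greedy_reorder(m))  # -> [0, 3, 1, 2]
    ```

    On a real machine the "slots" would be cores or nodes with a measured distance matrix rather than a line, but the ingredient the paper adds is the same: an accurate, introspectively gathered communication matrix to feed such a placement decision.
    
    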

    Improving MPI Application Communication Time with an Introspection Monitoring Library

    In this report we describe how to improve the communication time of MPI parallel applications using a library that monitors MPI applications and allows for introspection (the program itself can query the state of the monitoring system). Building on previous work, this library can see how collective communications are decomposed into point-to-point messages. It also features monitoring sessions that allow suspending and restarting the monitoring, limiting it to specific portions of the code. Experiments show that the monitoring overhead is very small and that the proposed features allow for dynamic and efficient rank reordering, reducing the communication time of some programs by up to a factor of two.

    Dynamic Monitoring of MPI Communications During Execution: Scientific User Guide and Technical Documentation

    Understanding application communication patterns has become increasingly relevant, as the complexity and diversity of the underlying hardware, along with elaborate network topologies, make the implementation of portable and efficient algorithms more challenging. Equipped with knowledge of the communication patterns, external tools can predict and improve the performance of applications, either by modifying the process placement or by changing the communication infrastructure parameters to refine the match between the application requirements and the message-passing library capabilities. This report presents the design and evaluation of a communication monitoring infrastructure developed in the Open MPI software stack, able to expose a dynamically configurable level of detail about the application communication patterns, accompanied by a user guide and technical documentation describing the implementation details.
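    The central artifact such an infrastructure exposes is a per-pair communication matrix. The toy recorder below (illustrative only, not the Open MPI implementation) shows the kind of data being accumulated and what a consuming tool would read back:

    ```python
    # Toy illustration (not the Open MPI implementation): accumulate message
    # counts and bytes per (src, dst) pair, then expose a dense bytes matrix,
    # i.e. the "communication pattern" an external tool would consume.

    from collections import defaultdict

    class CommMonitor:
        def __init__(self, nranks):
            self.nranks = nranks
            self.msgs = defaultdict(int)   # (src, dst) -> message count
            self.bytes = defaultdict(int)  # (src, dst) -> bytes sent

        def record(self, src, dst, nbytes):
            self.msgs[(src, dst)] += 1
            self.bytes[(src, dst)] += nbytes

        def matrix(self):
            """Dense bytes matrix indexed [sender][receiver]."""
            return [[self.bytes[(i, j)] for j in range(self.nranks)]
                    for i in range(self.nranks)]

    mon = CommMonitor(3)
    mon.record(0, 1, 1024)
    mon.record(0, 1, 1024)
    mon.record(1, 2, 512)
    print(mon.matrix())  # -> [[0, 2048, 0], [0, 0, 512], [0, 0, 0]]
    ```

    The "dynamically configurable level of detail" described in the report would correspond to choosing what gets recorded at each call, e.g. only totals, or a breakdown of collectives into their underlying point-to-point messages.
    
    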

    Towards Portable Online Prediction of Network Utilization using MPI-level Monitoring

    Stealing network bandwidth helps a variety of HPC runtimes and services run additional operations in the background without negatively affecting the applications. A key ingredient in making this possible is an accurate prediction of future network utilization, enabling the runtime to plan background operations in advance so as to avoid competing with the application for network bandwidth. In this paper, we propose a portable deep-learning predictor that uses only the information available through MPI introspection to construct a recurrent sequence-to-sequence neural network capable of forecasting network utilization. We leverage the fact that most HPC applications exhibit periodic behavior to enable predictions far into the future (at least the length of a period). Our online approach has no initial training phase; it continuously improves itself during application execution without incurring significant computational overhead. Experimental results show better accuracy and lower computational overhead compared with the state of the art on two representative applications.
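    The paper's predictor is a sequence-to-sequence neural network; the sketch below shows only the underlying observation it exploits, namely that periodic network utilization makes it possible to forecast a full period ahead. A simple autocorrelation scan (an illustrative stand-in, not the paper's method) recovers the period of a utilization trace:

    ```python
    # Illustrative sketch: detect the dominant period of a (non-constant)
    # utilization trace via a brute-force autocorrelation scan, then forecast
    # the next period naively by repeating the last one.

    def detect_period(series, max_lag=None):
        """Return the lag with the highest autocorrelation (the likely period)."""
        n = len(series)
        max_lag = max_lag or n // 2
        mean = sum(series) / n
        var = sum((x - mean) ** 2 for x in series)
        best_lag, best_score = 1, float("-inf")
        for lag in range(1, max_lag + 1):
            score = sum((series[i] - mean) * (series[i + lag] - mean)
                        for i in range(n - lag)) / var
            if score > best_score:
                best_lag, best_score = lag, score
        return best_lag

    # Synthetic utilization trace repeating every 5 steps.
    trace = [0, 10, 80, 10, 0] * 8
    period = detect_period(trace)
    print(period)  # -> 5
    naive_forecast = trace[-period:]  # "the next period looks like the last"
    ```

    The paper's recurrent network goes well beyond this, handling noisy, drifting traces and improving online, but the periodicity assumption is what lets any such predictor see "at least the length of a period" into the future.
    
    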