66 research outputs found

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method, used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics are computationally demanding. Parallelizing SPH codes is not trivial because there is no structured grid. Additionally, the performance of SPH codes can, in general, be adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome of this work serves as direct feedback to the parent codes, to improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1

    SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App

    Numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally demanding calculations, in terms of sustained floating-point operations per second, or FLOP/s. These simulations are expected to benefit significantly from future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The performance of SPH codes is, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. In this work, an extensive study of three SPH implementations, SPHYNX, ChaNGa, and XXX, is performed to gain insights and to expose the limitations and characteristics of the codes. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. We implemented a rotating square patch as a joint test simulation for the three SPH codes and analyzed their performance on a modern HPC system, Piz Daint. The performance profiling and scalability analysis conducted on the three parent codes exposed their performance issues, such as load imbalance, both in MPI and OpenMP. Two-level load balancing has been successfully applied to SPHYNX to overcome its load imbalance. The performance analysis shapes and drives the design of the SPH-EXA mini-app towards the use of efficient parallelization methods, fault-tolerance mechanisms, and load balancing approaches. Comment: arXiv admin note: substantial text overlap with arXiv:1809.0801

    Optimization of a discontinuous Galerkin solver with OpenCL and StarPU

    International audience. With recent advances in microprocessor design, optimizing computing software has become more and more technical. One of the difficulties is to transform sequential algorithms into parallel ones. A possible solution is task-based design. In this approach, the parallelization possibilities of the algorithm can be described automatically. Task-based design is also a good strategy for optimizing software incrementally. The objective of this paper is to describe a practical experience of a task-based parallelization of a Discontinuous Galerkin method in the context of electromagnetic simulations. The task-based description is managed by the StarPU runtime. Additional acceleration is obtained by OpenCL.

    Optimisation d'un algorithme Galerkin Discontinu en OpenCL appliqué à la simulation en électromagnétisme

    International audience. In this paper, we present GPU and CPU optimization results of a Discontinuous Galerkin algorithm applied to electromagnetism and implemented in OpenCL and MPI. This algorithm was initially optimized to run in parallel on several GPUs and then adapted for CPUs. GPUs and CPUs each require a specific implementation adapted to their hardware architecture. We begin by describing the field of application. Then, we present the GPU optimizations as well as the performance obtained on GPU and CPU with this version of the code. Finally, we describe the adaptations made for the OpenCL CPU optimizations, which made a tenfold increase in CPU performance possible.

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the expense of enormous resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements?
    While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies for when an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically. Peer Reviewed. "Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth". Postprint (author's final draft)
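
The energy estimate quoted in the abstract above (48 h at 20 MW, about a million kWh, about 100k Euro, on the order of 10^23 floating-point operations) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the sustained rate of 10^18 FLOP/s is an assumption (the abstract only says "as predicted for exascale systems"), the function names are hypothetical, and the electricity price is derived from the quoted figures rather than taken from the source.

```python
# Back-of-the-envelope check of the abstract's energy figures.
# ASSUMPTION: a sustained rate of 1e18 FLOP/s; the abstract does not state it.

POWER_MW = 20.0          # system power draw, from the abstract
RUNTIME_H = 48.0         # run length in hours, from the abstract
COST_EUR = 100_000.0     # quoted energy cost, from the abstract
ASSUMED_FLOP_PER_S = 1e18


def energy_kwh(power_mw: float, hours: float) -> float:
    """Energy in kWh: convert MW to kW, then multiply by hours."""
    return power_mw * 1000.0 * hours


def total_flop(flop_per_s: float, hours: float) -> float:
    """Total operations executed at a sustained rate over the run."""
    return flop_per_s * hours * 3600.0


def implied_price_eur_per_kwh(cost_eur: float, kwh: float) -> float:
    """Electricity price implied by the quoted cost and energy."""
    return cost_eur / kwh


if __name__ == "__main__":
    kwh = energy_kwh(POWER_MW, RUNTIME_H)
    flop = total_flop(ASSUMED_FLOP_PER_S, RUNTIME_H)
    price = implied_price_eur_per_kwh(COST_EUR, kwh)
    print(f"energy: {kwh:.0f} kWh")
    print(f"operations: {flop:.3e}")
    print(f"implied price: {price:.3f} EUR/kWh")
```

Under these assumptions the run uses 960,000 kWh and executes between 10^23 and 2×10^23 operations, which is consistent with the rounded figures in the abstract.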

    A survey of high level frameworks in block-structured adaptive mesh refinement packages

    Pre-print. Over the last decade block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, their design trade-offs and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah.

    From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions

    We study the simulation of stellar mergers, which requires complex simulations with high computational demands. We have developed Octo-Tiger, a finite volume grid-based hydrodynamics simulation code with Adaptive Mesh Refinement which is unique in conserving both linear and angular momentum to machine precision. To face the challenge of increasingly complex, diverse, and heterogeneous HPC systems, Octo-Tiger relies on high-level programming abstractions. We use HPX with its futurization capabilities to ensure scalability both between nodes and within, and present first results replacing MPI with libfabric achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous GPU-accelerated supercomputers, demonstrating node-level performance and portability. We show scalability up to full system runs on Piz Daint. For the scenario's maximum resolution, the compute-critical parts (hydrodynamics and gravity) achieve 68.1% parallel efficiency at 2048 nodes. Comment: Accepted at SC1