
    Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures, ranging from laptop CPUs to multi-GPU distributed-memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses, especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behaviour. In fact, in the physical model slightly different initial conditions lead to vastly different end states, and this behaviour translates to its numerical representation: pointwise convergence, even in principle, becomes impossible for long simulation times. In a second part, we explore important performance-tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node required to achieve a scaling efficiency above 50% (both strong and weak).
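    The abstract does not spell out the form of the performance model, so the sketch below is only an assumption: a minimal latency-bandwidth estimate T(S) = t_lat + S/B for a memory-bound routine, together with the array size S_min = t_lat * B at which the latency and bandwidth terms are equal, which is one way to read the 50% efficiency threshold quoted above. The latency and bandwidth values are placeholders, not measurements from the paper.

```python
# Minimal sketch (not Feltor's actual model): a latency-bandwidth execution-time
# estimate T(S) = t_lat + S / B for a memory-bound kernel, and the array size
# S_min = t_lat * B at which half the predicted time is spent in latency.

def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Predicted execution time of a bandwidth-bound kernel."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

def min_size_for_half_efficiency(latency_s, bandwidth_bytes_per_s):
    """Array size at which the bandwidth term equals the latency term."""
    return latency_s * bandwidth_bytes_per_s

if __name__ == "__main__":
    latency = 5e-6            # assumed per-call latency [s], placeholder value
    bandwidth = 500e9         # assumed memory bandwidth [B/s], placeholder value
    for size_mb in (0.1, 10, 1000):
        t = predicted_time(size_mb * 1e6, latency, bandwidth)
        print(f"{size_mb:7.1f} MB -> predicted {t * 1e6:9.1f} us")
    s_min = min_size_for_half_efficiency(latency, bandwidth)
    print("S_min ~", s_min / 1e6, "MB per node")
```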

    Performance of a parallel code for the Euler equations on hypercube computers

    The performance of hypercube computers was evaluated on a computational fluid dynamics problem, and the parallel-environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and the code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
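    As a hedged illustration of what such a parametric execution-time model can look like (this is not the paper's fitted expression), the sketch below combines per-step computation distributed over p processors with a communication term that grows with the hypercube dimension log2(p); every parameter value is hypothetical.

```python
import math

# Hypothetical parametric run-time model in the spirit of the abstract (not the
# paper's fitted model): distributed per-step work plus a communication cost
# proportional to the hypercube dimension log2(p).

def model_time(n_cells, n_steps, p, t_flop, flops_per_cell, alpha, beta, msg_bytes):
    compute = n_steps * n_cells * flops_per_cell * t_flop / p   # work split over p nodes
    comm = n_steps * math.log2(p) * (alpha + beta * msg_bytes)  # latency + bandwidth per hop
    return compute + comm

if __name__ == "__main__":
    for p in (1, 16, 512):   # node counts matching the machines in the abstract
        t = model_time(n_cells=160 * 32, n_steps=1000, p=p,
                       t_flop=1e-6, flops_per_cell=500,
                       alpha=3e-4, beta=1e-6, msg_bytes=4 * 160)
        print(f"p={p:4d}  predicted time {t:8.2f} s")
```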

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics represent computationally demanding calculations. The parallelization of SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of SPH codes can, in general, be adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome of this work serves as direct feedback to the parent codes, to improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1
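    For readers unfamiliar with SPH, the sketch below shows the core density-summation step, rho_i = sum_j m_j W(|r_i - r_j|, h), with the standard M4 cubic-spline kernel and a brute-force O(N^2) neighbour loop. It is purely illustrative and is not code from SPHYNX, ChaNGa, or SPH-flow, which use tree- or cell-based neighbour searches.

```python
import numpy as np

# Minimal brute-force SPH density summation (illustrative sketch only):
# rho_i = sum_j m_j W(|r_i - r_j|, h) with the M4 cubic-spline kernel in 3D.

def cubic_spline_w(r, h):
    """M4 cubic-spline kernel in 3D, normalisation 1/(pi h^3)."""
    q = r / h
    sigma = 1.0 / (np.pi * h**3)
    w = np.zeros_like(q)
    inner = q < 1.0
    outer = (q >= 1.0) & (q < 2.0)
    w[inner] = 1.0 - 1.5 * q[inner]**2 + 0.75 * q[inner]**3
    w[outer] = 0.25 * (2.0 - q[outer])**3
    return sigma * w

def density(pos, mass, h):
    """Brute-force density estimate for all particles (includes the self term)."""
    diff = pos[:, None, :] - pos[None, :, :]        # pairwise separation vectors
    r = np.linalg.norm(diff, axis=-1)
    return (mass[None, :] * cubic_spline_w(r, h)).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((500, 3))                      # particles in a unit box
    mass = np.full(500, 1.0 / 500)
    print(density(pos, mass, h=0.1)[:5])
```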

    Evolution of a double-front Rayleigh-Taylor system using a GPU-based high resolution thermal Lattice-Boltzmann model

    We study the turbulent evolution originating from a system subjected to a Rayleigh-Taylor instability with a double density interface, at high resolution and in a two-dimensional geometry, using a highly optimized thermal Lattice-Boltzmann code for GPUs. The novelty of our investigation stems from the initial condition, given by the superposition of three layers with three different densities, leading to the development of two Rayleigh-Taylor fronts that expand upward and downward and collide in the middle of the cell. Using high-resolution numerical data, we highlight the effects induced by the collision of the two turbulent fronts in the long-time asymptotic regime. We also provide details on the optimized Lattice-Boltzmann code that we have run on a cluster of GPUs.
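    The code used in the paper is a two-population thermal Lattice-Boltzmann solver heavily optimized for GPUs; as a much simpler illustration of the method's basic collide-and-stream structure, a single-population D2Q9 BGK step on a periodic grid might look as follows (grid size and relaxation time are placeholders, not values from the paper).

```python
import numpy as np

# Minimal single-population D2Q9 BGK lattice-Boltzmann step (illustrative
# sketch only; the paper's thermal, two-population GPU code is far richer).

c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])      # D2Q9 velocities
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)            # D2Q9 weights

def equilibrium(rho, ux, uy):
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return w[:, None, None] * rho * (1.0 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, tau):
    """One collide-and-stream update on a periodic grid."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += -(f - equilibrium(rho, ux, uy)) / tau           # BGK collision
    for i in range(9):                                   # streaming along c_i
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f

if __name__ == "__main__":
    nx, ny = 64, 64                                      # placeholder grid
    f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
    for _ in range(100):
        f = lbm_step(f, tau=0.8)                         # placeholder relaxation time
    print("mass conserved:", np.isclose(f.sum(), nx * ny))
```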

    A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers

    The two-dimensional electrostatic plasma particle-in-cell (PIC) code described in [1] has been upgraded to a 2D electromagnetic PIC code running on the Caltech/JPL Mark IIIfp and the Intel iPSC/860 parallel MIMD computers. The code solves the complete time-dependent Maxwell's equations, where the plasma responses, i.e., the charge and current density in the plasma, are evaluated by advancing in time the trajectories of ~ 10^6 particles in their self-consistent electromagnetic field. The field equations are solved in Fourier space. Parallelisation is achieved through domain decomposition in real and Fourier space. Results from a simulation showing a two-dimensional Alfvén wave filamentation instability are shown; these are the first simulations of this 2D Alfvén wave decay process.
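    The paper's solver handles the full time-dependent Maxwell equations spectrally; as a simpler illustration of a Fourier-space field solve, the sketch below inverts the periodic electrostatic Poisson equation, phi_k = rho_k / (eps0 k^2), and is not taken from the code itself.

```python
import numpy as np

# Illustrative spectral field solve (not the paper's Maxwell solver): the
# periodic Poisson equation lap(phi) = -rho/eps0 becomes an algebraic
# division, phi_k = rho_k / (eps0 * k^2), in Fourier space.

def poisson_periodic(rho, dx, eps0=1.0):
    nx, ny = rho.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    k2 = kx[:, None]**2 + ky[None, :]**2
    rho_k = np.fft.fft2(rho)
    phi_k = np.zeros_like(rho_k)
    nonzero = k2 > 0.0
    phi_k[nonzero] = rho_k[nonzero] / (eps0 * k2[nonzero])   # k = 0 mode set to zero
    return np.real(np.fft.ifft2(phi_k))

if __name__ == "__main__":
    nx = ny = 128
    dx = 1.0 / nx
    x = np.arange(nx) * dx
    # single Fourier-mode charge density with a known analytic potential
    rho = np.cos(2 * np.pi * x)[:, None] * np.ones(ny)[None, :]
    phi = poisson_periodic(rho, dx)
    print("max |phi| ~ 1/(2*pi)^2 :", phi.max(), 1.0 / (2 * np.pi)**2)
```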

    Recent EUROfusion Achievements in Support of Computationally Demanding Multiscale Fusion Physics Simulations and Integrated Modeling

    Integrated modeling (IM) of present experiments and future tokamak reactors requires the provision of computational resources and numerical tools capable of simulating multiscale spatial phenomena as well as fast transient events and relatively slow plasma evolution within a reasonably short computational time. Recent progress achieved within the EUROfusion Consortium is presented in the implementation of new computational resources for fusion applications in Europe based on modern supercomputer technologies (the supercomputer MARCONI-FUSION), in the optimization and speed-up of the EU fusion-related first-principles codes, and in the development of a basis for integrating physics codes/modules into a centrally maintained suite of IM tools. Physics phenomena that can now be reasonably modelled in various areas (core turbulence and magnetic reconnection, edge and scrape-off-layer physics, radio-frequency heating and current drive, magnetohydrodynamic modelling, reflectometry simulations) following successful code optimization and parallelization are briefly described. Development activities in support of IM are summarized. They include support for (1) the local deployment of the IM infrastructure and access to experimental data at various host sites, (2) the management of releases for sophisticated IM workflows involving a large number of components, and (3) the performance optimization of complex IM workflows. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under grant agreement 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission or ITER.