82 research outputs found

    Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers

    pre-print
    Current trends in high performance computing present formidable challenges for applications code using multicore nodes, possibly with accelerators and/or co-processors, and reduced memory, while still attaining scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities offer a possible solution. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. Uintah executes directed acyclic graphs of computational tasks with a scalable, asynchronous, and dynamic runtime system for CPU cores and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases of 1000x without significant changes to application code. This methodology is tested on three leading Top500 machines: OLCF Titan, TACC Stampede, and ALCF Mira, using three diverse and challenging applications problems. This investigation of scalability with regard to the different processors and communications performance leads to the overall conclusion that the adaptive DAG-based approach provides a very powerful abstraction for solving challenging multi-scale multi-physics engineering problems on some of the largest and most powerful computers available today.
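    The key mechanism here, executing a task graph asynchronously as dependencies resolve rather than in a fixed serial order, can be sketched in a few lines. The following is a minimal generic illustration, not Uintah's runtime; the task names and the graph itself are invented for the example.

```python
# Minimal sketch of asynchronous DAG execution (not Uintah's runtime):
# a task is submitted as soon as all of its predecessors finish, so
# independent tasks run concurrently instead of in a fixed serial order.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Hypothetical task graph: task -> set of tasks it depends on.
deps = {
    "interpolate": set(),
    "advect":      {"interpolate"},
    "pressure":    {"interpolate"},
    "correct":     {"advect", "pressure"},
}

def run(task):
    print(f"running {task}")
    return task

def execute_dag(deps):
    remaining = {t: set(d) for t, d in deps.items()}   # unmet deps per task
    children = {t: set() for t in deps}                # reverse edges
    for t, d in deps.items():
        for p in d:
            children[p].add(t)
    with ThreadPoolExecutor(max_workers=4) as pool:
        ready = [t for t, d in remaining.items() if not d]
        for t in ready:
            del remaining[t]
        futures = {pool.submit(run, t): t for t in ready}
        while futures:
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                finished = futures.pop(fut)
                for child in children[finished]:       # release dependents
                    remaining[child].discard(finished)
                    if not remaining[child]:
                        del remaining[child]
                        futures[pool.submit(run, child)] = child

execute_dag(deps)
```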

    A survey of high level frameworks in block-structured adaptive mesh refinement packages

    pre-print
    Over the last decade, block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas; others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, their design trade-offs, and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah.
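    The structural idea these packages share, a hierarchy of logically rectangular patches in which finer patches overlay flagged regions of coarser ones, can be illustrated with a minimal sketch. The 1D setting and the gradient-based flagging criterion below are simplifications chosen for brevity, not the scheme of any surveyed code.

```python
# Minimal 1D sketch of block-structured AMR: a finer patch (2x resolution)
# covers the region of a coarser patch where a gradient criterion flags
# cells for refinement; coarse data is prolonged onto the fine patch.
import numpy as np

def refine(x, u, threshold, ratio=2):
    """Return a refined (x, u) patch over the flagged region, or None."""
    flags = np.abs(np.diff(u)) > threshold            # flag steep cells
    if not flags.any():
        return None
    lo = np.argmax(flags)                             # first flagged cell
    hi = len(flags) - np.argmax(flags[::-1])          # one past the last
    xf = np.linspace(x[lo], x[hi], (hi - lo) * ratio + 1)
    return xf, np.interp(xf, x, u)                    # prolong coarse data

# Coarse level-0 patch holding a steep profile.
x0 = np.linspace(0.0, 1.0, 65)
u0 = np.tanh((x0 - 0.5) / 0.05)

hierarchy = [(x0, u0)]
patch = refine(x0, u0, threshold=0.2)
while patch is not None and len(hierarchy) < 4:       # build finer levels
    hierarchy.append(patch)
    patch = refine(*patch, threshold=0.2)

for lvl, (x, u) in enumerate(hierarchy):
    print(f"level {lvl}: {len(x)} points on [{x[0]:.3f}, {x[-1]:.3f}]")
```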

    Doctor of Philosophy

    dissertation
    Recent trends in high performance computing present larger and more diverse computers using multicore nodes, possibly with accelerators and/or coprocessors, and reduced memory. These changes pose formidable challenges for applications code to attain scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities offer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability, as tasks were run in a predefined order based solely on static analysis of the task graph and used only the message passing interface (MPI) for parallelism. By using a new hybrid multithreaded and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable, asynchronous, and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communications performance is achieved on some of the largest and most powerful computers available today.
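    One way to picture the hybrid runtime described above is as per-node work queues: several CPU worker threads drain one queue while a dedicated worker drains a queue of accelerator-tagged tasks. The sketch below is schematic only, with the device offload faked by an ordinary function; it is not Uintah's scheduler.

```python
# Schematic of per-node hybrid scheduling (not Uintah's actual scheduler):
# CPU worker threads drain one queue; a dedicated worker drains a queue of
# accelerator-tagged tasks. The "device" here is just a print call.
import queue
import threading

cpu_q, gpu_q = queue.Queue(), queue.Queue()

def worker(q, label):
    while True:
        task = q.get()
        if task is None:              # sentinel: shut this worker down
            break
        print(f"{label}: {task}")     # stand-in for running/offloading a task
        q.task_done()

threads = [threading.Thread(target=worker, args=(cpu_q, "CPU")) for _ in range(4)]
threads.append(threading.Thread(target=worker, args=(gpu_q, "GPU")))
for t in threads:
    t.start()

for i in range(8):
    cpu_q.put(f"task {i}")            # tasks tagged for CPU cores
gpu_q.put("stencil kernel")           # a task tagged for the accelerator

cpu_q.join(); gpu_q.join()            # wait until all queued work is done
for _ in range(4):
    cpu_q.put(None)                   # shut down the CPU workers...
gpu_q.put(None)                       # ...and the accelerator worker
for t in threads:
    t.join()
```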

    Techniques, Tricks and Algorithms for Efficient GPU-Based Processing of Higher Order Hyperbolic PDEs

    GPU computing is expected to play an integral part in all modern Exascale supercomputers. It is also expected that higher order Godunov schemes will make up a significant fraction of the application mix on such supercomputers. It is, therefore, very important to prepare the community of users of higher order schemes for hyperbolic PDEs for this emerging opportunity. We focus on three broad and high-impact areas where higher order Godunov schemes are used. The first area is computational fluid dynamics (CFD). The second is computational magnetohydrodynamics (MHD), which has an involution constraint that has to be mimetically preserved. The third is computational electrodynamics (CED), which has involution constraints and also extremely stiff source terms. Together, these three diverse uses of higher order Godunov methodology cover many of the most important application areas. In all three cases, we show that the optimal use of algorithms, techniques and tricks, along with the use of OpenACC, yields superlative speedups on GPUs! As a bonus, we find a most remarkable and desirable result: some higher order schemes, with their larger operation counts per zone, show better speedup than lower order schemes on GPUs. In other words, the GPU is an optimal stratagem for overcoming the higher computational complexity of higher order schemes! Several avenues for future improvement have also been identified. A scalability study is presented for a real-world application using GPUs and comparable numbers of high-end multicore CPUs. It is found that GPUs offer a substantial performance benefit over a comparable number of CPUs, especially when all the methods designed in this paper are used.
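    For readers unfamiliar with the baseline, a first-order Godunov finite-volume update for 1D linear advection is sketched below; the paper's schemes are higher order, adding reconstruction and limiting on top of this interface-Riemann-solve structure. The setup and parameters are illustrative only.

```python
# Minimal first-order Godunov finite-volume step for 1D linear advection
# u_t + a u_x = 0 with a > 0: the Riemann problem at each cell interface
# is solved exactly by upwinding. Higher order schemes add reconstruction
# and limiting on top of this structure.
import numpy as np

def godunov_step(u, a, dx, dt):
    flux = a * np.roll(u, 1)                  # upwind flux at each left interface
    return u - dt / dx * (np.roll(flux, -1) - flux)

N = 200
x = np.linspace(0.0, 1.0, N, endpoint=False)
u = np.exp(-200.0 * (x - 0.3) ** 2)           # Gaussian pulse, periodic domain
a, dx = 1.0, 1.0 / N
dt = 0.5 * dx / a                             # CFL number 0.5
for _ in range(int(round(0.4 / dt))):         # advect to t = 0.4
    u = godunov_step(u, a, dx, dt)
print("peak near x =", x[np.argmax(u)])       # expect ~0.7 (with smearing)
```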

    From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions

    We study the simulation of stellar mergers, which requires complex simulations with high computational demands. We have developed Octo-Tiger, a finite volume grid-based hydrodynamics simulation code with Adaptive Mesh Refinement which is unique in conserving both linear and angular momentum to machine precision. To face the challenge of increasingly complex, diverse, and heterogeneous HPC systems, Octo-Tiger relies on high-level programming abstractions. We use HPX with its futurization capabilities to ensure scalability both between nodes and within them, and present first results replacing MPI with libfabric, achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous GPU-accelerated supercomputers, demonstrating node-level performance and portability. We show scalability up to full-system runs on Piz Daint. For the scenario's maximum resolution, the compute-critical parts (hydrodynamics and gravity) achieve 68.1% parallel efficiency at 2048 nodes.
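    The futurization idea, launching independent solver phases eagerly and blocking only where their results are combined, can be conveyed with ordinary futures; the sketch below uses Python's concurrent.futures as an analogy and is not HPX's C++ API. The two solver phases are invented stand-ins.

```python
# A sketch of the futurization idea using Python futures (HPX's C++ API
# differs): independent solver phases are launched eagerly, and the code
# that combines them only blocks when the values are actually needed.
from concurrent.futures import ThreadPoolExecutor
import time

def hydro_step(state):
    time.sleep(0.1)                 # stand-in for the hydro solve
    return state + 1

def gravity_solve(state):
    time.sleep(0.1)                 # stand-in for the gravity solve
    return -state

with ThreadPoolExecutor() as pool:
    state = 0
    for _ in range(3):
        f_hydro = pool.submit(hydro_step, state)      # runs concurrently...
        f_grav = pool.submit(gravity_solve, state)    # ...with this one
        state = f_hydro.result() + f_grav.result()    # block only here
    print("final state:", state)
```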

    OpenCL‐based implementation of an unstructured edge‐based finite element convection‐diffusion solver on graphics hardware

    The solution of problems in computational fluid dynamics (CFD) represents a classical field for the application of advanced numerical methods. Many different approaches have been developed over the years to address CFD applications. Good examples are finite volumes, finite differences (FD), and finite elements (FE), but also newer approaches such as the lattice-Boltzmann (LB) method, smoothed particle hydrodynamics, or the particle finite element method. FD and LB methods on regular grids are known to be superior in terms of raw computing speed, but such regular discretizations represent an important limitation in dealing with complex geometries. Here, we concentrate on unstructured approaches, which are less common in the GPU world. We employ a nonstandard FE approach which leverages an optimized edge-based data structure allowing a highly parallel implementation. This technique is applied to the convection-diffusion problem, which is often considered a first step towards CFD because of its similarities to the nonconservative form of the Navier-Stokes equations. In this regard, an existing highly optimized parallel OpenMP solver is ported to graphics hardware based on the OpenCL platform. The optimizations performed are discussed in detail. A number of benchmarks prove that the GPU-accelerated OpenCL code consistently outperforms the OpenMP version.
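    The gist of an edge-based data structure is that one coefficient is stored per mesh edge, and applying the operator becomes a single gather/scatter loop over edges, a layout that parallelizes well. The sketch below shows this for a Laplacian-like operator on a hypothetical toy mesh; it conveys the general idea, not the paper's solver.

```python
# Sketch of an edge-based operator application (the general idea, not the
# paper's solver): one coefficient per mesh edge, and the matrix-vector
# product is a gather over edges followed by a scatter-add to the nodes.
import numpy as np

# Hypothetical 1D chain mesh: edges (i, j) with a coefficient k per edge.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
k = np.ones(len(edges))
x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

def apply_edge_laplacian(edges, k, x):
    i, j = edges[:, 0], edges[:, 1]
    flux = k * (x[j] - x[i])          # one "flux" per edge (gather)
    y = np.zeros_like(x)
    np.add.at(y, i, flux)             # scatter-add to both endpoints
    np.add.at(y, j, -flux)
    return y

# Interior entries reproduce the discrete Laplacian x[i-1] - 2x[i] + x[i+1].
print(apply_edge_laplacian(edges, k, x))
```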

    Free-Surface Flow Simulations with Smoothed Particle Hydrodynamics Method using High-Performance Computing

    Today, the use of modern high-performance computing (HPC) systems, such as clusters equipped with graphics processing units (GPUs), allows solving problems with resolutions unthinkable only a decade ago. The demand for high computational power is certainly an issue when simulating free-surface flows. However, by taking advantage of the GPU's parallel computing capabilities, simulations involving up to 10^9 particles can be achieved. In this framework, this chapter shows some numerical results of typical coastal engineering problems obtained by means of the GPU-based computing servers maintained at the Environmental Physics Laboratory (EPhysLab) of Vigo University in Ourense (Spain) and the Tier-1 Galileo cluster of the Italian computing centre CINECA. The free DualSPHysics package, based on the smoothed particle hydrodynamics (SPH) technique, was used for this purpose. SPH is a meshless particle method based on a Lagrangian formulation in which the fluid domain is discretized as a collection of computing fluid particles. Speedup and efficiency of the calculations are studied in terms of the initial interparticle distance and by coupling DualSPHysics with a NLSW wave propagation model. Water free-surface elevation, orbital velocities, and wave forces are compared with results from experimental campaigns and theoretical solutions.
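    The core of any SPH code is a kernel-weighted summation over neighbouring particles. The sketch below shows a density summation with a cubic spline kernel in 1D, using a brute-force O(N^2) neighbour search for clarity; DualSPHysics itself is a CUDA/C++ package with cell-list neighbour search, so this illustrates the method, not its implementation.

```python
# Minimal SPH density summation: density at each particle is the
# kernel-weighted sum of neighbour masses. Brute-force O(N^2) pairwise
# distances for clarity; production codes use cell lists on the GPU.
import numpy as np

def cubic_spline_w(r, h):
    """Standard cubic spline kernel in 1D, support radius 2h."""
    q = r / h
    sigma = 2.0 / (3.0 * h)                   # 1D normalisation constant
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q) ** 3, 0.0))
    return sigma * w

x = np.linspace(0.0, 1.0, 101)                # equally spaced particles
m = np.full_like(x, 0.01)                     # mass chosen so rho ~ 1
h = 1.2 * (x[1] - x[0])                       # smoothing length

r = np.abs(x[:, None] - x[None, :])           # all pairwise distances
rho = (m[None, :] * cubic_spline_w(r, h)).sum(axis=1)
print("interior density:", rho[50])           # ~1 away from the ends
```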

    KINETICALLY CONSISTENT THERMAL LATTICE BOLTZMANN MODELS

    The lattice Boltzmann (LB) method has developed into a numerically robust and efficient technique for simulating a wide variety of complex fluid flows. Unlike conventional CFD methods, the LB method is based on microscopic models and mesoscopic kinetic equations in which the collective long-term behavior of pseudo-particles is used to simulate the hydrodynamic limit of a system. Due to its kinetic basis, the LB method is particularly useful in applications involving interfacial dynamics and complex boundaries, such as multiphase or multicomponent flows. However, most LB models, both single and multiphase, do not satisfy the energy conservation principle, thus limiting their ability to provide quantitatively accurate predictions for cases with substantial heat transfer rates. To address this issue, this dissertation focuses on developing kinetically consistent and energy-conserving LB models for single-phase flows in particular.

    Firstly, through a procedure similar to the Galerkin method, we present a mathematical formulation of the LB method based on the concept of projecting the distributions onto a Hermite-polynomial basis and systematically truncating them. This formulation is shown to be capable of approximating the nearly incompressible, weakly compressible, and fully compressible (thermal) limits of the continuous Boltzmann equation, thus obviating the previous low-Mach-number assumption. Physically, this means that the formulation allows a kinetically accurate description of flows involving large heat transfer rates. The various higher-order discrete-velocity sets (lattices) that follow from this formulation are also compiled. The resulting higher-order thermal model is validated for benchmark thermal flows, such as Rayleigh-Bénard convection and thermal Couette flow, in an off-lattice framework. Our tests indicate that the D2Q39-based thermal models are capable of modeling incompressible and weakly compressible thermal flows accurately. In the validation process, through a finite-difference-type boundary treatment, we also extend the applicability of higher-order lattices to flow domains with solid boundaries, which was previously restricted.

    Secondly, we present various off-lattice time-marching schemes for solving the discrete Boltzmann equation. Specifically, the various temporal schemes are analyzed with respect to their numerical stability as a function of the maximum allowable time-step. We show that the characteristics-based temporal schemes offer the best numerical stability among all comparable schemes. Due to this enhanced numerical stability, the usual time-step restriction no longer applies, enabling larger time-steps and thereby reducing the computational run-time. The off-lattice schemes were also successfully extended to higher-order LB models.

    Finally, we present the algorithm and single-core optimization techniques for an off-lattice, higher-order LB code. Using simple cache optimization techniques and a proper choice of data structure, we obtain a 5-7x improvement in performance compared to a naive, unoptimized code. Thereafter, the optimized code is parallelized using OpenMP. Scalability tests indicate a parallel efficiency of 80% on shared-memory systems with up to 50 cores (strong scaling). An analysis of the higher-order LB models also shows that they are less memory-bound when the off-lattice temporal schemes are used, thus making them more scalable than stream-collide-type schemes.
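    For orientation, the sketch below shows the standard on-lattice D2Q9 stream-and-collide BGK update, the isothermal baseline that the dissertation's higher-order, thermal, off-lattice models generalize; the domain and parameters are illustrative only.

```python
# Standard D2Q9 stream-and-collide BGK update: the on-lattice, isothermal
# baseline that higher-order (e.g. D2Q39, thermal, off-lattice) models
# generalize. Periodic domain, uniform initial flow for brevity.
import numpy as np

# D2Q9 lattice velocities and weights.
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def step(f, tau):
    rho = f.sum(axis=0)                                 # conserved moments
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += (equilibrium(rho, ux, uy) - f) / tau           # BGK collision
    for i in range(9):                                  # periodic streaming
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f

nx = ny = 32
f = equilibrium(np.ones((nx, ny)), np.full((nx, ny), 0.05), np.zeros((nx, ny)))
for _ in range(100):
    f = step(f, tau=0.8)
print("mean density:", f.sum(axis=0).mean())            # conserved at 1.0
```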