
    Batch solution of small PDEs with the OPS DSL

    In this paper we discuss the challenges and optimisation opportunities that arise when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, show how support can be added for solving multiple systems, and show how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow data structure transformations as well as execution schedule transformations to be applied automatically, delivering high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs as well as NVIDIA GPUs.
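
    The kind of data structure transformation the abstract refers to can be sketched in a few lines. The code below is illustrative only and not the OPS API: it contrasts a layout in which each small system is stored contiguously with a layout in which the batch index varies fastest, so that a single explicit stencil update vectorises across the batch of systems.

```cpp
#include <vector>
#include <cstddef>

// Batched explicit update u_new = u + nu*(u[left] - 2u + u[right]) applied to
// 'nbatch' independent 1D systems of 'n' points each. Illustrative only; an
// embedded DSL such as OPS would generate loops like these from a high-level kernel.

// Layout A: all points of one system are contiguous.  Index: u[b*n + i]
void update_system_major(std::vector<double>& u, std::vector<double>& un,
                         std::size_t nbatch, std::size_t n, double nu) {
    for (std::size_t b = 0; b < nbatch; ++b)
        for (std::size_t i = 1; i + 1 < n; ++i)
            un[b*n + i] = u[b*n + i]
                        + nu * (u[b*n + i - 1] - 2.0*u[b*n + i] + u[b*n + i + 1]);
}

// Layout B: the batch index is fastest-varying.  Index: u[i*nbatch + b]
// The inner loop over b is unit-stride and vectorises across systems.
void update_batch_major(std::vector<double>& u, std::vector<double>& un,
                        std::size_t nbatch, std::size_t n, double nu) {
    for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t b = 0; b < nbatch; ++b)
            un[i*nbatch + b] = u[i*nbatch + b]
                             + nu * (u[(i-1)*nbatch + b] - 2.0*u[i*nbatch + b]
                                     + u[(i+1)*nbatch + b]);
}

int main() {
    const std::size_t nbatch = 1024, n = 64;
    std::vector<double> u(nbatch*n, 1.0), un(nbatch*n, 0.0);
    update_system_major(u, un, nbatch, n, 0.1);
    update_batch_major(u, un, nbatch, n, 0.1);
    return 0;
}
```

    Switching between such layouts (and the corresponding loop orders) without touching user code is exactly the sort of transformation the extended OPS library is described as applying automatically.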

    Communication-Avoiding Algorithms for a High-Performance Hyperbolic PDE Engine

    The study of waves has always been an important subject of research. Earthquakes, for example, have a direct impact on the daily lives of millions of people, while gravitational waves reveal insight into the composition and history of the Universe. These physical phenomena, despite being tackled traditionally by different fields of physics, have in common that they are modelled the same way mathematically: as a system of hyperbolic partial differential equations (PDEs). The ExaHyPE project (“An Exascale Hyperbolic PDE Engine”) translates this similarity into a software engine that can be quickly adapted to simulate a wide range of hyperbolic partial differential equations. ExaHyPE’s key idea is that the user only specifies the physics while the engine takes care of the parallelisation and the interplay of the underlying numerical methods. Consequently, a first simulation code for a new hyperbolic PDE can often be realised within a few hours, a task that traditionally can take weeks, months, or even years for researchers starting from scratch. My main contribution to ExaHyPE is the development of the core infrastructure. This comprises the development and implementation of ExaHyPE’s solvers and adaptive mesh refinement procedures, its MPI+X parallelisation, as well as high-level aspects of ExaHyPE’s application-tailored code generation, which allows ExaHyPE to be adapted to model many different hyperbolic PDE systems. Like any high-performance computing code, ExaHyPE has to tackle the challenges of the coming exascale computing era, notably network communication latencies and the growing memory wall. In this thesis, I propose memory-efficient realisations of ExaHyPE’s solvers that avoid data movement, together with a novel task-based MPI+X parallelisation concept that allows network communication to be hidden behind computation in dynamically adaptive simulations.
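
    The engine/user split described above can be illustrated with a minimal sketch. This is not the ExaHyPE API (ExaHyPE generates its ADER-DG and finite-volume kernels from a specification file); here a generic first-order finite-volume step for a 1D scalar conservation law plays the role of the engine, and the user supplies only the physical flux and a wave-speed bound, with Burgers' equation as example physics.

```cpp
#include <vector>
#include <algorithm>
#include <cmath>
#include <cstddef>

// "Engine" side: a generic first-order finite-volume step for a 1D scalar
// conservation law u_t + f(u)_x = 0, using a local Lax-Friedrichs (Rusanov) flux.
// The user supplies only the physics: the flux f(u) and a wave-speed bound.
template <typename Flux, typename MaxSpeed>
void fv_step(std::vector<double>& u, double dx, double dt, Flux f, MaxSpeed smax) {
    const std::size_t n = u.size();
    std::vector<double> unew(u);
    auto num_flux = [&](double ul, double ur) {
        double a = std::max(smax(ul), smax(ur));
        return 0.5 * (f(ul) + f(ur)) - 0.5 * a * (ur - ul);
    };
    for (std::size_t i = 1; i + 1 < n; ++i) {
        double fr = num_flux(u[i], u[i + 1]);
        double fl = num_flux(u[i - 1], u[i]);
        unew[i] = u[i] - dt / dx * (fr - fl);
    }
    u = unew;
}

int main() {
    // "User" side: Burgers' equation, f(u) = u^2/2, wave speed |u|.
    std::vector<double> u(200, 0.0);
    for (std::size_t i = 0; i < 100; ++i) u[i] = 1.0;   // step initial data
    for (int step = 0; step < 100; ++step)
        fv_step(u, 0.01, 0.004,
                [](double v) { return 0.5 * v * v; },
                [](double v) { return std::abs(v); });
    return 0;
}
```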

    Computational Aerodynamics on unstructured meshes

    New 2D and 3D unstructured-grid based flow solvers have been developed for simulating steady compressible flows for aerodynamic applications. The codes employ the full compressible Euler/Navier-Stokes equations. The Spalart-Allmaras one-equation turbulence model is used to model the effects of turbulence. The spatial discretisation has been obtained using a cell-centred finite volume scheme on unstructured grids, consisting of triangles in 2D and of tetrahedral and prismatic elements in 3D. The temporal discretisation has been obtained with an explicit multistage Runge-Kutta scheme. An "inflation" mesh generation technique is introduced to effectively reduce the difficulty in generating highly stretched 2D/3D viscous grids in regions near solid surfaces. The explicit flow method is accelerated by a multigrid method that accounts for the high grid aspect ratios encountered in viscous flow simulations. A solution mesh adaptation technique is incorporated to improve the overall accuracy of the 2D inviscid and viscous flow solutions. The 3D flow solvers are parallelised in a MIMD fashion aimed at a PC cluster system to reduce the computing time for aerodynamic applications. The numerical methods are first applied to several 2D inviscid flow cases, including subsonic flow in a bump channel, transonic flow around a NACA0012 airfoil and transonic flow around the RAE 2822 airfoil, to validate the numerical algorithms. The remaining 2D case studies concentrate on viscous flow simulations, including laminar/turbulent flow over a flat plate, transonic turbulent flow over the RAE 2822 airfoil, and low-speed turbulent flows in a turbine cascade with massive separation. The results are compared to experimental data to assess the accuracy of the method. The over-resolution problem that arises with mesh adaptation in viscous flow simulations is addressed with a two-phase mesh reconstruction procedure. The solution convergence rate with the aspect-ratio-adaptive multigrid method and the direct-connectivity-based multigrid is assessed in several viscous turbulent flow simulations. Several 3D test cases are presented to validate the numerical algorithms for solving the Euler/Navier-Stokes equations. Inviscid flow around the ONERA M6 wing is simulated with the tetrahedron-based 3D flow solver using an upwind scheme and a spatially second-order finite volume method. The efficiency of the multigrid for inviscid flow simulations is examined. The efficiency of the parallelised 3D flow solver and the PC cluster system is assessed by simulating the same case with different partitioning schemes. The present parallelised 3D flow solvers on the PC cluster system show satisfactory parallel computing performance. Turbulent flows over a flat plate are simulated with the tetrahedron-based and prism-based flow solvers to validate the viscous term treatment. Next, simulation of turbulent flow over the M6 wing is carried out with the parallelised 3D flow solvers to demonstrate the overall accuracy of the algorithms and the efficiency of the multigrid method. The results show very good agreement with experimental data. A highly stretched and well-formed computational grid near the solid wall and wake regions is generated with the "inflation" method. The aspect-ratio-adaptive multigrid displayed a good acceleration rate. Finally, low-speed flow around the NREL Phase II wind turbine is simulated and the results are compared to the experimental data.
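
    As a minimal illustration of the explicit multistage Runge-Kutta time integration mentioned in the abstract (not the thesis code; a simple 1D upwind residual stands in for the cell-centred finite-volume residual), each stage re-evaluates the residual and updates from the solution at the start of the step:

```cpp
#include <vector>
#include <cstddef>

// Residual of 1D linear advection u_t + a u_x = 0 with first-order upwind
// differencing (a > 0) on a periodic domain; stands in for the finite-volume
// residual of the actual solver.
std::vector<double> residual(const std::vector<double>& u, double a, double dx) {
    const std::size_t n = u.size();
    std::vector<double> r(n);
    for (std::size_t i = 0; i < n; ++i)
        r[i] = a * (u[i] - u[(i + n - 1) % n]) / dx;
    return r;
}

// One explicit multistage (Jameson-type) Runge-Kutta step:
//   u^(k) = u^n - alpha_k * dt * R(u^(k-1)),  k = 1..m,  u^{n+1} = u^(m).
void rk_multistage(std::vector<double>& u, double a, double dx, double dt) {
    static const double alpha[] = {0.25, 1.0 / 3.0, 0.5, 1.0};  // 4-stage coefficients
    const std::vector<double> un = u;        // solution at the start of the step
    for (double ak : alpha) {
        std::vector<double> r = residual(u, a, dx);
        for (std::size_t i = 0; i < u.size(); ++i)
            u[i] = un[i] - ak * dt * r[i];
    }
}

int main() {
    std::vector<double> u(100, 0.0);
    for (std::size_t i = 40; i < 60; ++i) u[i] = 1.0;   // square pulse
    for (int step = 0; step < 200; ++step)
        rk_multistage(u, 1.0, 0.01, 0.005);              // CFL = 0.5
    return 0;
}
```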

    Static dependency analysis of recursive structures for parallelisation


    Discrete adjoints on many cores: Algorithmic differentiation of accelerated fluid simulations

    Simulations are used in science and industry to predict the performance of technical systems. Adjoint derivatives of these simulations can reveal the sensitivity of the system performance to changes in design or operating conditions, and are increasingly used in shape optimisation and uncertainty quantification. Algorithmic differentiation (AD) by source transformation is an efficient method to compute such derivatives. AD requires an analysis of the computation and its data flow to produce efficient adjoint code. One important step is the activity analysis that detects operations that need to be differentiated. An improved activity analysis is investigated in this thesis that simplifies build procedures for certain adjoint programs and is demonstrated to improve the speed of an adjoint fluid dynamics solver. The method works by allowing a context-dependent analysis of routines. The ongoing trend towards multi- and many-core architectures such as the Intel Xeon Phi is creating challenges for AD. Two novel approaches are presented that replicate the parallelisation of a program in its corresponding adjoint program. The first approach detects loops that naturally result in a parallelisable adjoint loop, while the second approach uses loop transformation and the aforementioned context-dependent analysis to enforce parallelisable data access in the adjoint loop. A case study shows that both approaches yield adjoints that are as scalable as their underlying primal programs. Adjoint computations are limited by their memory footprint, particularly in unsteady simulations, for which this work presents incomplete checkpointing as a method to reduce memory usage at the cost of a slight reduction in accuracy. Finally, convergence of iterative linear solvers is discussed, which is especially relevant on accelerator cards, where single-precision floating point numbers are frequently used and the choice of solvers is limited by the small memory size. Some problems that are particular to adjoint computations are discussed.
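
    To make the source-transformation idea concrete, the following hand-written sketch (not output of the AD tool discussed in the thesis) shows a primal loop and its reverse-mode adjoint. It also illustrates why parallelising the adjoint is non-trivial: the gather in the primal becomes a scatter in the adjoint, so iterations that were independent may now write to the same location.

```cpp
#include <vector>
#include <cstddef>

// Primal kernel: a gather -- each output cell reads its own and a neighbouring
// value through an index array.  Iterations are independent, trivially parallel.
void primal(const std::vector<double>& x, const std::vector<int>& nbr,
            std::vector<double>& y) {
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = x[i] * x[i] + x[nbr[i]];
}

// Hand-derived adjoint (reverse mode): given yb = dJ/dy, accumulate xb = dJ/dx.
// The primal gather x[nbr[i]] becomes a scatter into xb[nbr[i]], so a parallel
// version needs atomics or a loop transformation to restore independent access.
void adjoint(const std::vector<double>& x, const std::vector<int>& nbr,
             const std::vector<double>& yb, std::vector<double>& xb) {
    for (std::size_t i = 0; i < yb.size(); ++i) {
        xb[i]      += 2.0 * x[i] * yb[i];   // derivative of x[i]*x[i]
        xb[nbr[i]] += yb[i];                // scatter: potential write conflict
    }
}

int main() {
    std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
    std::vector<int> nbr = {1, 2, 3, 0};
    std::vector<double> y(4, 0.0), yb(4, 1.0), xb(4, 0.0);
    primal(x, nbr, y);
    adjoint(x, nbr, yb, xb);   // xb now holds dJ/dx for J = sum(y)
    return 0;
}
```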

    Numerical solution of 3-D electromagnetic problems in exploration geophysics and its implementation on massively parallel computers

    The growing significance, technical development and employment of electromagnetic (EM) methods in exploration geophysics have led to an increasing need for reliable and fast techniques for the interpretation of 3-D EM data sets acquired in complex geological environments. The first and most important step towards creating an inversion method is the development of a solver for the forward problem. In order to create an efficient, reliable and practical 3-D EM inversion, it is necessary to have a 3-D EM modelling code that is highly accurate, robust and very fast. This thesis focuses precisely on this crucial and very demanding step towards building a 3-D EM interpretation method. The thesis presents as its main contribution a highly accurate, robust, very fast and extremely scalable numerical method for 3-D EM modelling in geophysics that is based on finite elements (FE) and designed to run on massively parallel computing platforms. Because the FE approach supports completely unstructured tetrahedral meshes as well as local mesh refinement, the presented solver is able to represent complex geometries of subsurface structures very precisely, improving the solution accuracy and avoiding misleading artefacts in images. Consequently, it can be used successfully in geological environments of arbitrary geometrical complexity. The parallel implementation of the method, which is based on domain decomposition and a hybrid MPI-OpenMP scheme, has proved to be highly scalable: the achieved speed-up is close to linear for more than a thousand processors. Thanks to this, the code is able to deal with extremely large problems, which may have hundreds of millions of degrees of freedom, in a very efficient way. The importance of having this forward-problem solver lies in the fact that it is now possible to create a 3-D EM inversion that can deal with data obtained in extremely complex geological environments in a way that is realistic for practical use in industry. So far, such an imaging tool has not been proposed, due to a lack of efficient parallel FE solutions as well as the limitations of efficient solvers based on finite differences. In addition, the thesis discusses physical, mathematical and numerical aspects and challenges of 3-D EM modelling, which have been studied during my research in order to properly design the presented software for EM field simulations over 3-D regions of the Earth. Through this work, a physical problem formulation based on the secondary Coulomb-gauged EM potentials has been validated, proving that it can be used successfully with the standard nodal FE method to give highly accurate numerical solutions. This work has also shown that Krylov subspace iterative methods are the best solution for solving the linear systems that arise after FE discretisation of the problem under consideration. More precisely, it has been found empirically that the best iterative method for this kind of problem is the biconjugate gradient stabilised (BiCGStab) method with an elaborate preconditioner. Since the most commonly used preconditioners proved to be either unable to improve the convergence of the implemented solvers to the desired extent, or impractical in the parallel context, I have proposed a preconditioning technique for Krylov methods that is based on algebraic multigrid. Tests for various problems with different conductivity structures and characteristics have shown that the new preconditioner greatly improves the convergence of different Krylov subspace methods, which significantly reduces the total execution time of the program and improves the solution quality. Furthermore, the preconditioner is very practical for parallel implementation. Finally, it has been concluded that there are no restrictions on employing the classical parallel programming models, MPI and OpenMP, for parallelisation of the presented FE solver; moreover, they have proved sufficient to provide excellent scalability.
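
    The solver structure described above can be sketched compactly. The following is an illustrative, self-contained implementation of preconditioned BiCGStab on a small diagonally dominant test system; it is not the thesis code, and a simple Jacobi preconditioner stands in for the algebraic multigrid preconditioner proposed in the thesis.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>
#include <cstdio>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned BiCGStab for A x = b.  'matvec' applies A, 'precond' applies
// an approximation of A^{-1}; in the thesis that role is played by an
// algebraic multigrid cycle, here a Jacobi sweep stands in for it.
template <typename MatVec, typename Precond>
int bicgstab(MatVec matvec, Precond precond, const Vec& b, Vec& x,
             int maxit, double tol) {
    const std::size_t n = b.size();
    Vec r(n), r0(n), p(n, 0.0), v(n, 0.0), phat(n), s(n), shat(n), t(n);
    Vec Ax = matvec(x);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];
    r0 = r;
    double rho = 1.0, alpha = 1.0, omega = 1.0;
    for (int it = 0; it < maxit; ++it) {
        double rho_new = dot(r0, r);
        double beta = (rho_new / rho) * (alpha / omega);
        rho = rho_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * (p[i] - omega * v[i]);
        phat = precond(p);
        v = matvec(phat);
        alpha = rho / dot(r0, v);
        for (std::size_t i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
        shat = precond(s);
        t = matvec(shat);
        omega = dot(t, s) / dot(t, t);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * phat[i] + omega * shat[i];
            r[i]  = s[i] - omega * t[i];
        }
        if (std::sqrt(dot(r, r)) < tol) return it + 1;   // converged
    }
    return -1;                                            // not converged
}

int main() {
    // Small diagonally dominant, nonsymmetric test system standing in for an FE matrix.
    const std::size_t n = 50;
    auto matvec = [n](const Vec& x) {
        Vec y(n);
        for (std::size_t i = 0; i < n; ++i) {
            y[i] = 4.0 * x[i];
            if (i > 0)     y[i] -= 1.0 * x[i - 1];
            if (i + 1 < n) y[i] -= 1.5 * x[i + 1];
        }
        return y;
    };
    auto precond = [n](const Vec& r) {       // Jacobi: divide by the diagonal
        Vec z(n);
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / 4.0;
        return z;
    };
    Vec b(n, 1.0), x(n, 0.0);
    int iters = bicgstab(matvec, precond, b, x, 200, 1e-10);
    std::printf("converged in %d iterations\n", iters);
    return 0;
}
```

    In the full-scale solver the roles of the matrix-vector product and the preconditioner would be played by the distributed FE operator and a parallel algebraic multigrid cycle, applied within the hybrid MPI-OpenMP framework the abstract describes.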