424 research outputs found
An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.
With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such an approach does not free users completely from grain size awareness.
Communication-Avoiding Algorithms for a High-Performance Hyperbolic PDE Engine
The study of waves has always been an important subject of research. Earthquakes, for example,
have a direct impact on the daily lives of millions of people while gravitational waves reveal
insight into the composition and history of the Universe. These physical phenomena, despite
being tackled traditionally by different fields of physics, have in common that they are modelled
the same way mathematically: as a system of hyperbolic partial differential equations (PDEs).
The ExaHyPE project (“An Exascale Hyperbolic PDE Engine”) translates this similarity into
a software engine that can be quickly adapted to simulate a wide range of hyperbolic partial
differential equations. ExaHyPE’s key idea is that the user only specifies the physics while the
engine takes care of the parallelisation and the interplay of the underlying numerical methods.
Consequently, a first simulation code for a new hyperbolic PDE can often be realised within a
few hours. This is a task that traditionally can take weeks, months, even years for researchers
starting from scratch.
My main contribution to ExaHyPE is the development of the core infrastructure. This
comprises the development and implementation of ExaHyPE’s solvers and adaptive mesh
refinement procedures, its MPI+X parallelisation, as well as high-level aspects of ExaHyPE’s
application-tailored code generation, which allows ExaHyPE to be adapted to model many different
hyperbolic PDE systems. Like any high-performance computing code, ExaHyPE has to tackle the
challenges of the coming exascale computing era, notably network communication latencies and
the growing memory wall. In this thesis, I propose memory-efficient realisations of ExaHyPE’s
solvers that avoid data movement together with a novel task-based MPI+X parallelisation
concept that allows network communication to be hidden behind computation in dynamically adaptive
simulations.
State-of-the-art in aerodynamic shape optimisation methods
Aerodynamic optimisation has become an indispensable component for any aerodynamic design over the past 60 years, with applications to aircraft, cars, trains, bridges, wind turbines, internal pipe flows, and cavities, among others, and is thus relevant in many facets of technology. With advancements in computational power, automated design optimisation procedures have become more competent; however, there is ambiguity and bias throughout the literature with regard to the relative performance of optimisation architectures and employed algorithms. This paper provides a well-balanced critical review of the dominant optimisation approaches that have been integrated with aerodynamic theory for the purpose of shape optimisation. A total of 229 papers, published in more than 120 journals and conference proceedings, have been classified into 6 different optimisation algorithm approaches. The material cited includes some of the most well-established authors and publications in the field of aerodynamic optimisation. This paper aims to eliminate bias toward certain algorithms by analysing the limitations, drawbacks, and benefits of the most utilised optimisation approaches. This review provides comprehensive but straightforward insight for non-specialists and a reference detailing the current state of the field for specialist practitioners.
A high-performance open-source framework for multiphysics simulation and adjoint-based shape and topology optimization
The first part of this thesis presents the advances made in the Open-Source software SU2,
towards transforming it into a high-performance framework for design and optimization of
multiphysics problems. Through this work, and in collaboration with other authors, a tenfold
performance improvement was achieved for some problems. More importantly, problems that
had previously been impossible to solve in SU2, can now be used in numerical optimization
with shape or topology variables. Furthermore, it is now exponentially simpler to study new
multiphysics applications, and to develop new numerical schemes taking advantage of modern
high-performance-computing systems.
In the second part of this thesis, these capabilities allowed the application of topology optimization
to medium-scale fluid-structure interaction problems, using high-fidelity models (nonlinear
elasticity and Reynolds-averaged Navier-Stokes equations), which had not been done before
in the literature. This showed that topology optimization can be used to target aerodynamic
objectives, by tailoring the interaction between fluid and structure. However, it also made evident
the limitations of density-based methods for this type of problem, in particular, reliably
converging to discrete solutions. This was overcome with new strategies to both guarantee and
accelerate (i.e. reduce the overall computational cost) the convergence to discrete solutions in
fluid-structure interaction problems. Open Access
The LifeV library: engineering mathematics beyond the proof of concept
LifeV is a library for the finite element (FE) solution of partial
differential equations in one, two, and three dimensions. It is written in C++
and designed to run on diverse parallel architectures, including cloud and high
performance computing facilities. Despite its academic research nature,
that is, a library for the development and testing of new methods, one
distinguishing feature of LifeV is its use on real-world problems, and it is
intended to provide a tool for many engineering applications. It has been
used in computational hemodynamics, including cardiac mechanics and
fluid-structure interaction problems, in porous media, and in ice sheet dynamics,
for both forward and inverse problems. In this paper we give a short overview of
the features of LifeV and its coding paradigms on simple problems. The main
focus is on the parallel environment which is mainly driven by domain
decomposition methods and based on external libraries such as MPI, the Trilinos
project, HDF5 and ParMetis.
Dedicated to the memory of Fausto Saleri. Comment: Review of the LifeV Finite Element library
Multilayered abstractions for partial differential equations
How do we build maintainable, robust, and performance-portable scientific
applications? This thesis argues that the answer to this software engineering
question in the context of the finite element method is through the use of
layers of Domain-Specific Languages (DSLs) to separate the various concerns in
the engineering of such codes.
Performance-portable software achieves high performance on multiple diverse
hardware platforms without source code changes. We demonstrate that finite
element solvers written in a low-level language are not performance-portable,
and therefore code must be specialised to the target architecture by a code
generation framework. A prototype compiler for finite element variational forms
that generates CUDA code is presented, and is used to explore how good
performance on many-core platforms in automatically-generated finite element
applications can be achieved. The differing code generation requirements for
multi- and many-core platforms motivate the design of an additional
abstraction, called PyOP2, that enables unstructured mesh applications to be
performance-portable.
We present a runtime code generation framework comprised of the Unified Form
Language (UFL), the FEniCS Form Compiler, and PyOP2. This toolchain separates
the succinct expression of a numerical method from the selection and
generation of efficient code for local assembly. This is further decoupled from
the selection of data formats and algorithms for efficient parallel
implementation on a specific target architecture.
We establish the successful separation of these concerns by demonstrating the
performance-portability of code generated from a single high-level source code
written in UFL across sequential C, CUDA, MPI and OpenMP targets. The
performance of the generated code exceeds the performance of comparable
alternative toolchains on multi-core architectures. Open Access
Software for Exascale Computing - SPPEXA 2016-2019
This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden on October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.