Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Multilayered abstractions for partial differential equations
How do we build maintainable, robust, and performance-portable scientific
applications? This thesis argues that the answer to this software engineering
question in the context of the finite element method is through the use of
layers of Domain-Specific Languages (DSLs) to separate the various concerns in
the engineering of such codes.
Performance-portable software achieves high performance on multiple diverse
hardware platforms without source code changes. We demonstrate that finite
element solvers written in a low-level language are not performance-portable,
and therefore code must be specialised to the target architecture by a code
generation framework. A prototype compiler for finite element variational forms
that generates CUDA code is presented, and is used to explore how good
performance on many-core platforms in automatically-generated finite element
applications can be achieved. The differing code generation requirements for
multi- and many-core platforms motivate the design of an additional
abstraction, called PyOP2, that enables unstructured mesh applications to be
performance-portable.
We present a runtime code generation framework comprising the Unified Form
Language (UFL), the FEniCS Form Compiler, and PyOP2. This toolchain separates
the succinct expression of a numerical method from the selection and
generation of efficient code for local assembly. This is further decoupled from
the selection of data formats and algorithms for efficient parallel
implementation on a specific target architecture.
We establish the successful separation of these concerns by demonstrating the
performance-portability of code generated from a single high-level source code
written in UFL across sequential C, CUDA, MPI and OpenMP targets. The
performance of the generated code exceeds the performance of comparable
alternative toolchains on multi-core architectures.
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix Open Source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems.
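The scaling figure quoted in this abstract can be unpacked with a little arithmetic. The sketch below assumes that "relative parallel speed-up" means parallel efficiency measured against the 128-GPU run, which the abstract does not state explicitly. The function names are illustrative and are not part of hmglib.

```python
def relative_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Efficiency of the scaled run relative to the baseline run.

    t_* are wall-clock times, p_* are device counts. A value of 1.0
    means the speed-up matches the increase in device count.
    """
    speedup = t_base / t_scaled
    ideal_speedup = p_scaled / p_base
    return speedup / ideal_speedup


def implied_speedup(efficiency, p_base, p_scaled):
    """Speed-up implied by a given relative efficiency."""
    return efficiency * (p_scaled / p_base)


# Figures from the abstract: 67 percent going from 128 to 1024 GPUs.
# Under the stated assumption this is a bit more than a 5x speed-up
# for an 8x increase in GPU count.
print(implied_speedup(0.67, 128, 1024))
```

Read this way, "more than 67 percent" is a lower bound on efficiency, so the implied speed-up is a lower bound as well.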
Composable code generation for high order, compatible finite element methods
It has been widely recognised in HPC communities across the world that exploiting modern
computer architectures, including exascale machines, to the full requires software communities
to adapt their algorithms. Computational methods with a high ratio of floating point operations
to memory traffic are favourable. For solving partial differential equations, which can model
many physical problems, high order finite element methods can calculate approximations with
high efficiency when a good solver is employed. Matrix-free algorithms solve the corresponding
equations with a high arithmetic intensity. Vectorisation speeds up the operations by applying
one instruction to multiple data elements.
Another recent development for solving partial differential equations is the use of compatible
(mimetic) finite element methods. In particular for geophysical flows, compatible discretisations
exhibit the numerical properties required for accurate approximations. Among others, this has
been recognised by the UK Met Office, whose new dynamical core for weather and climate forecasting
is built on a compatible discretisation. Hybridisation has been proven to be an efficient
solution technique for the corresponding equation systems, because it removes some inter-elemental
coupling and localises expensive operations.
This thesis combines the recent advances on vectorised, matrix-free, high order finite element
methods in the HPC community on the one hand and hybridised, compatible discretisations in
the geophysical community on the other. In previous work, a code generation framework has been
developed to support the localised linear algebra required for hybridisation. Here, the framework
is first adapted to support vectorisation and then extended so that the equations can be solved fully
matrix-free. Promising performance results complete the thesis.
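To make the term "matrix-free" concrete, here is a minimal sketch in plain Python. It is not taken from the thesis, which works with high order finite element operators. As a stand-in, it uses the 1D finite-difference stencil [-1, 2, -1] with zero boundary values and compares applying the operator directly against assembling a matrix first. All function names are hypothetical.

```python
def apply_laplacian_matrix_free(x):
    """Return y = A x for the stencil [-1, 2, -1] without ever storing A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0       # zero boundary value
        right = x[i + 1] if i < n - 1 else 0.0  # zero boundary value
        y[i] = 2.0 * x[i] - left - right
    return y


def assemble_laplacian(n):
    """Assemble the same operator as a dense n-by-n matrix (for comparison)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


x = [1.0, 2.0, 3.0, 4.0]
# Both routes give the same action of the operator on x.
assert apply_laplacian_matrix_free(x) == matvec(assemble_laplacian(len(x)), x)
```

The matrix-free route reads only the vector, so it moves less data per floating point operation than a stored matrix would. That is the sense in which such algorithms have a high arithmetic intensity.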
Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part II: application to partial differential equations
A template-based generic programming approach was presented in a previous
paper that separates the development effort of programming a physical model
from that of computing additional quantities, such as derivatives, needed for
embedded analysis algorithms. In this paper, we describe the implementation
details for using the template-based generic programming approach for
simulation and analysis of partial differential equations (PDEs). We detail
several of the hurdles that we have encountered, and some of the software
infrastructure developed to overcome them. We end with a demonstration where we
present shape optimization and uncertainty quantification results for a 3D PDE
application.
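The idea behind this abstract is that model code is written once against a generic scalar type, and derivatives come from swapping in a different type. The papers use C++ templates for this. The sketch below is only a loose Python analogue that relies on duck typing and forward-mode dual numbers. The `Dual` class and the `residual` function are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to one input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def residual(u):
    """Hypothetical model term r(u) = u^3 + 2u, written once for any scalar."""
    return u * u * u + 2 * u


plain = residual(1.5)             # evaluates the model only
dual = residual(Dual(1.5, 1.0))   # same code, also carries dr/du
print(plain, dual.value, dual.deriv)
```

The model function never mentions derivatives. That separation between writing the physics and computing additional quantities is what the abstract describes.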