Preparing sparse solvers for exascale computing.

Authors: Anzt Hartwig; Boman Erik; Curfman McInnes Lois; Falgout Rob; Ghysels Pieter; Heroux Michael; Li Xiaoye; Meier Yang Ulrike; Rajamanickam Sivasankaran; Rupp Karl; Smith Barry; Tran Mills Richard; Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

eScholarship - University of California

Multilayered abstractions for partial differential equations

Author: Markall Graham
Publication venue: Computing, Imperial College London
Publication date: 01/02/2014

How do we build maintainable, robust, and performance-portable scientific applications? This thesis argues that the answer to this software engineering question in the context of the finite element method is through the use of layers of Domain-Specific Languages (DSLs) to separate the various concerns in the engineering of such codes. Performance-portable software achieves high performance on multiple diverse hardware platforms without source code changes. We demonstrate that finite element solvers written in a low-level language are not performance-portable, and therefore code must be specialised to the target architecture by a code generation framework. A prototype compiler for finite element variational forms that generates CUDA code is presented, and is used to explore how good performance on many-core platforms in automatically-generated finite element applications can be achieved. The differing code generation requirements for multi- and many-core platforms motivate the design of an additional abstraction, called PyOP2, that enables unstructured mesh applications to be performance-portable. We present a runtime code generation framework comprising the Unified Form Language (UFL), the FEniCS Form Compiler, and PyOP2. This toolchain separates the succinct expression of a numerical method from the selection and generation of efficient code for local assembly. This is further decoupled from the selection of data formats and algorithms for efficient parallel implementation on a specific target architecture. We establish the successful separation of these concerns by demonstrating the performance-portability of code generated from a single high-level source code written in UFL across sequential C, CUDA, MPI and OpenMP targets. The performance of the generated code exceeds the performance of comparable alternative toolchains on multi-core architectures.

Spiral - Imperial College Digital Repository

A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

Authors: Harbrecht Helmut; Zaspel Peter
Publication date: 01/06/2018

In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, we here discuss the first fully GPU-based distributed-memory parallel hierarchical matrix Open Source library using the traditional H-matrix format and adaptive cross approximation with an application to BEM problems.

arXiv.org e-Print Archive

edoc

Composable code generation for high order, compatible finite element methods

Author: Vorderwuelbecke Sophia
Publication venue: Department of Mathematics, Imperial College London
Publication date: 01/02/2023

It has been widely recognised in the HPC communities across the world that exploiting modern computer architectures, including exascale machines, to a full extent requires software communities to adapt their algorithms. Computational methods with a high ratio of floating point operations to bandwidth are favorable. For solving partial differential equations, which can model many physical problems, high order finite element methods can calculate approximations with a high efficiency when a good solver is employed. Matrix-free algorithms solve the corresponding equations with a high arithmetic intensity. Vectorisation speeds up the operations by executing one instruction on multiple data elements. Another recent development for solving partial differential equations is compatible (mimetic) finite element methods. In particular with application to geophysical flows, compatible discretisations exhibit desired numerical properties required for accurate approximations. Among others, this has been recognised by the UK Met Office, and their new dynamical core for weather and climate forecasting is built on a compatible discretisation. Hybridisation has been proven to be an efficient solver for the corresponding equation systems, because it removes some inter-elemental coupling and localises expensive operations. This thesis combines the recent advances on vectorised, matrix-free, high order finite element methods in the HPC community on the one hand and hybridised, compatible discretisations in the geophysical community on the other. In previous work, a code generation framework has been developed to support the localised linear algebra required for hybridisation. First, the framework is adapted to support vectorisation and further, extended so that the equations can be solved fully matrix-free. Promising performance results complete the thesis.

Spiral - Imperial College Digital Repository

Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part II: application to partial differential equations

Authors: Owen Steven J.; Pawlowski Roger P.; Phipps Eric T.; Salinger Andrew G.; Siefert Christopher M.; Staten Matthew L.
Publication date: 01/01/2012

A template-based generic programming approach was presented in a previous paper that separates the development effort of programming a physical model from that of computing additional quantities, such as derivatives, needed for embedded analysis algorithms. In this paper, we describe the implementation details for using the template-based generic programming approach for simulation and analysis of partial differential equations (PDEs). We detail several of the hurdles that we have encountered, and some of the software infrastructure developed to overcome them. We end with a demonstration where we present shape optimization and uncertainty quantification results for a 3D PDE application.

arXiv.org e-Print Archive

Directory of Open Access Journals