297 research outputs found

Runge-Kutta-Gegenbauer explicit methods for advection-diffusion problems

Author: O'Sullivan Stephen
Publication venue: 'Elsevier BV'
Publication date: 18/04/2019

In this paper, Runge-Kutta-Gegenbauer (RKG) stability polynomials of arbitrarily high order of accuracy are introduced in closed form. The stability domain of RKG polynomials extends in the real direction with the square of the polynomial degree, and in the imaginary direction as an increasing function of the Gegenbauer parameter. Consequently, the polynomials are naturally suited to the construction of high-order stabilized Runge-Kutta (SRK) explicit methods for systems of PDEs of mixed hyperbolic-parabolic type. We present SRK methods composed of $L$ ordered forward Euler stages, with complex-valued stepsizes derived from the roots of RKG stability polynomials of degree $L$. Internal stability is maintained at large stage number through an ordering algorithm which limits internal amplification factors to $10 L^2$. Test results for mildly stiff nonlinear advection-diffusion-reaction problems with moderate ($\lesssim 1$) mesh Péclet numbers are provided at second, fourth, and sixth orders, with nonlinear reaction terms treated by complex splitting techniques above second order.

Comment: 20 pages, 7 figures, 3 tables
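
The factorized construction in this abstract (a step built from $L$ forward Euler stages whose stepsizes come from the roots of a stability polynomial) can be sketched as follows. This is a minimal illustration and not the paper's method: it substitutes the first-order shifted Chebyshev polynomial $T_L(1 + z/L^2)$ for the RKG polynomials, so the stepsizes are real rather than complex, and it applies the stages in their natural order instead of using the paper's ordering algorithm. The function names and the test values of `lam` and `h` are assumptions chosen for illustration.

```python
import numpy as np

def chebyshev_stage_sizes(L):
    """Fractional stepsizes tau_k such that
    prod_k (1 + tau_k * z) == T_L(1 + z / L**2).
    Assumption: Chebyshev polynomial used as a stand-in for RKG."""
    k = np.arange(1, L + 1)
    roots = np.cos((2 * k - 1) * np.pi / (2 * L))  # roots of T_L
    return 1.0 / (L**2 * (1.0 - roots))

def stabilized_step(f, y, h, taus):
    """Advance y by one step of size h as a sequence of forward Euler stages."""
    for tau in taus:
        y = y + tau * h * f(y)
    return y

L = 8
taus = chebyshev_stage_sizes(L)

# Scalar linear test problem y' = lam * y with h * lam inside the
# real stability interval [-2 L^2, 0] = [-128, 0].
lam, h = -100.0, 1.0
y1 = stabilized_step(lambda y: lam * y, 1.0, h, taus)

# For a linear problem the step equals the stability polynomial at h * lam.
expected = np.cos(L * np.arccos(1.0 + h * lam / L**2))
print(y1, expected)
```

The stepsizes sum to one, so the stages together cover exactly one step of size `h`. With a small stage count such as `L = 8` the natural ordering is adequate in double precision; the abstract's point is that large stage counts need a careful ordering to keep internal amplification under control.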

arXiv.org e-Print Archive

Arrow@TUDublin

Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

Author: Araya-Polo Mauricio
Chapman Barbara
Meng Jie
Raut Eric
Publication venue: Academic Commons
Publication date: 01/09/2020

Simulations based on stencil computations (widely used in geosciences) have been dominated by the MPI+OpenMP programming model. Little effort has been devoted to experimenting with task-based parallelism in this context. We address this by introducing OpenMP task parallelism into the kernel of an industrial seismic modeling code, Minimod. We observe that even for these highly regular stencil computations, taskified kernels are competitive with traditional OpenMP-augmented loops, and in some experiments tasks even outperform loop parallelism. This promising result sets the stage for more complex computational patterns. Simulations involve more than just the stencil calculation: a collection of kernels is often needed to accomplish the scientific objective (e.g., I/O, boundary conditions). These kernels can often be computed simultaneously; however, implementing this simultaneous computation with traditional programming models is not trivial. The presented approach will be extended to cover simultaneous execution of several kernels, where we expect to fully exploit the benefits of task-based programming.

Stony Brook University - SUNY

Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

Author: Araya-Polo Mauricio
Mellor-Crummey John
Sai Ryuichi
Xu Jinfan
Publication date: 08/09/2023

PDE discretization schemes yielding stencil-like computing patterns are commonly used for seismic modeling, weather forecasting, and other scientific applications. Achieving HPC-level stencil computations on one architecture is challenging, and porting to other architectures without sacrificing performance requires significant effort, especially in this golden age of many distinctive architectures. To help developers achieve performance, portability, and productivity with stencil computations, we developed StencilPy. With StencilPy, developers write stencil computations in a high-level domain-specific language, which promotes productivity, while its backends generate efficient code for existing and emerging architectures, including NVIDIA, AMD, and Intel GPUs, A64FX, and STX. StencilPy demonstrates promising performance results on par with hand-written code, maintains cross-architectural performance portability, and enhances productivity. Its modular design enables easy configuration, customization, and extension. A 25-point star-shaped stencil written in StencilPy is one-quarter of the length of a hand-crafted CUDA code and achieves similar performance on an NVIDIA H100 GPU.
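
For readers unfamiliar with the 25-point star-shaped stencil mentioned in this abstract, the sketch below shows one common instance in plain NumPy: a 3D Laplacian of radius 4, which touches the centre point plus 8 neighbours along each of the three axes. It does not use StencilPy or its DSL, whose API is not shown in this listing. The function name, the grid size, and the choice of the standard eighth-order central-difference weights are assumptions made for illustration.

```python
import numpy as np

R = 4  # stencil radius: 1 centre + 3 axes * 2 * R neighbours = 25 points
# Standard eighth-order central-difference weights for a second derivative.
COEF = np.array([-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560])

def star25_laplacian(u, dx=1.0):
    """Apply the 25-point star stencil to the interior of a 3D array u."""
    n0, n1, n2 = u.shape
    out = 3.0 * COEF[0] * u[R:-R, R:-R, R:-R]  # centre point, once per axis
    for k in range(1, R + 1):
        out = out + COEF[k] * (
            u[R + k:n0 - R + k, R:-R, R:-R] + u[R - k:n0 - R - k, R:-R, R:-R]
            + u[R:-R, R + k:n1 - R + k, R:-R] + u[R:-R, R - k:n1 - R - k, R:-R]
            + u[R:-R, R:-R, R + k:n2 - R + k] + u[R:-R, R:-R, R - k:n2 - R - k]
        )
    return out / dx**2

# Check on u = x^2 + y^2 + z^2, whose Laplacian is 6 everywhere.
x, y, z = np.meshgrid(*(np.arange(12.0),) * 3, indexing="ij")
lap = star25_laplacian(x**2 + y**2 + z**2)
print(lap.shape, lap.min(), lap.max())
```

The output covers only interior points, so a 12×12×12 input yields a 4×4×4 result; production codes handle the halo with boundary conditions or ghost cells.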

arXiv.org e-Print Archive

Stencil Computation with Vector Outer Product

Author: Ma Penghao
Wang Long
Wang Zhe
Yan Baicheng
Yuan Liang
Zhang Yunquan
Zhao Wenxuan
Publication date: 24/10/2023

Current architectures are equipped with matrix computation units to accelerate AI and high performance computing applications. Matrix multiplication and the vector outer product are the two basic instruction types. The latter is lighter since its inputs are vectors. Thus, it provides more opportunities to develop flexible algorithms for problems other than dense linear algebra and more possibilities to optimize the implementation. Stencil computations represent a common class of nested loops in scientific and engineering applications. This paper proposes a novel stencil algorithm using vector outer products. Unlike previous work, the new algorithm arises from the stencil definition in the scatter mode and is initially expressed with formulas of vector outer products. The implementation incorporates a set of optimizations to improve the memory reference pattern, execution pipeline and data reuse by considering various algorithmic options and the data sharing between input vectors. Evaluation on a simulator shows that our design achieves a substantial speedup compared with a vectorized stencil algorithm.
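
The idea of writing a stencil in scatter mode as a sum of vector outer products can be illustrated with a small NumPy sketch. This is one possible formulation under simple assumptions (a dense 2D stencil with a square weight array `w`, interior outputs only), not the algorithm or the optimizations of the paper; the function names are hypothetical. Each input row is combined with each column of weights through `np.outer`, and the resulting tile is accumulated into the output rows that the input row influences.

```python
import numpy as np

def stencil_gather(a, w):
    """Reference: each output point gathers its weighted neighbours."""
    r = w.shape[0] // 2
    n, m = a.shape
    out = np.zeros((n - 2 * r, m - 2 * r))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += w[di + r, dj + r] * a[r + di:n - r + di, r + dj:m - r + dj]
    return out

def stencil_outer(a, w):
    """Scatter mode: every input row scatters outer-product tiles to outputs."""
    r = w.shape[0] // 2
    n, m = a.shape
    # Accumulator row t holds output row t - r; the r extra rows on each
    # side absorb contributions that fall outside the interior.
    acc = np.zeros((n + 2 * r, m - 2 * r))
    for p in range(n):                      # input row
        for dj in range(-r, r + 1):         # column offset
            # Reversed weight column: input row p feeds output rows p-r..p+r.
            tile = np.outer(w[::-1, dj + r], a[p, r + dj:m - r + dj])
            acc[p:p + 2 * r + 1, :] += tile
    return acc[2 * r:n]                     # interior output rows r..n-r-1

rng = np.random.default_rng(0)
a = rng.standard_normal((7, 9))
w = rng.standard_normal((3, 3))
print(np.allclose(stencil_outer(a, w), stencil_gather(a, w)))
```

The gather version is included only as a correctness reference; on hardware with outer-product instructions, the tile update is the operation that would map onto the matrix unit.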

arXiv.org e-Print Archive

Mesh adaptation on the sphere using optimal transport and the numerical solution of a Monge-Ampère type equation

Author: Browne Philip
Budd Chris
Cullen Mike
Weller Hilary
Publication venue: 'Elsevier BV'
Publication date: 09/12/2015

An equation of Monge-Ampère type has, for the first time, been solved numerically on the surface of the sphere in order to generate optimally transported (OT) meshes, equidistributed with respect to a monitor function. Optimal transport generates meshes that keep the same connectivity as the original mesh, making them suitable for r-adaptive simulations, in which the equations of motion can be solved in a moving frame of reference in order to avoid mapping the solution between old and new meshes and to avoid load balancing problems on parallel computers. The semi-implicit solution of the Monge-Ampère type equation involves a new linearisation of the Hessian term, and exponential maps are used to map from old to new meshes on the sphere. The determinant of the Hessian is evaluated as the change in volume between old and new mesh cells, rather than using numerical approximations to the gradients. OT meshes are generated to compare with centroidal Voronoi tessellations on the sphere and are found to have advantages and disadvantages: OT equidistribution is more accurate, the number of iterations to convergence is independent of the mesh size, face skewness is reduced and the connectivity does not change. However, anisotropy is higher and the OT meshes are non-orthogonal. It is shown that optimal transport on the sphere leads to meshes that do not tangle. However, tangling can be introduced by numerical errors in calculating the gradient of the mesh potential. Methods for alleviating this problem are explored. Finally, OT meshes are generated using observed precipitation as a monitor function, in order to demonstrate the potential power of the technique.
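
Equidistribution with respect to a monitor function is easiest to see in one dimension, where no Monge-Ampère solve is needed: the mesh points are placed so that every cell carries the same integral of the monitor. The sketch below shows only this 1D analogue on an interval, not the spherical method of the paper. The function name, the quadrature resolution, and the Gaussian-bump monitor are assumptions made for illustration, and the monitor must be strictly positive.

```python
import numpy as np

def equidistribute_1d(monitor, n_cells, a=0.0, b=1.0, n_quad=2001):
    """Return n_cells + 1 mesh points on [a, b] such that each cell holds
    an equal share of the integral of the (strictly positive) monitor."""
    s = np.linspace(a, b, n_quad)
    m = monitor(s)
    # Cumulative trapezoidal integral of the monitor function.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(s))))
    # Invert the cumulative integral at equally spaced target values.
    targets = np.linspace(0.0, cum[-1], n_cells + 1)
    return np.interp(targets, cum, s)

# Monitor with a sharp bump at x = 0.5: cells should cluster there.
bump = lambda s: 1.0 + 10.0 * np.exp(-100.0 * (s - 0.5) ** 2)
x = equidistribute_1d(bump, 20)
widths = np.diff(x)
print(widths.min(), widths.max())
```

As in the abstract, the number of points and their ordering (the connectivity) do not change; only their positions move, which is the property that makes such meshes suitable for r-adaptive simulations.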

arXiv.org e-Print Archive

Central Archive at the University of Reading

Crossref

Elsevier - Publisher Connector