Runge-Kutta-Gegenbauer explicit methods for advection-diffusion problems
In this paper, Runge-Kutta-Gegenbauer (RKG) stability polynomials of
arbitrarily high order of accuracy are introduced in closed form. The stability
domain of RKG polynomials extends in the real direction with the square of the
polynomial degree, and in the imaginary direction as an increasing function of
the Gegenbauer parameter. Consequently, the polynomials are naturally suited to the
construction of high-order stabilized Runge-Kutta (SRK) explicit methods for
systems of PDEs of mixed hyperbolic-parabolic type.
We present SRK methods composed of ordered forward Euler stages, with
complex-valued stepsizes derived from the roots of RKG stability polynomials of
degree . Internal stability is maintained at large stage number through an
ordering algorithm which limits internal amplification factors to .
Test results for mildly stiff nonlinear advection-diffusion-reaction problems
with moderate () mesh Péclet numbers are provided at second,
fourth, and sixth orders, with nonlinear reaction terms treated by complex
splitting techniques above second order.
Comment: 20 pages, 7 figures, 3 tables
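The core construction, chaining forward Euler stages whose stepsizes come from the roots of a stability polynomial, can be sketched in a few lines. This is a minimal sketch under a loud assumption: it swaps in the first-order Chebyshev polynomial P(z) = T_s(1 + z/s^2), whose roots are real, in place of an RKG polynomial (whose roots, and hence stepsizes, are complex), and it applies the stages in their natural order rather than with the paper's ordering algorithm. The helper names `chebyshev_euler_steps` and `stabilized_step` are hypothetical.

```python
import math
from typing import Callable, List


def chebyshev_euler_steps(s: int, h: float) -> List[float]:
    """Substep sizes tau_k with prod_k (1 + tau_k * z / h) == T_s(1 + z / s**2).

    ASSUMPTION: first-order Chebyshev polynomial used in place of an RKG
    polynomial; its roots z_k are real and lie in [-2 * s**2, 0).
    """
    taus = []
    for k in range(1, s + 1):
        x_k = math.cos((2 * k - 1) * math.pi / (2 * s))  # k-th root of T_s
        z_k = s * s * (x_k - 1.0)                        # matching root of P(z)
        taus.append(-h / z_k)                            # 1 + tau_k*lam == 1 - h*lam/z_k
    return taus


def stabilized_step(f: Callable[[float], float], y: float, h: float, s: int) -> float:
    """Advance y' = f(y) by one step h using s ordered forward Euler stages."""
    for tau in chebyshev_euler_steps(s, h):
        y = y + tau * f(y)
    return y


# Usage: linear test equation y' = lam * y with h * lam = -150, far outside the
# forward Euler limit of -2 but inside the interval [-2 * s**2, 0] for s = 10.
lam = -150.0
y1 = stabilized_step(lambda y: lam * y, 1.0, 1.0, 10)
print(y1)  # approximately -0.5, i.e. T_10(1 - 1.5) = T_10(-0.5), up to rounding
```

Because P'(0) = 1, the substeps sum to h (first-order consistency), and s function evaluations buy a real stability interval of length 2s² instead of the 2s that s plain Euler steps of size h/s would give. For large s the order in which the stages are applied affects round-off growth, which is the issue the abstract's ordering algorithm addresses.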
Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application
Simulations based on stencil computations (widely used in geosciences) have been dominated by the MPI+OpenMP programming model. Little effort has been devoted to experimenting with task-based parallelism in this context. We address this by introducing OpenMP task parallelism into the kernel of an industrial seismic modeling code, Minimod. We observe that even for these highly regular stencil computations, taskified kernels are competitive with traditional OpenMP-augmented loops, and in some experiments tasks even outperform loop parallelism.
This promising result sets the stage for more complex computational patterns. Simulations involve more than just the stencil calculation: a collection of kernels is often needed to accomplish the scientific objective (e.g., I/O, boundary conditions). These kernels can often be computed simultaneously; however, implementing this simultaneous computation with traditional programming models is not trivial. The presented approach will be extended to cover simultaneous execution of several kernels, where we expect to fully exploit the benefits of task-based programming.
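The contrast between loop parallelism and taskified kernels can be illustrated outside OpenMP. The sketch below is a hypothetical Python analogue built on `concurrent.futures.ThreadPoolExecutor` (it is neither OpenMP nor Minimod code): the interior of a 1D three-point stencil is split into blocks, each block is submitted as an independent task, and the result is checked against a single loop. The block size, worker count and function names are illustrative assumptions, and CPython threads will not reproduce OpenMP performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def stencil_block(u: List[float], out: List[float], lo: int, hi: int) -> None:
    """Three-point second difference on interior indices lo..hi-1."""
    for i in range(lo, hi):
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]


def stencil_loop(u: List[float]) -> List[float]:
    """Loop version: one sweep over the whole interior."""
    out = [0.0] * len(u)
    stencil_block(u, out, 1, len(u) - 1)
    return out


def stencil_tasks(u: List[float], block: int = 64, workers: int = 4) -> List[float]:
    """Task version: each interior block is an independent task writing a disjoint slice."""
    n = len(u)
    out = [0.0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(stencil_block, u, out, lo, min(lo + block, n - 1))
            for lo in range(1, n - 1, block)
        ]
        for fut in futures:
            fut.result()  # re-raise any exception raised inside a task
    return out


u = [float(i * i) for i in range(1000)]
print(stencil_tasks(u) == stencil_loop(u))  # True
```

Further kernels, such as a boundary-condition update, could be submitted to the same pool, mirroring the simultaneous-kernel direction described above; any dependency between kernels would then have to be made explicit, for example by waiting on the relevant futures before launching dependent work.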
Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework
PDE discretization schemes yielding stencil-like computing patterns are
commonly used for seismic modeling, weather forecast, and other scientific
applications. Achieving HPC-level stencil computations on one architecture is
challenging, and porting them to other architectures without sacrificing
performance requires significant effort, especially in this golden age of many
distinctive architectures.
To help developers achieve performance, portability, and productivity with
stencil computations, we developed StencilPy. With StencilPy, developers write
stencil computations in a high-level domain-specific language, which promotes
productivity, while its backends generate efficient code for existing and
emerging architectures, including NVIDIA, AMD, and Intel GPUs, A64FX, and STX.
StencilPy demonstrates promising performance results on par with hand-written
code, maintains cross-architectural performance portability, and enhances
productivity. Its modular design enables easy configuration, customization, and
extension. A 25-point star-shaped stencil written in StencilPy is one-quarter
the length of the equivalent hand-crafted CUDA code and achieves similar
performance on an NVIDIA H100 GPU.
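For reference, a 25-point star-shaped stencil in 3D is the centre point plus four neighbours in each of the six axis directions (1 + 6 × 4 = 25). The plain-Python sketch below applies such a stencil as an eighth-order finite-difference Laplacian; the coefficient choice and the name `star25_laplacian` are assumptions for illustration, and the code is neither StencilPy's DSL nor its generated output.

```python
from typing import List

Grid = List[List[List[float]]]

# ASSUMPTION: standard 8th-order central-difference weights for d2/dx2 at offsets 0..4.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
RADIUS = len(COEFFS) - 1  # 4 neighbours per direction: 1 + 6 * 4 = 25 points


def star25_laplacian(u: Grid, h: float = 1.0) -> Grid:
    """Apply the 25-point star stencil at every point at least RADIUS from the boundary.

    Points in the RADIUS-wide halo are left at 0.0.
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(RADIUS, nx - RADIUS):
        for j in range(RADIUS, ny - RADIUS):
            for k in range(RADIUS, nz - RADIUS):
                acc = 3.0 * COEFFS[0] * u[i][j][k]  # centre weight from x, y and z
                for r in range(1, RADIUS + 1):
                    acc += COEFFS[r] * (
                        u[i + r][j][k] + u[i - r][j][k]
                        + u[i][j + r][k] + u[i][j - r][k]
                        + u[i][j][k + r] + u[i][j][k - r]
                    )
                out[i][j][k] = acc / (h * h)
    return out


# Usage: the Laplacian of x^2 + y^2 + z^2 is exactly 6 everywhere.
n, h = 10, 0.5
u = [[[(i * h) ** 2 + (j * h) ** 2 + (k * h) ** 2 for k in range(n)]
      for j in range(n)] for i in range(n)]
lap = star25_laplacian(u, h)
print(lap[4][4][4])  # approximately 6.0
```

The triple loop with a radius-4 halo is the kind of hand-written kernel a stencil DSL is meant to replace with a one-line stencil declaration.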
Stencil Computation with Vector Outer Product
Current architectures are equipped with matrix computation units to
accelerate AI and high-performance computing applications. Matrix
multiplication and vector outer product are two basic instruction types. The
latter is lighter weight because its inputs are vectors, so it offers more
opportunities to develop flexible algorithms for problems beyond dense linear
algebra and more room to optimize the implementation.
Stencil computations represent a common class of nested loops in scientific and
engineering applications. This paper proposes a novel stencil algorithm using
vector outer products. Unlike previous work, the new algorithm arises from the
stencil definition in the scatter mode and is initially expressed with formulas
of vector outer products. The implementation incorporates a set of
optimizations to improve the memory reference pattern, execution pipeline and
data reuse by considering various algorithmic options and the data sharing
between input vectors. Evaluation on a simulator shows that our design achieves
a substantial speedup compared with the vectorized stencil algorithm.
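The scatter-mode view can be made concrete for a 1D stencil: each input value u[j] contributes c[k]·u[j] to output j − k, and all of these contributions for a block of inputs are exactly the entries of the outer product c ⊗ u. The plain-Python sketch below is an illustrative assumption, not the paper's algorithm or its optimizations: it forms the outer product explicitly, scatters its entries along diagonals into the output, and checks the result against the usual gather formulation. On hardware with an outer-product instruction, the `outer` step is the part one would hope to map onto the matrix unit, one tile at a time.

```python
from typing import List


def outer(a: List[float], b: List[float]) -> List[List[float]]:
    """Vector outer product: result[k][j] = a[k] * b[j]."""
    return [[x * y for y in b] for x in a]


def stencil_gather(u: List[float], c: List[float]) -> List[float]:
    """Reference (gather mode): out[t] = sum_k c[k] * u[t + k]."""
    m = len(u) - len(c) + 1
    return [sum(c[k] * u[t + k] for k in range(len(c))) for t in range(m)]


def stencil_outer_scatter(u: List[float], c: List[float]) -> List[float]:
    """Scatter mode: input j sends c[k] * u[j] to output t = j - k."""
    m = len(u) - len(c) + 1
    out = [0.0] * m
    contrib = outer(c, u)  # every (coefficient, input) product at once
    for k in range(len(c)):
        for j in range(len(u)):
            t = j - k
            if 0 <= t < m:  # skip products that fall outside the valid output range
                out[t] += contrib[k][j]
    return out


u = [float((7 * j) % 11) for j in range(32)]
c = [0.5, 1.0, -3.0, 1.0, 0.5]
print(stencil_outer_scatter(u, c) == stencil_gather(u, c))  # True
```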
Mesh adaptation on the sphere using optimal transport and the numerical solution of a Monge-Ampère type equation
An equation of Monge-Ampère type has, for the first time, been solved numerically on the surface of the sphere in order to generate optimally transported (OT) meshes, equidistributed with respect to a monitor function. Optimal transport generates meshes that keep the same connectivity as the original mesh, making them suitable for r-adaptive simulations, in which the equations of motion can be solved in a moving frame of reference to avoid mapping the solution between old and new meshes and to avoid load-balancing problems on parallel computers. The semi-implicit solution of the Monge-Ampère type equation involves a new linearisation of the Hessian term, and exponential maps are used to map from old to new meshes on the sphere. The determinant of the Hessian is evaluated as the change in volume between old and new mesh cells, rather than through numerical approximations to the gradients. OT meshes are compared with centroidal Voronoi tessellations on the sphere and are found to have both advantages and disadvantages: OT equidistribution is more accurate, the number of iterations to convergence is independent of the mesh size, face skewness is reduced and the connectivity does not change. However, anisotropy is higher and the OT meshes are non-orthogonal. It is shown that optimal transport on the sphere leads to meshes that do not tangle; however, tangling can be introduced by numerical errors in calculating the gradient of the mesh potential. Methods for alleviating this problem are explored. Finally, OT meshes are generated using observed precipitation as a monitor function to demonstrate the potential power of the technique.
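Equidistribution with respect to a monitor function means every cell carries the same share of the monitor's integral. On the sphere this requires solving the Monge-Ampère type equation, but in one dimension the optimal transport map reduces to inverting the cumulative integral of the monitor, which makes the idea easy to check. The plain-Python sketch below is only that 1D analogue, not the paper's spherical, semi-implicit method; the function name `equidistribute_1d`, the trapezoidal quadrature and the resolution `n_quad` are assumptions.

```python
from bisect import bisect_left
from typing import Callable, List


def equidistribute_1d(monitor: Callable[[float], float], n_cells: int,
                      n_quad: int = 10000) -> List[float]:
    """Mesh on [0, 1] whose cells all hold an equal share of the monitor integral.

    ASSUMPTION: 1D analogue only; the monitor must be strictly positive.
    """
    xs = [i / n_quad for i in range(n_quad + 1)]
    cum = [0.0]  # cumulative trapezoidal integral of the monitor at each xs[i]
    for a, b in zip(xs, xs[1:]):
        cum.append(cum[-1] + 0.5 * (monitor(a) + monitor(b)) * (b - a))
    total = cum[-1]
    mesh = [0.0]
    for i in range(1, n_cells):
        target = total * i / n_cells
        j = bisect_left(cum, target)  # first node with cum[j] >= target (j >= 1)
        t = (target - cum[j - 1]) / (cum[j] - cum[j - 1])  # linear inverse interpolation
        mesh.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    mesh.append(1.0)
    return mesh


# Usage: monitor 1 + 2x has cumulative integral x + x^2, so with 4 cells the
# middle node solves x + x^2 = 1, i.e. x = (sqrt(5) - 1) / 2.
mesh = equidistribute_1d(lambda x: 1.0 + 2.0 * x, 4)
print(round(mesh[2], 6))  # 0.618034
```

Cells come out smaller where the monitor is larger (towards x = 1 here), which is the same behaviour the spherical OT meshes show when observed precipitation is used as the monitor.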
- …