297 research outputs found

    Runge-Kutta-Gegenbauer explicit methods for advection-diffusion problems

    Get PDF
    In this paper, Runge-Kutta-Gegenbauer (RKG) stability polynomials of arbitrarily high order of accuracy are introduced in closed form. The stability domain of RKG polynomials extends in the the real direction with the square of polynomial degree, and in the imaginary direction as an increasing function of Gegenbauer parameter. Consequently, the polynomials are naturally suited to the construction of high order stabilized Runge-Kutta (SRK) explicit methods for systems of PDEs of mixed hyperbolic-parabolic type. We present SRK methods composed of LL ordered forward Euler stages, with complex-valued stepsizes derived from the roots of RKG stability polynomials of degree LL. Internal stability is maintained at large stage number through an ordering algorithm which limits internal amplification factors to 10L210 L^2. Test results for mildly stiff nonlinear advection-diffusion-reaction problems with moderate (1\lesssim 1) mesh P\'eclet numbers are provided at second, fourth, and sixth orders, with nonlinear reaction terms treated by complex splitting techniques above second order.Comment: 20 pages, 7 figures, 3 table

    Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

    Get PDF
    Simulations based on stencil computations (widely used in geosciences) have been dominated by the MPI+OpenMP programming model paradigm. Little effort has been devoted to experimenting with task-based parallelism in this context. We address this by introducing OpenMP task parallelism into the kernel of an industrial seismic modeling code, Minimod. We observe that even for these highly regular stencil computations, taskified kernels are competitive with traditional OpenMP-augmented loops, and in some experiments tasks even outperform loop parallelism. This promising result sets the stage for more complex computational patterns. Simulations involve more than just the stencil calculation: a collection of kernels is often needed to accomplish the scientific objective (e.g., I/O, boundary conditions). These kernels can often be computed simultaneously; however, implementing this simultaneous computation with traditional programming models is not trivial. The presented approach will be extended to cover simultaneous execution of several kernels, where we expect to fully exploit the benefits of task-based programming

    Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

    Full text link
    PDE discretization schemes yielding stencil-like computing patterns are commonly used for seismic modeling, weather forecast, and other scientific applications. Achieving HPC-level stencil computations on one architecture is challenging, porting to other architectures without sacrificing performance requires significant effort, especially in this golden age of many distinctive architectures. To help developers achieve performance, portability, and productivity with stencil computations, we developed StencilPy. With StencilPy, developers write stencil computations in a high-level domain-specific language, which promotes productivity, while its backends generate efficient code for existing and emerging architectures, including NVIDIA, AMD, and Intel GPUs, A64FX, and STX. StencilPy demonstrates promising performance results on par with hand-written code, maintains cross-architectural performance portability, and enhances productivity. Its modular design enables easy configuration, customization, and extension. A 25-point star-shaped stencil written in StencilPy is one-quarter of the length of a hand-crafted CUDA code and achieves similar performance on an NVIDIA H100 GPU

    Stencil Computation with Vector Outer Product

    Full text link
    Matrix computation units have been equipped in current architectures to accelerate AI and high performance computing applications. The matrix multiplication and vector outer product are two basic instruction types. The latter one is lighter since the inputs are vectors. Thus it provides more opportunities to develop flexible algorithms for problems other than dense linear algebra computing and more possibilities to optimize the implementation. Stencil computations represent a common class of nested loops in scientific and engineering applications. This paper proposes a novel stencil algorithm using vector outer products. Unlike previous work, the new algorithm arises from the stencil definition in the scatter mode and is initially expressed with formulas of vector outer products. The implementation incorporates a set of optimizations to improve the memory reference pattern, execution pipeline and data reuse by considering various algorithmic options and the data sharing between input vectors. Evaluation on a simulator shows that our design achieves a substantial speedup compared with vectorized stencil algorithm

    Mesh adaptation on the sphere using optimal transport and the numerical solution of a Monge-Ampère type equation

    Get PDF
    An equation of Monge-Ampère type has, for the first time, been solved numerically on the surface of the sphere in order to generate optimally transported (OT) meshes, equidistributed with respect to a monitor function. Optimal transport generates meshes that keep the same connectivity as the original mesh, making them suitable for r-adaptive simulations, in which the equations of motion can be solved in a moving frame of reference in order to avoid mapping the solution between old and new meshes and to avoid load balancing problems on parallel computers. The semi-implicit solution of the Monge-Ampère type equation involves a new linearisation of the Hessian term, and exponential maps are used to map from old to new meshes on the sphere. The determinant of the Hessian is evaluated as the change in volume between old and new mesh cells, rather than using numerical approximations to the gradients. OT meshes are generated to compare with centroidal Voronoi tesselations on the sphere and are found to have advantages and disadvantages; OT equidistribution is more accurate, the number of iterations to convergence is independent of the mesh size, face skewness is reduced and the connectivity does not change. However anisotropy is higher and the OT meshes are non-orthogonal. It is shown that optimal transport on the sphere leads to meshes that do not tangle. However, tangling can be introduced by numerical errors in calculating the gradient of the mesh potential. Methods for alleviating this problem are explored. Finally, OT meshes are generated using observed precipitation as a monitor function, in order to demonstrate the potential power of the technique
    corecore