Search CORE

2,723 research outputs found

Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

Author: Besard Tim
Churavy Valentin
Edelman Alan
Gerlach Adam R.
Gymnich Tim
Ma Yingbo
Rackauckas Christopher
Utkarsh Utkarsh
Publication venue
Publication date: 13/04/2023
Field of study

We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels, while performing

20-100\times

faster than the vectorized-map (\texttt{vmap}) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and incorporating of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance.Comment: 11 figure

arXiv.org e-Print Archive

GPU Accelerated Explicit Time Integration Methods for Electro-Quasistatic Fields

Author: Clemens Markus
Richter Christian
Schöps Sebastian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/12/2016
Field of study

Electro-quasistatic field problems involving nonlinear materials are commonly discretized in space using finite elements. In this paper, it is proposed to solve the resulting system of ordinary differential equations by an explicit Runge-Kutta-Chebyshev time-integration scheme. This mitigates the need for Newton-Raphson iterations, as they are necessary within fully implicit time integration schemes. However, the electro-quasistatic system of ordinary differential equations has a Laplace-type mass matrix such that parts of the explicit time-integration scheme remain implicit. An iterative solver with constant preconditioner is shown to efficiently solve the resulting multiple right-hand side problem. This approach allows an efficient parallel implementation on a system featuring multiple graphic processing units.Comment: 4 pages, 5 figure

arXiv.org e-Print Archive

TUbiblio

Odeint - Solving ordinary differential equations in C++

Author: Ahnert Karsten
Mulansky Mario
Publication venue: 'AIP Publishing'
Publication date: 01/01/2011
Field of study

Many physical, biological or chemical systems are modeled by ordinary differential equations (ODEs) and finding their solution is an every-day-task for many scientists. Here, we introduce a new C++ library dedicated to find numerical solutions of initial value problems of ODEs: odeint (www.odeint.com). odeint is implemented in a highly generic way and provides extensive interoperability at top performance. For example, due to it's modular design it can be easily parallized with OpenMP and even runs on CUDA GPUs. Despite that, it provides a convenient interface that allows for a simple and easy usage.Comment: 4 pages, 1 figur

arXiv.org e-Print Archive

Crossref

Exponential Integrators on Graphic Processing Units

Author: Einkemmer Lukas
Ostermann Alexander
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/09/2013
Field of study

In this paper we revisit stencil methods on GPUs in the context of exponential integrators. We further discuss boundary conditions, in the same context, and show that simple boundary conditions (for example, homogeneous Dirichlet or homogeneous Neumann boundary conditions) do not affect the performance if implemented directly into the CUDA kernel. In addition, we show that stencil methods with position-dependent coefficients can be implemented efficiently as well. As an application, we discuss the implementation of exponential integrators for different classes of problems in a single and multi GPU setup (up to 4 GPUs). We further show that for stencil based methods such parallelization can be done very efficiently, while for some unstructured matrices the parallelization to multiple GPUs is severely limited by the throughput of the PCIe bus.Comment: To appear in: Proceedings of the 2013 International Conference on High Performance Computing Simulation (HPCS 2013), IEEE (2013

arXiv.org e-Print Archive

Crossref

Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

Author: Niemeyer Kyle E
Sung Chih-Jen
Publication venue: 'Elsevier BV'
Publication date: 04/11/2013
Field of study

The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster, for problem sizes consisting of 131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. Therefore, the need for developing new strategies for integrating stiff chemistry on GPUs was discussed.Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1

arXiv.org e-Print Archive

CiteSeerX