290 research outputs found
A GPU-based Transient Stability Simulation using Runge-Kutta Integration Algorithm
Graphics processing units (GPUs) have been investigated as a way to unlock additional computational capability in various scientific applications. Recent research shows that careful consideration is needed to exploit the advantages of GPUs while avoiding their deficiencies. In this paper, the impact of GPU acceleration on implicit and explicit integrators for transient stability simulation is investigated. It is shown that implicit integrators, although more numerically stable than explicit ones, are not well suited to GPU acceleration. As a tradeoff between numerical stability and efficiency, an explicit 4th-order Runge-Kutta integration algorithm is implemented for transient stability simulation on a hybrid CPU-GPU architecture. The differential equations of the dynamic components are evaluated on the GPU, while the linear network equations are solved on the CPU using a sparse direct solver. Simulation of the IEEE 22-bus power system with 6 generators is reported to validate the feasibility of the proposed method.
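For reference, the classical explicit 4th-order Runge-Kutta update the abstract refers to has the standard four-stage form. A minimal NumPy sketch (illustrative only; the paper's CUDA/CPU implementation is not shown here, and the function names are our own):

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for y' = f(t, y).

    Four stage slopes are combined with weights 1/6, 2/6, 2/6, 1/6;
    per-component evaluations of f are independent, which is what makes
    the stage evaluations attractive for GPU parallelization.
    """
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: integrate y' = y from y(0) = 1 to t = 1 (exact answer: e)
y, t, h = np.array([1.0]), 0.0, 0.01
for _ in range(100):
    y = rk4_step(lambda t, y: y, t, y, h)
    t += h
```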
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems
We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using highly parallel graphics processing units (GPUs), such as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems with 3-15 bodies each, as is typical for the study of planetary systems. Swarm-NG parallelizes the simulation, including both the numerical integration of the equations of motion and the evaluation of forces, using NVIDIA's "Compute Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized implementations of 4th-order time-symmetrized Hermite integration and mixed-variable symplectic integration, as well as several sample codes for other algorithms, to illustrate how non-CUDA-savvy users may themselves introduce customized integrators into the Swarm-NG framework. To optimize performance, we analyze the effect of GPU-specific parameters on performance under double precision.
Applications of Swarm-NG include studying the late stages of planet formation, testing the stability of planetary systems, and evaluating the goodness-of-fit between many planetary system models and observations of extrasolar planet host stars (e.g., radial velocity, astrometry, transit timing). While Swarm-NG focuses on the parallel integration of many planetary systems, the underlying integrators could be applied to a wide variety of problems that require repeatedly integrating a set of ordinary differential equations many times using different initial conditions and/or parameter values.
Submitted to New Astronomy.
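The ensemble pattern described above, advancing many independent small systems with one vectorized update, can be sketched in NumPy with a plain kick-drift-kick leapfrog (a simpler symplectic scheme than Swarm-NG's mixed-variable integrator; the function name and layout are our own illustration, not the library's API):

```python
import numpy as np

def leapfrog_ensemble(pos, vel, acc_fn, h, n_steps):
    """Kick-drift-kick leapfrog applied to a whole ensemble at once.

    Axis 0 indexes the system, so one vectorized update advances many
    independent few-body problems simultaneously -- the same data layout
    a GPU kernel would exploit by assigning one thread block per system.
    """
    for _ in range(n_steps):
        vel = vel + 0.5 * h * acc_fn(pos)   # half kick
        pos = pos + h * vel                 # drift
        vel = vel + 0.5 * h * acc_fn(pos)   # half kick
    return pos, vel

# Example: two independent harmonic oscillators (acc = -pos) advanced
# together for roughly one period; leapfrog keeps their energy bounded.
pos0 = np.array([[1.0], [0.5]])
vel0 = np.zeros_like(pos0)
pos, vel = leapfrog_ensemble(pos0, vel0, lambda p: -p, 0.01, 628)
```

Because the update touches every system with the same arithmetic, the per-system work maps naturally onto SIMD lanes or GPU threads; only `acc_fn` changes between problem classes.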
Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs
The chemical kinetics ODEs arising from operator-split reactive-flow
simulations were solved on GPUs using explicit integration algorithms. Nonstiff
chemical kinetics of a hydrogen oxidation mechanism (9 species and 38
irreversible reactions) were computed using the explicit fifth-order
Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster
than single- and six-core CPU versions by factors of 126 and 25, respectively,
for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for
hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane
(53 species and 634 irreversible reactions) oxidation, were computed using the
stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The
GPU-based RKC implementation demonstrated an increase in performance of nearly
59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than
the single- and six-core CPU-based RKC algorithms using the
hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU
performed more than 65 and 11 times faster, for problem sizes consisting of
131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up
to 57 times faster than the six-core CPU-based implicit VODE algorithm on
65,536 ODEs. In the presence of more severe stiffness, such as ethylene
oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more
than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and
at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a
larger time step size, RKC-GPU performed at best 2.5 times slower than six-core
VODE for 8192 ODEs and larger. These results motivate the discussion of new strategies for integrating stiff chemistry on GPUs.
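The Runge-Kutta-Chebyshev family used above trades extra explicit stages for a real stability interval that grows like 2s^2 with the stage count s. A minimal NumPy sketch of the simplest (first-order, undamped) member of the family, for illustration only — the paper uses the damped second-order RKC scheme:

```python
import numpy as np

def rkc1_step(f, y, h, s):
    """One step of the undamped first-order Runge-Kutta-Chebyshev method.

    The stages follow the three-term Chebyshev recurrence
        Y_0 = y,  Y_1 = Y_0 + (h/s^2) f(Y_0),
        Y_j = 2 Y_{j-1} - Y_{j-2} + (2h/s^2) f(Y_{j-1}),  j = 2..s,
    so for y' = lam*y one step multiplies y by T_s(1 + h*lam/s^2), which is
    bounded by 1 for h*lam in [-2*s^2, 0] -- an extended stability interval
    bought with extra cheap explicit stages instead of implicit solves.
    """
    y_prev, y_curr = y, y + (h / s**2) * f(y)
    for _ in range(2, s + 1):
        y_prev, y_curr = y_curr, (2 * y_curr - y_prev
                                  + (2 * h / s**2) * f(y_curr))
    return y_curr

# Example: y' = -100 y with h = 0.1 (h*lam = -10) is unstable for explicit
# Euler, but stable for RKC1 with s = 3 stages (interval [-18, 0]).
y = np.array([1.0])
for _ in range(10):
    y = rkc1_step(lambda y: -100.0 * y, y, 0.1, 3)
```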
Exponential Integrators on Graphics Processing Units
In this paper we revisit stencil methods on GPUs in the context of
exponential integrators. We further discuss boundary conditions, in the same
context, and show that simple boundary conditions (for example, homogeneous
Dirichlet or homogeneous Neumann boundary conditions) do not affect the
performance if implemented directly into the CUDA kernel. In addition, we show
that stencil methods with position-dependent coefficients can be implemented
efficiently as well.
As an application, we discuss the implementation of exponential integrators for different classes of problems in single- and multi-GPU setups (up to 4 GPUs). We further show that for stencil-based methods such parallelization can be done very efficiently, while for some unstructured matrices the parallelization to multiple GPUs is severely limited by the throughput of the PCIe bus.
To appear in: Proceedings of the 2013 International Conference on High Performance Computing and Simulation (HPCS 2013), IEEE (2013).
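The basic exponential-integrator step the abstract builds on is exponential Euler for u' = A u + g(u): u_{n+1} = exp(hA) u_n + h * phi1(hA) g(u_n), with phi1(z) = (e^z - 1)/z. A small dense NumPy sketch, forming the matrix functions via an eigendecomposition for clarity (the paper instead applies them matrix-free through stencil kernels on the GPU; function names are ours):

```python
import numpy as np

def exp_euler_step(A, g, u, h):
    """One exponential-Euler step for u' = A u + g(u).

    Computes exp(hA) and phi1(hA) = (exp(hA) - I)(hA)^{-1} through the
    eigendecomposition hA = V diag(w) V^{-1}; fine for small dense A,
    whereas large stencil operators need matrix-free evaluation.
    """
    w, V = np.linalg.eig(h * np.asarray(A, dtype=float))
    Vinv = np.linalg.inv(V)
    safe = np.where(np.abs(w) > 1e-12, w, 1.0)       # avoid 0/0 at w = 0
    phi1 = np.where(np.abs(w) > 1e-12, (np.exp(w) - 1.0) / safe, 1.0)
    expA = (V * np.exp(w)) @ Vinv                    # V diag(e^w) V^{-1}
    phiA = (V * phi1) @ Vinv                         # V diag(phi1(w)) V^{-1}
    return np.real(expA @ u + h * (phiA @ g(u)))
```

A useful property for testing: when g is constant the step is exact, since the variation-of-constants formula is reproduced without quadrature error.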
GPU Accelerated Explicit Time Integration Methods for Electro-Quasistatic Fields
Electro-quasistatic field problems involving nonlinear materials are commonly
discretized in space using finite elements. In this paper, it is proposed to
solve the resulting system of ordinary differential equations by an explicit
Runge-Kutta-Chebyshev time-integration scheme. This mitigates the need for
Newton-Raphson iterations, as they are necessary within fully implicit time
integration schemes. However, the electro-quasistatic system of ordinary
differential equations has a Laplace-type mass matrix such that parts of the
explicit time-integration scheme remain implicit. An iterative solver with
constant preconditioner is shown to efficiently solve the resulting multiple
right-hand side problem. This approach allows an efficient parallel
implementation on a system featuring multiple graphics processing units.
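The partly implicit structure described above means that even an "explicit" step still requires a solve with the constant mass matrix M at every stage. A dense NumPy sketch of that pattern, reusing one Cholesky factor of M across all steps (the paper uses an iterative solver with a constant preconditioner instead; names here are illustrative):

```python
import numpy as np

def mass_matrix_euler_step(L, f, y, h):
    """Explicit Euler step for M y' = f(y), with M = L @ L.T prefactored.

    The factor L of the constant mass matrix is computed once and reused
    for every right-hand side, mirroring the constant-preconditioner idea.
    np.linalg.solve is used for brevity; a triangular solve would exploit
    the structure of L.
    """
    z = np.linalg.solve(L, f(y))      # forward substitution: L z = f(y)
    dy = np.linalg.solve(L.T, z)      # back substitution:   L^T dy = z
    return y + h * dy

# Example: M = diag(4, 9), f constant, so dy = M^{-1} f = [1, 1].
M = np.diag([4.0, 9.0])
L = np.linalg.cholesky(M)
y1 = mass_matrix_euler_step(L, lambda y: np.array([4.0, 9.0]),
                            np.zeros(2), 0.1)
```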
NLSEmagic: Nonlinear Schrödinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes
We present a simple-to-use yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphics processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than equivalents on standard parallel clusters. The codes are developed with usability and portability in mind, and are therefore written to interface with MATLAB through custom GPU-enabled C codes using the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files.