Accelerating crystal plasticity simulations using GPU multiprocessors
Crystal plasticity models are often used to model the deformation behavior of polycrystalline materials. One major drawback of such models is that they are computationally very demanding. Adopting the common Taylor assumption requires calculating the response of several hundred individual grains to obtain the stress at a single integration point in the overlying FEM structure. However, a large part of the operations can be executed in parallel to reduce the computation time. One emerging technology for running massively parallel computations without relying on the availability of large computer clusters is to port the parallel parts of the calculations to a graphics processing unit (GPU). GPUs are designed to handle vast numbers of floating-point operations in parallel. In the present work, different strategies for the numerical implementation of crystal plasticity are investigated, as well as a number of approaches to parallelizing the program execution. A major concern is identified as the limited amount of memory available on the GPU. Nevertheless, significant reductions in computational time, up to a 100-fold speedup, are achieved in the present study, and these are possible even on a standard desktop computer equipped with a GPU.
Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs
The chemical kinetics ODEs arising from operator-split reactive-flow
simulations were solved on GPUs using explicit integration algorithms. Nonstiff
chemical kinetics of a hydrogen oxidation mechanism (9 species and 38
irreversible reactions) were computed using the explicit fifth-order
Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster
than single- and six-core CPU versions by factors of 126 and 25, respectively,
for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for
hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane
(53 species and 634 irreversible reactions) oxidation, were computed using the
stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The
GPU-based RKC implementation demonstrated an increase in performance of nearly
59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than
the single- and six-core CPU-based RKC algorithms using the
hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU
performed more than 65 and 11 times faster, for problem sizes consisting of
131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up
to 57 times faster than the six-core CPU-based implicit VODE algorithm on
65,536 ODEs. In the presence of more severe stiffness, such as ethylene
oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more
than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and
at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a
larger time step size, RKC-GPU performed at best 2.5 times slower than six-core
VODE for 8192 ODEs and larger. Therefore, the need for developing new
strategies for integrating stiff chemistry on GPUs was discussed.
Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1
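As an illustration of the explicit Runge-Kutta-Cash-Karp method named in this abstract, the following is a minimal pure-Python sketch, not the authors' GPU code. It uses the standard published Cash-Karp coefficients and a fixed step size. The function names `rkck_step` and `integrate_batch` are hypothetical, and the serial loop over the batch only stands in for the independent per-ODE work that a GPU would assign to separate threads.

```python
# Standard Cash-Karp tableau (embedded fifth/fourth-order pair).
C = [0.0, 1/5, 3/10, 3/5, 1.0, 7/8]
A = [
    [],
    [1/5],
    [3/40, 9/40],
    [3/10, -9/10, 6/5],
    [-11/54, 5/2, -70/27, 35/27],
    [1631/55296, 175/512, 575/13824, 44275/110592, 253/4096],
]
B5 = [37/378, 0.0, 250/621, 125/594, 0.0, 512/1771]
B4 = [2825/27648, 0.0, 18575/48384, 13525/55296, 277/14336, 1/4]


def rkck_step(f, t, y, h):
    """One Cash-Karp step for y' = f(t, y), with y a list of floats.

    Returns the fifth-order solution and a per-component error estimate
    (difference between the fifth- and fourth-order solutions).
    """
    dim = len(y)
    k = []
    for i in range(6):
        # Stage value built from the previously computed stages.
        yi = [y[n] + h * sum(A[i][j] * k[j][n] for j in range(i))
              for n in range(dim)]
        k.append(f(t + C[i] * h, yi))
    y5 = [y[n] + h * sum(B5[i] * k[i][n] for i in range(6))
          for n in range(dim)]
    err = [h * sum((B5[i] - B4[i]) * k[i][n] for i in range(6))
           for n in range(dim)]
    return y5, err


def integrate_batch(f, y0_batch, t_end, h):
    """Integrate many independent systems from t = 0 to t_end.

    Hypothetical helper: each entry of y0_batch is an independent ODE
    system, so the outer loop is the part that could run in parallel.
    The step size is fixed here; the error estimate is ignored.
    """
    n_steps = round(t_end / h)
    results = []
    for y0 in y0_batch:
        t, y = 0.0, list(y0)
        for _ in range(n_steps):
            y, _err = rkck_step(f, t, y, h)
            t += h
        results.append(y)
    return results
```

A production integrator would use the error estimate to adapt the step size. That part is omitted to keep the sketch short.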
FiCoS: A fine-grained and coarse-grained GPU-powered deterministic simulator for biochemical networks.
Mathematical models of biochemical networks can greatly facilitate the comprehension of the mechanisms underlying cellular processes, as well as the formulation of hypotheses that can be tested by means of targeted laboratory experiments. However, two issues might hamper the achievement of fruitful outcomes. On the one hand, detailed mechanistic models can involve hundreds or thousands of molecular species and their intermediate complexes, as well as hundreds or thousands of chemical reactions, a situation generally occurring in rule-based modeling. On the other hand, the computational analysis of a model typically requires the execution of a large number of simulations for its calibration, or to test the effect of perturbations. As a consequence, the computational capabilities of modern Central Processing Units can easily be exceeded, possibly making the modeling of biochemical networks a worthless or ineffective effort. To overcome the limitations of the current state-of-the-art simulation approaches, we present in this paper FiCoS, a novel "black-box" deterministic simulator that effectively realizes both a fine-grained and a coarse-grained parallelization on Graphics Processing Units. In particular, FiCoS exploits two different integration methods, namely the Dormand-Prince and the Radau IIA methods, to efficiently solve both non-stiff and stiff systems of coupled Ordinary Differential Equations. We tested the performance of FiCoS against different deterministic simulators, by considering models of increasing size and by running analyses with increasing computational demands. FiCoS was able to dramatically speed up the computations, by up to 855×, proving to be a promising solution for the simulation and analysis of large-scale models of complex biological processes.
Parallel-In-Time Simulation of Eddy Current Problems Using Parareal
In this contribution the usage of the Parareal method is proposed for the
time-parallel solution of the eddy current problem. The method is adapted to
the particular challenges of the problem that are related to the differential
algebraic character due to non-conducting regions. It is shown how the
necessary modification can be automatically incorporated by using a suitable
time stepping method. The paper closes with a first demonstration of a
simulation of a realistic four-pole induction machine model using Parareal.
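For readers unfamiliar with Parareal, the basic iteration can be sketched on a scalar test ODE. This is a generic illustration and not the eddy current formulation of the paper. The choice of explicit Euler for both propagators and the names `euler` and `parareal` are assumptions made for this example only.

```python
def euler(f, t0, t1, y, n_steps):
    """Explicit Euler from t0 to t1 with n_steps equal substeps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t_end, n_windows, n_iter, fine_steps=100):
    """Parareal iteration for the scalar ODE y' = f(t, y).

    Coarse propagator G: one Euler step per time window (assumption).
    Fine propagator F: fine_steps Euler steps per window (assumption).
    Returns the window boundaries and the solution values at them.
    """
    ts = [t0 + (t_end - t0) * n / n_windows for n in range(n_windows + 1)]

    def coarse(a, b, y):
        return euler(f, a, b, y, 1)

    def fine(a, b, y):
        return euler(f, a, b, y, fine_steps)

    # Initial guess: one serial sweep with the cheap coarse propagator.
    u = [y0]
    for n in range(n_windows):
        u.append(coarse(ts[n], ts[n + 1], u[n]))

    for _ in range(n_iter):
        # These propagations use only the previous iterate, so each window
        # is independent; this is the part that would run in parallel.
        f_old = [fine(ts[n], ts[n + 1], u[n]) for n in range(n_windows)]
        g_old = [coarse(ts[n], ts[n + 1], u[n]) for n in range(n_windows)]

        # Serial correction sweep: U_{n+1} = G(U_n_new) + F(U_n) - G(U_n).
        u_new = [y0]
        for n in range(n_windows):
            g_new = coarse(ts[n], ts[n + 1], u_new[n])
            u_new.append(g_new + f_old[n] - g_old[n])
        u = u_new
    return ts, u
```

After as many iterations as there are windows, the result matches the serial fine solution up to rounding, so the method only pays off when it converges in far fewer iterations.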
An Overview of Variational Integrators
The purpose of this paper is to survey some recent advances in variational
integrators for both finite dimensional mechanical systems as well as continuum
mechanics. These advances include the general development of discrete
mechanics, applications to dissipative systems, collisions, spacetime integration algorithms,
AVI’s (Asynchronous Variational Integrators), as well as reduction for
discrete mechanical systems. To keep the article within the set limits, we will only
treat each topic briefly and will not attempt to develop any particular topic in
any depth. We hope, nonetheless, that this paper serves as a useful guide to the
literature as well as to future directions and open problems in the subject.
Adaptive Mesh Fluid Simulations on GPU
We describe an implementation of compressible inviscid fluid solvers with
block-structured adaptive mesh refinement on Graphics Processing Units using
NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes
can be mapped naturally on this architecture. Using the method of lines
approach with the second order total variation diminishing Runge-Kutta time
integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer
Riemann solver, we achieve approximately 10 times faster
execution on one graphics card as compared to a single core on the host
computer. We attain this speedup in uniform grid runs as well as in problems
with deep AMR hierarchies. Our framework can readily be applied to more general
systems of conservation laws and extended to higher order shock capturing
schemes. This is shown directly by an implementation of a magneto-hydrodynamic
solver and comparing its performance to the pure hydrodynamic case. Finally, we
also combined our CUDA parallel scheme with MPI to make the code run on GPU
clusters. Close to ideal speedup is observed on up to four GPUs.
Comment: Submitted to New Astronom
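The time integration named in this abstract, a second-order total variation diminishing Runge-Kutta scheme combined with piecewise linear reconstruction, can be sketched on a much simpler model problem. The example below assumes 1D linear advection with positive speed on a periodic grid and a minmod slope limiter; for this scalar case the Riemann solver reduces to plain upwinding. It is a CPU illustration with hypothetical function names, not the paper's CUDA implementation.

```python
def minmod(a, b):
    """Minmod limiter: zero at extrema, else the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def rhs(u, speed, dx):
    """Semi-discrete operator L(u) for u_t + speed * u_x = 0.

    Assumes speed > 0 and periodic boundaries (u[-1] wraps around).
    """
    n = len(u)
    slopes = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i])
              for i in range(n)]
    # Reconstructed left state at interface i+1/2, then the upwind flux.
    flux = [speed * (u[i] + 0.5 * slopes[i]) for i in range(n)]
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]


def tvd_rk2_step(u, dt, speed, dx):
    """One step of the second-order TVD (SSP) Runge-Kutta scheme."""
    n = len(u)
    l0 = rhs(u, speed, dx)
    u1 = [u[i] + dt * l0[i] for i in range(n)]           # forward Euler stage
    l1 = rhs(u1, speed, dx)
    # Convex combination of the old state and a second Euler stage.
    return [0.5 * u[i] + 0.5 * (u1[i] + dt * l1[i]) for i in range(n)]
```

Each cell update depends only on a few neighbours, which is why schemes of this kind map naturally onto one GPU thread per cell.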
Additional degrees of parallelism within the Adomian decomposition method
4th International Conference on Computational Engineering (ICCE 2017), 28-29 September 2017, Darmstadt
This is the author accepted manuscript. The final version is available from Springer via the DOI in this record.
The trend toward massively parallel computer architectures calls for exploring additional degrees of parallelism, also in the time dimension, when solving continuum mechanical partial differential equations. The Adomian decomposition method (ADM) is investigated in this respect in the present work. This is accomplished by comparison with Runge-Kutta (RK) time integration in the context of the viscous Burgers equation. Our studies show that both methods have similar restrictions regarding their maximal time step size. Increasing the order of the schemes leads to larger errors for the ADM compared to RK. However, we also discuss a parallelization within the ADM, reducing its runtime complexity from O(n^2) to O(n). This indicates the possibility of making it a viable competitor to RK, as fewer function evaluations have to be done in serial if a high-order method is desired. Additionally, creating high-order ADM schemes is less complex than it is with RK.
The work of Andreas Schmitt is supported by the 'Excellence Initiative' of the German Federal and State Governments and the Graduate School of Computational Engineering at Technische Universität Darmstadt.
A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows
Both compressible and incompressible Navier-Stokes solvers can be used and
are used to solve incompressible turbulent flow problems. In the compressible
case, the Mach number is then considered as a solver parameter that is set to a
small value in order to mimic incompressible flows.
This strategy is widely used for high-order discontinuous Galerkin
discretizations of the compressible Navier-Stokes equations. The present work
raises the question regarding the computational efficiency of compressible DG
solvers as compared to a genuinely incompressible formulation. Our
contributions to the state-of-the-art are twofold: Firstly, we present a
high-performance discontinuous Galerkin solver for the compressible
Navier-Stokes equations based on a highly efficient matrix-free implementation
that targets modern cache-based multicore architectures. The performance
results presented in this work focus on the node-level performance and our
results suggest that there is great potential for further performance
improvements for current state-of-the-art discontinuous Galerkin
implementations of the compressible Navier-Stokes equations. Secondly, this
compressible Navier-Stokes solver is put into perspective by comparing it to an
incompressible DG solver that uses the same matrix-free implementation. We
discuss algorithmic differences between both solution strategies and present an
in-depth numerical investigation of the performance. The considered benchmark
test cases are the three-dimensional Taylor-Green vortex problem as a
representative of transitional flows and the turbulent channel flow problem as
a representative of wall-bounded turbulent flows.