
811 research outputs found

Accelerating crystal plasticity simulations using GPU multiprocessors

Author: Hallberg Håkan
Mellbin Ylva
Ristinmaa Matti
Publication venue: 'Wiley'
Publication date: 01/01/2014

Crystal plasticity models are often used to model the deformation behavior of polycrystalline materials. One major drawback of such models is that they are computationally very demanding. Adopting the common Taylor assumption requires calculating the response of several hundred individual grains to obtain the stress in a single integration point in the overlying FEM structure. However, a large part of the operations can be executed in parallel to reduce the computation time. One emerging technology for running massively parallel computations without having to rely on the availability of large computer clusters is to port the parallel parts of the calculations to a graphical processing unit (GPU). GPUs are designed to handle vast numbers of floating point operations in parallel. In the present work, different strategies for the numerical implementation of crystal plasticity are investigated, as well as a number of approaches to parallelization of the program execution. A major concern identified is the limited amount of memory available on the GPU. However, significant reductions in computational time, up to a 100-fold speedup, are achieved in the present study, and these are also possible on a standard desktop computer equipped with a GPU.

Lund University Publications

Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

Author: Niemeyer Kyle E
Sung Chih-Jen
Publication venue: 'Elsevier BV'
Publication date: 04/11/2013

The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster, for problem sizes consisting of 131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. Therefore, the need for developing new strategies for integrating stiff chemistry on GPUs was discussed. Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1

arXiv.org e-Print Archive

CiteSeerX

FiCoS: A fine-grained and coarse-grained GPU-powered deterministic simulator for biochemical networks.

Author: Besozzi Daniela
Capitoli Giulia
Cazzaniga Paolo
Mauri Giancarlo
Nobile Marco S
Rundo Leonardo
Spolaor Simone
Tangherloni Andrea
Publication venue: PLoS Comput Biol
Publication date: 01/01/2021

Mathematical models of biochemical networks can greatly facilitate the comprehension of the mechanisms at the basis of cellular processes, as well as the formulation of hypotheses that can be tested by means of targeted laboratory experiments. However, two issues might hamper the achievement of fruitful outcomes. On the one hand, detailed mechanistic models can involve hundreds or thousands of molecular species and their intermediate complexes, as well as hundreds or thousands of chemical reactions, a situation generally occurring in rule-based modeling. On the other hand, the computational analysis of a model typically requires the execution of a large number of simulations for its calibration, or to test the effect of perturbations. As a consequence, the computational capabilities of modern Central Processing Units can be easily overtaken, possibly making the modeling of biochemical networks a worthless or ineffective effort. With the aim of overcoming the limitations of the current state-of-the-art simulation approaches, we present in this paper FiCoS, a novel "black-box" deterministic simulator that effectively realizes both a fine-grained and a coarse-grained parallelization on Graphics Processing Units. In particular, FiCoS exploits two different integration methods, namely, the Dormand-Prince and the Radau IIA, to efficiently solve both non-stiff and stiff systems of coupled Ordinary Differential Equations. We tested the performance of FiCoS against different deterministic simulators, by considering models of increasing size and by running analyses with increasing computational demands. FiCoS was able to dramatically speed up the computations, by up to 855×, proving to be a promising solution for the simulation and analysis of large-scale models of complex biological processes.

Archivio istituzionale della Ricerca - Bocconi

Pure OAI Repository

PubMed Central

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Apollo (Cambridge)

Parallel-In-Time Simulation of Eddy Current Problems Using Parareal

Author: Clemens Markus
Niyonzima Innocent
Schöps Sebastian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2017

In this contribution the usage of the Parareal method is proposed for the time-parallel solution of the eddy current problem. The method is adapted to the particular challenges of the problem that are related to the differential algebraic character due to non-conducting regions. It is shown how the necessary modification can be automatically incorporated by using a suitable time stepping method. The paper closes with a first demonstration of a simulation of a realistic four-pole induction machine model using Parareal
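The Parareal iteration named in the abstract above can be illustrated with a minimal sketch. It assumes a scalar test equation y' = lam * y in place of the eddy current problem, and uses backward Euler for both the coarse and the fine propagator; both choices are assumptions made for illustration and are not taken from the paper. The function names (`backward_euler`, `parareal`, `serial_fine`) are hypothetical.

```python
def backward_euler(y, lam, dt, steps):
    """Advance y' = lam * y by `steps` implicit Euler steps of size dt."""
    for _ in range(steps):
        y = y / (1.0 - lam * dt)
    return y


def serial_fine(y0, lam, t_end, n_slices, fine_steps):
    """Reference solution: the fine propagator applied serially over [0, t_end]."""
    dt = t_end / n_slices
    return backward_euler(y0, lam, dt / fine_steps, n_slices * fine_steps)


def parareal(y0, lam, t_end, n_slices, fine_steps, iterations):
    """Return the values at the slice boundaries after `iterations` Parareal sweeps."""
    dt = t_end / n_slices

    def coarse(y):
        # Cheap propagator: one implicit Euler step per time slice.
        return backward_euler(y, lam, dt, 1)

    def fine(y):
        # Accurate propagator: `fine_steps` implicit Euler steps per time slice.
        return backward_euler(y, lam, dt / fine_steps, fine_steps)

    # Initial guess from a serial coarse sweep.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n]))

    for _ in range(iterations):
        # The fine solves are independent per slice; this is the part
        # that would run in parallel across time slices.
        f_old = [fine(u[n]) for n in range(n_slices)]
        g_old = [coarse(u[n]) for n in range(n_slices)]

        # Serial correction sweep: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]).
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n]) + f_old[n] - g_old[n])
        u = new
    return u
```

With as many iterations as time slices, the iteration reproduces the serial fine solution up to rounding; the method is only worthwhile when far fewer iterations suffice.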

arXiv.org e-Print Archive

TUbiblio

HAL Descartes

Hal-Diderot

An Overview of Variational Integrators

Author: Adrian Lew
Jerrold E. Marsden
L. P. Franca (ed
Matthew West
Michael Ortiz
Publication venue: International Center for Numerical Methods in Engineering (CIMNE)
Publication date: 01/01/2003

The purpose of this paper is to survey some recent advances in variational integrators for both finite dimensional mechanical systems as well as continuum mechanics. These advances include the general development of discrete mechanics, applications to dissipative systems, collisions, spacetime integration algorithms, AVI’s (Asynchronous Variational Integrators), as well as reduction for discrete mechanical systems. To keep the article within the set limits, we will only treat each topic briefly and will not attempt to develop any particular topic in any depth. We hope, nonetheless, that this paper serves as a useful guide to the literature as well as to future directions and open problems in the subject

CiteSeerX

Caltech Authors

Adaptive Mesh Fluid Simulations on GPU

Author: Abel Tom
Kaehler Ralf
Wang Peng
Publication venue: 'Elsevier BV'
Publication date: 28/10/2009

We describe an implementation of compressible inviscid fluid solvers with block-structured adaptive mesh refinement on Graphics Processing Units using NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes can be mapped naturally on this architecture. Using the method of lines approach with the second order total variation diminishing Runge-Kutta time integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer Riemann solver, we achieve an overall speedup of approximately 10 times on one graphics card compared to a single core on the host computer. We attain this speedup in uniform grid runs as well as in problems with deep AMR hierarchies. Our framework can readily be applied to more general systems of conservation laws and extended to higher order shock capturing schemes. This is shown directly by implementing a magneto-hydrodynamic solver and comparing its performance to the pure hydrodynamic case. Finally, we also combined our CUDA parallel scheme with MPI to make the code run on GPU clusters. Close to ideal speedup is observed on up to four GPUs. Comment: Submitted to New Astronom
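The method of lines with a second-order total variation diminishing Runge-Kutta scheme, as mentioned in the abstract above, can be sketched as follows. This is a simplified illustration and not the paper's solver: it assumes 1D linear advection with positive speed on a periodic grid, and it swaps the piecewise linear reconstruction and Harten-Lax-van Leer Riemann solver for plain first-order upwind differences. All function names are hypothetical.

```python
def upwind_rhs(u, speed, dx):
    """Spatial operator L(u) = -speed * du/dx, first-order upwind, periodic grid.

    Assumes speed > 0; u[i - 1] wraps around to the last cell when i == 0.
    """
    return [-speed * (u[i] - u[i - 1]) / dx for i in range(len(u))]


def tvd_rk2_step(u, dt, rhs):
    """One step of the second-order TVD Runge-Kutta scheme (Shu-Osher form).

    Stage 1: u1 = u + dt * L(u)
    Stage 2: u_new = 0.5 * u + 0.5 * (u1 + dt * L(u1))
    """
    k1 = rhs(u)
    u1 = [ui + dt * ki for ui, ki in zip(u, k1)]
    k2 = rhs(u1)
    return [0.5 * ui + 0.5 * (u1i + dt * k2i) for ui, u1i, k2i in zip(u, u1, k2)]


def total_variation(u):
    """Sum of absolute jumps between neighbouring cells on a periodic grid."""
    return sum(abs(u[i] - u[i - 1]) for i in range(len(u)))


def advect(u0, speed, dx, cfl, n_steps):
    """Advance the initial data u0 by n_steps with time step dt = cfl * dx / speed."""
    dt = cfl * dx / speed
    u = list(u0)
    for _ in range(n_steps):
        u = tvd_rk2_step(u, dt, lambda v: upwind_rhs(v, speed, dx))
    return u
```

Because each stage is a convex combination of forward Euler steps, the scheme inherits the total variation bound of forward Euler with upwind differences for CFL numbers up to 1. Every cell update within a stage is independent of the others, which is what makes this class of schemes map well to a GPU.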

arXiv.org e-Print Archive

CiteSeerX

Additional degrees of parallelism within the Adomian decomposition method

Author: D Gottlieb
D Kaya
E Hopf
G Adomian
H Bateman
H Zhu
I Akpan
J Burgers
JC Butcher
JD Cole
K Abbaoui
M Cheng
MA El-Tawil
N Shawagfeh
P Vadasz
R Courant
S Abbasbandy
S Guellal
SM El-Sayed
SR Barros
UM Ascher
WH Press
Y Cherruault
Y Jiao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

4th International Conference on Computational Engineering (ICCE 2017), 28-29 September 2017, Darmstadt. This is the author accepted manuscript. The final version is available from Springer via the DOI in this record. The trend towards future massively parallel computer architectures calls for the exploration of additional degrees of parallelism, also in the time dimension, when solving continuum mechanical partial differential equations. The Adomian decomposition method (ADM) is investigated in this respect in the present work. This is accomplished by comparison with the Runge-Kutta (RK) time integration and put in the context of the viscous Burgers equation. Our studies show that both methods have similar restrictions regarding their maximal time step size. Increasing the order of the schemes leads to larger errors for the ADM compared to RK. However, we also discuss a parallelization within the ADM, reducing its runtime complexity from O(n^2) to O(n). This indicates the possibility of making it a viable competitor to RK, as fewer function evaluations have to be done in serial if a high-order method is desired. Additionally, creating high-order ADM schemes is less complex than it is with RK. The work of Andreas Schmitt is supported by the 'Excellence Initiative' of the German Federal and State Governments and the Graduate School of Computational Engineering at Technische Universität Darmstadt.

TUbiblio

Crossref

Open Research Exeter

A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows

Author: Arndt
Arnold
Bassi
Beck
Brown
Cantwell
Carton de Wiart
Chapelier
Del Alamo
Fehn
Fernandez
Fischer
Flad
Franciolini
Gassner
Guermond
Hager
Hartmann
Hesthaven
Hillewaert
Hindenlang
Joshi
Karniadakis
Kennedy
Kirby
Kopriva
Krank
Kronbichler
Kubatko
Mengaldo
Moser
Moura
Orszag
Pazner
Steinmoeller
Taylor
Toulorge
Uranga
Vos
Wang
Publication venue: 'Wiley'
Publication date: 01/01/2018

Both compressible and incompressible Navier-Stokes solvers can be used and are used to solve incompressible turbulent flow problems. In the compressible case, the Mach number is then considered as a solver parameter that is set to a small value, $\mathrm{M}\approx 0.1$, in order to mimic incompressible flows. This strategy is widely used for high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations. The present work raises the question regarding the computational efficiency of compressible DG solvers as compared to a genuinely incompressible formulation. Our contributions to the state-of-the-art are twofold: Firstly, we present a high-performance discontinuous Galerkin solver for the compressible Navier-Stokes equations based on a highly efficient matrix-free implementation that targets modern cache-based multicore architectures. The performance results presented in this work focus on the node-level performance and our results suggest that there is great potential for further performance improvements for current state-of-the-art discontinuous Galerkin implementations of the compressible Navier-Stokes equations. Secondly, this compressible Navier-Stokes solver is put into perspective by comparing it to an incompressible DG solver that uses the same matrix-free implementation. We discuss algorithmic differences between both solution strategies and present an in-depth numerical investigation of the performance. The considered benchmark test cases are the three-dimensional Taylor-Green vortex problem as a representative of transitional flows and the turbulent channel flow problem as a representative of wall-bounded turbulent flows.

arXiv.org e-Print Archive

OPUS Augsburg

Crossref