2,959 research outputs found

Air pollution modelling using a graphics processing unit with CUDA

Author: Lagzi Istvan
Meszaros Robert
Molnar Jr. Ferenc
Szakaly Tamas
Publication venue: 'Elsevier BV'
Publication date: 16/12/2009

The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In recent years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA), a parallel computing architecture, has been developed by NVIDIA to utilize this performance in general-purpose computations. Here we show for the first time a possible application of GPUs for environmental studies, serving as a basis for decision-making strategies. A stochastic Lagrangian particle model has been developed on CUDA to estimate the transport and the transformation of radionuclides from a single point source during an accidental release. Our results show that the parallel implementation achieves typical acceleration values on the order of 80-120 times compared to a single-threaded CPU implementation on a 2.33 GHz desktop computer. Only very small differences have been found between the results obtained from GPU and CPU simulations, which are comparable with the effect of stochastic transport phenomena in the atmosphere. The relatively high speedup with no additional costs to maintain this parallel architecture could result in wide usage of GPUs for diverse environmental applications in the near future.

Comment: 5 figures
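To make the idea of a stochastic Lagrangian particle model concrete, here is a minimal CPU sketch in plain Python. It is not the authors' code: the function name `simulate_plume`, the constant wind, the uniform diffusivity, and the single decay constant are all assumptions chosen for illustration, and there is no ground reflection or deposition. Each particle is updated independently of the others, which is the property that makes such a model a natural fit for one-thread-per-particle execution on a GPU.

```python
import math
import random


def simulate_plume(n_particles, n_steps, dt, wind_u, diffusivity, half_life, seed=0):
    """Random-walk particles released from a point source at the origin.

    Hypothetical illustration: each particle is advected by a constant wind
    along x, takes a Gaussian random step in x, y and z to mimic turbulent
    diffusion, and carries an activity that decays exponentially.
    Returns a list of [x, y, z, activity] entries.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusivity * dt)           # std. dev. of one diffusive step
    decay = math.exp(-math.log(2.0) * dt / half_life)   # per-step decay factor
    particles = [[0.0, 0.0, 0.0, 1.0] for _ in range(n_particles)]
    for _ in range(n_steps):
        for p in particles:  # independent updates: one GPU thread per particle
            p[0] += wind_u * dt + rng.gauss(0.0, sigma)
            p[1] += rng.gauss(0.0, sigma)
            p[2] += rng.gauss(0.0, sigma)
            p[3] *= decay
    return particles


if __name__ == "__main__":
    cloud = simulate_plume(1000, 50, 1.0, wind_u=2.0, diffusivity=0.5, half_life=50.0)
    mean_x = sum(p[0] for p in cloud) / len(cloud)
    print("mean downwind distance:", mean_x)
    print("remaining activity per particle:", cloud[0][3])
```

The plume centre moves downwind at the wind speed while the cloud spreads, and after one half-life each particle carries half of its initial activity.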

arXiv.org e-Print Archive

ELTE Digital Institutional Repository (EDIT)

Simulation of reaction-diffusion processes in three dimensions using CUDA

Author: Alexandrov
Anderson
Belleman
Block
Buluc
Castano-Diez
Castets
Che
Costello
Cross
Dabdub
Epstein
Ferenc Izsák
Ferenc Molnár
Ford
Fowler
Garland
Gutiérrez
Horváth
Horváth
Horváth
Huang
István Lagzi
Januszewski
Komatitsch
Komatitsch
Lagzi
Lagzi
Lagzi
Lagzi
Lengyel
Li
Liu
Liu
Lovas
Martin
Melchionna
Micikevicius
Molnár
Nakamasu
NVIDIA Corporation
Owens
Preis
Pápai
Rácz
Róbert Mészáros
Sainio
Sanderson
Sanna
Sanna
Schmidt
Senocak
Shoji
Shoji
Simek
Stone
Stone
Sultan
Volford
Volford
Walsh
Publication venue: 'Elsevier BV'
Publication date: 03/04/2010

Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging applied mathematical problems. Since these simulations are very time consuming, any ideas and strategies aiming at the reduction of CPU time are important topics of research. A general and robust idea is the parallelization of source codes/programs. Recently, the technological development of graphics hardware created a possibility to use desktop video cards to solve numerically intensive problems. We present a powerful parallel computing framework to solve reaction-diffusion equations numerically using Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion problems, (i) diffusion of a chemically inert compound, (ii) Turing pattern formation, (iii) phase separation in the wake of a moving diffusion front and (iv) air pollution dispersion, were solved, and additionally both the Shared method and the Moving Tiles method were tested. Our results show that the parallel implementation achieves typical acceleration values on the order of 5-40 times compared to a single-threaded CPU implementation on a 2.8 GHz desktop computer.

Comment: 8 figures, 5 tables
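As a rough illustration of problem (i), the sketch below performs explicit (forward Euler) time steps of the diffusion equation on a small periodic 3D grid with a 7-point stencil. This is an assumed discretisation written in plain Python for readability. It does not reproduce the paper's Shared or Moving Tiles methods, and the names `diffuse_step` and `total_mass` are hypothetical. Every grid cell is computed from the previous time level only, so on a GPU each cell could be handled by its own thread.

```python
def diffuse_step(c, d_coeff, dx, dt):
    """One forward-Euler step of dc/dt = D * laplacian(c) on a periodic n*n*n grid."""
    n = len(c)
    k = d_coeff * dt / (dx * dx)  # needs k <= 1/6 for stability in 3D
    new = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for l in range(n):
                # Sum of the six nearest neighbours, wrapping at the edges.
                neighbours = (
                    c[(i + 1) % n][j][l] + c[(i - 1) % n][j][l]
                    + c[i][(j + 1) % n][l] + c[i][(j - 1) % n][l]
                    + c[i][j][(l + 1) % n] + c[i][j][(l - 1) % n]
                )
                new[i][j][l] = c[i][j][l] + k * (neighbours - 6.0 * c[i][j][l])
    return new


def total_mass(c):
    """Sum of all cell values; conserved by the periodic scheme above."""
    return sum(v for plane in c for row in plane for v in row)


if __name__ == "__main__":
    n = 8
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    grid[n // 2][n // 2][n // 2] = 1.0  # point release in the centre
    for _ in range(10):
        grid = diffuse_step(grid, d_coeff=0.1, dx=1.0, dt=1.0)
    print("centre value:", grid[n // 2][n // 2][n // 2])
    print("total mass:", total_mass(grid))
```

A reaction term would be added to the update of each cell; it is left out here to keep the stencil itself visible.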

arXiv.org e-Print Archive

Crossref

University of Twente Research Information

Architecture-Aware Optimization on a 1600-core Graphics Processor

Author: Daga Mayank
Feng Wu-chun
Scogland Thomas R.W.
Publication date: 01/07/2011

The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations.

Computer Science Technical Reports @Virginia Tech

Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures

Author: Anderson
Bauer
Berendsen
Brown
Brown
Colberg
Dullweber
Friedrichs
Ganesan
Gay
Harvey
Högberg
Liu
Liu
MacCallum
Mourtisen
Müller
Nguyen
Orsi
Orsi
Orsi
Orsi
Orsi
Orsi
Orsi
Orsi
Orsi
Phillips
Plimpton
Rapaport
Rapaport
Schmid
Stone
Stone
Stone
Sunarso
van Meel
Wang
Wohlert
Zhmurov
Publication venue: John Wiley & Sons Limited (http://www.wiley.co.uk)
Publication date: 01/01/2013

Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower computational overheads and longer timescales with respect to atomistic level models. However, their acceleration on parallel architectures such as Graphic Processing Units (GPU) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specific optimizations for CG models, such as dedicated data structures to handle different bead type interactions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three different GPU architectures as case studies.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

PORTO Publications Open Repository TOrino

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Author: Anderson
Block
Dekker
Felix Höfling
Flenner
Frenkel
Götze
Hansen
Harvey
Knuth
Knuth
Kob
Kob
Kob
Lippert
Liu
Mosayebi
Peter H. Colberg
Plimpton
Preis
Rapaport
Sagan
Stone
van Meel
Voelz
Weeks
Xu
Yang
Zagha
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single-precision floating-point arithmetic may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.

Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm
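The double-single idea mentioned in the abstract can be sketched as follows: a value is stored as an unevaluated sum of two single-precision numbers (`hi`, `lo`), and an error-free addition recovers the rounding error that a plain single-precision add would discard. The code below is a generic textbook construction based on Knuth's two-sum, emulated on the CPU by rounding every intermediate result to single precision with `struct`. It is not taken from the HALMD package, and the helper names `f32`, `two_sum` and `ds_add` are assumptions.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) with s + e == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def ds_add(hi, lo, x):
    """Add the single-precision value x to the double-single number (hi, lo)."""
    s, e = two_sum(hi, x)
    e = f32(e + lo)
    # Renormalise with a fast two-sum; this assumes |s| >= |e|,
    # which holds for the same-sign accumulation shown below.
    hi2 = f32(s + e)
    lo2 = f32(e - f32(hi2 - s))
    return hi2, lo2


if __name__ == "__main__":
    dx = f32(1e-8)
    naive = 1.0
    hi, lo = 1.0, 0.0
    for _ in range(10000):
        naive = f32(naive + dx)      # increment is below half an ulp and is lost
        hi, lo = ds_add(hi, lo, dx)  # increment is kept in the low word
    print("single precision:", naive)
    print("double-single:   ", hi + lo)
```

In plain single precision the small increments vanish entirely, which is the kind of silent drift that matters over very long runs, while the double-single accumulator tracks the exact sum closely.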

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

Author: Niemeyer Kyle E
Sung Chih-Jen
Publication venue: 'Elsevier BV'
Publication date: 04/11/2013

The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster, for problem sizes consisting of 131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. Therefore, the need for developing new strategies for integrating stiff chemistry on GPUs was discussed.

Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1
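For readers unfamiliar with the Runge-Kutta-Cash-Karp method named above, the following sketch shows a single step for a scalar ODE using the standard Cash-Karp coefficients, with the embedded fourth-order solution providing an error estimate. It is a plain CPU illustration with fixed step size. The function names `rkck_step` and `integrate` are assumptions, no step-size control is included, and nothing here is taken from the paper's GPU code. On a GPU, one would presumably assign each independent ODE system to its own thread.

```python
import math

# Cash-Karp tableau: nodes C, stage coefficients A, 5th-order weights B5,
# and embedded 4th-order weights B4.
C = (0.0, 1 / 5, 3 / 10, 3 / 5, 1.0, 7 / 8)
A = (
    (),
    (1 / 5,),
    (3 / 40, 9 / 40),
    (3 / 10, -9 / 10, 6 / 5),
    (-11 / 54, 5 / 2, -70 / 27, 35 / 27),
    (1631 / 55296, 175 / 512, 575 / 13824, 44275 / 110592, 253 / 4096),
)
B5 = (37 / 378, 0.0, 250 / 621, 125 / 594, 0.0, 512 / 1771)
B4 = (2825 / 27648, 0.0, 18575 / 48384, 13525 / 55296, 277 / 14336, 1 / 4)


def rkck_step(f, t, y, h):
    """One Cash-Karp step for scalar y' = f(t, y); returns (y_new, error_estimate)."""
    k = []
    for c_i, a_i in zip(C, A):
        y_stage = y + h * sum(a * k_j for a, k_j in zip(a_i, k))
        k.append(f(t + c_i * h, y_stage))
    y5 = y + h * sum(b * k_j for b, k_j in zip(B5, k))
    y4 = y + h * sum(b * k_j for b, k_j in zip(B4, k))
    return y5, abs(y5 - y4)


def integrate(f, t0, y0, h, n_steps):
    """Fixed-step integration; returns the final value and the largest error estimate."""
    t, y, worst = t0, y0, 0.0
    for _ in range(n_steps):
        y, err = rkck_step(f, t, y, h)
        worst = max(worst, err)
        t += h
    return y, worst


if __name__ == "__main__":
    y_end, worst = integrate(lambda t, y: -y, 0.0, 1.0, 0.1, 10)
    print("numerical:", y_end, "exact:", math.exp(-1.0), "max error estimate:", worst)
```

An adaptive integrator would use the error estimate to accept or reject each step and to choose the next step size; that logic is omitted to keep the tableau itself in focus.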

arXiv.org e-Print Archive

CiteSeerX