Simulation of reaction-diffusion processes in three dimensions using CUDA

Author: Alexandrov
Anderson
Belleman
Block
Buluc
Castano-Diez
Castets
Che
Costello
Cross
Dabdub
Epstein
Ferenc Izsák
Ferenc Molnár
Ford
Fowler
Garland
Gutiérrez
Horváth
Huang
István Lagzi
Januszewski
Komatitsch
Lagzi
Lengyel
Li
Liu
Lovas
Martin
Melchionna
Micikevicius
Molnár
Nakamasu
NVIDIA Corporation
Owens
Preis
Pápai
Rácz
Róbert Mészáros
Sainio
Sanderson
Sanna
Schmidt
Senocak
Shoji
Simek
Stone
Sultan
Volford
Walsh
Publication venue: 'Elsevier BV'
Publication date: 03/04/2010

Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging problems in applied mathematics. Since these simulations are very time consuming, strategies that reduce CPU time are important topics of research. A general and robust approach is to parallelize the source code. Recently, advances in graphics hardware have made it possible to use desktop video cards to solve numerically intensive problems. We present a powerful parallel computing framework to solve reaction-diffusion equations numerically using Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion problems were solved: (i) diffusion of a chemically inert compound, (ii) Turing pattern formation, (iii) phase separation in the wake of a moving diffusion front, and (iv) air pollution dispersion. In addition, both the Shared method and the Moving Tiles method were tested. Our results show that the parallel implementation achieves typical speedups on the order of 5-40 times compared to a single-threaded CPU implementation on a 2.8 GHz desktop computer. Comment: 8 figures, 5 tables

arXiv.org e-Print Archive

Crossref

University of Twente Research Information
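
The abstract above does not say which discretisation the authors use, so the following is only a minimal sketch of problem (i), diffusion of an inert compound, under assumed choices: an explicit Euler time step, a 7-point finite-difference stencil, and periodic boundaries. The function name `diffuse_step` and the flat-list grid layout are illustrative and not taken from the paper.

```python
def diffuse_step(u, n, d, dt, dx):
    """One explicit Euler step of du/dt = d * laplacian(u) on an n*n*n periodic grid.

    u is a flat list of length n**3; cell (i, j, k) lives at index (i*n + j)*n + k.
    Assumed scheme (not from the paper): 7-point stencil, periodic boundaries.
    """
    r = d * dt / (dx * dx)  # the explicit scheme is stable only for r <= 1/6 in 3D
    new = [0.0] * (n * n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c = (i * n + j) * n + k
                # Sum of the six face neighbours, wrapping around at the edges.
                neighbours = (
                    u[(((i + 1) % n) * n + j) * n + k]
                    + u[(((i - 1) % n) * n + j) * n + k]
                    + u[(i * n + (j + 1) % n) * n + k]
                    + u[(i * n + (j - 1) % n) * n + k]
                    + u[(i * n + j) * n + (k + 1) % n]
                    + u[(i * n + j) * n + (k - 1) % n]
                )
                new[c] = u[c] + r * (neighbours - 6.0 * u[c])
    return new
```

Each cell of `new` depends only on values from the previous step, so the three nested loops carry no dependencies between iterations. That independence is what makes this kind of update a natural fit for one GPU thread per grid cell.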

Hardware acceleration of reaction-diffusion systems: a guide to optimisation of pattern formation algorithms using OpenACC

Author: Falconer Ruth E.
Houston Alasdair N.
Otten Wilfred
Portell Xavier
Publication date: 10/06/2019

Reaction Diffusion Systems (RDS) have widespread applications in computational ecology, biology, computer graphics and the visual arts. For the former applications, a major barrier to the development of effective simulation models is their computational complexity: it takes a great deal of processing power to simulate enough replicates such that reliable conclusions can be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with fewer resources. Existing optimizations of RDS tend to be low-level and GPGPU based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of speed-up and portability.

Abertay Research Portal

Crossref
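
The abstract above does not name the "simple RDS" it uses. As a hedged illustration only, the sketch below assumes the well-known Gray-Scott model on a 2D periodic grid with explicit Euler stepping. All names (`laplacian`, `gray_scott_step`, `run`) and the default parameter values are illustrative choices, not taken from the paper. The buffer swap in `run` is a loose analogue of the "minimal data transfer" point: the two arrays stay in place and only references are exchanged between steps.

```python
def laplacian(a, n, i, j):
    """5-point Laplacian on an n*n periodic grid stored as a flat list (dx = 1)."""
    return (
        a[((i + 1) % n) * n + j]
        + a[((i - 1) % n) * n + j]
        + a[i * n + (j + 1) % n]
        + a[i * n + (j - 1) % n]
        - 4.0 * a[i * n + j]
    )


def gray_scott_step(u, v, u_out, v_out, n, du, dv, feed, kill, dt):
    """Write one explicit Euler step of the Gray-Scott model into u_out and v_out."""
    for i in range(n):
        for j in range(n):
            c = i * n + j
            uvv = u[c] * v[c] * v[c]  # reaction term u * v^2
            u_out[c] = u[c] + dt * (du * laplacian(u, n, i, j) - uvv + feed * (1.0 - u[c]))
            v_out[c] = v[c] + dt * (dv * laplacian(v, n, i, j) + uvv - (feed + kill) * v[c])


def run(u, v, n, steps, du=0.16, dv=0.08, feed=0.035, kill=0.065, dt=1.0):
    """Advance the system, swapping buffer references instead of copying arrays.

    Note: after the first step the caller's input lists are reused as scratch space.
    """
    u_next = [0.0] * (n * n)
    v_next = [0.0] * (n * n)
    for _ in range(steps):
        gray_scott_step(u, v, u_next, v_next, n, du, dv, feed, kill, dt)
        u, u_next = u_next, u
        v, v_next = v_next, v
    return u, v
```

The doubly nested loop in `gray_scott_step` is the kind of loop nest that a directive-based approach would annotate for parallel execution. How the paper's own code is structured is not stated in the abstract.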

Air pollution modelling using a graphics processing unit with CUDA

Author: Lagzi Istvan
Meszaros Robert
Molnar Jr. Ferenc
Szakaly Tamas
Publication venue: 'Elsevier BV'
Publication date: 16/12/2009

The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In the past years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA) - a parallel computing architecture - has been developed by NVIDIA to utilize this performance in general purpose computations. Here we show for the first time a possible application of GPUs for environmental studies, serving as a basis for decision making strategies. A stochastic Lagrangian particle model has been developed on CUDA to estimate the transport and the transformation of radionuclides from a single point source during an accidental release. Our results show that the parallel implementation achieves typical speedups on the order of 80-120 times compared to a single-threaded CPU implementation on a 2.33 GHz desktop computer. Only very small differences have been found between the results obtained from GPU and CPU simulations, which are comparable with the effect of stochastic transport phenomena in the atmosphere. The relatively high speedup with no additional costs to maintain this parallel architecture could result in a wide usage of GPUs for diversified environmental applications in the near future. Comment: 5 figures

arXiv.org e-Print Archive

ELTE Digital Institutional Repository (EDIT)
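
The abstract above gives no equations for its stochastic Lagrangian particle model, so the following is only a minimal sketch of the general technique under stated assumptions: constant wind, a constant turbulent diffusivity modelled as a Gaussian random displacement, reflection at the ground, and exponential radioactive decay. The function name `advance_particles` and every parameter are hypothetical.

```python
import math
import random


def advance_particles(particles, wind, k_turb, half_life, dt, rng):
    """Move each particle one time step and decay the activity it carries.

    particles: list of [x, y, z, activity]; wind: (u, v, w) in m/s;
    k_turb: turbulent diffusivity in m^2/s; half_life and dt in seconds;
    rng: a random.Random instance. All modelling choices here are assumptions.
    """
    sigma = math.sqrt(2.0 * k_turb * dt)  # std. dev. of the random displacement
    decay = 0.5 ** (dt / half_life)       # fraction of activity left after dt
    for p in particles:
        for axis in range(3):
            # Deterministic advection by the wind plus a random turbulent kick.
            p[axis] += wind[axis] * dt + rng.gauss(0.0, sigma)
        p[2] = abs(p[2])                  # reflect at the ground (z = 0)
        p[3] *= decay
    return particles
```

A call such as `advance_particles(cloud, (2.0, 0.0, 0.0), 5.0, 3600.0, 10.0, random.Random(1))` would advance a list `cloud` by ten seconds. Because no particle reads another particle's state, the outer loop can be split across threads freely, which is consistent with the large speedups the abstract reports for this class of model.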

MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework

Author: Anandkumar Anima
Azizzadenesheli Kamyar
Esmaeilzadeh Soheil
Jiang Chiyu Max
Kashinath Karthik
Marcus Philip
Mustafa Mustafa
Prabhat
Tchelepi Hamdi A.
Publication date: 01/05/2020

We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MeshfreeFlowNet on the task of super-resolution of turbulent flows in the Rayleigh-Benard convection problem. Across a diverse set of evaluation metrics, we show that MeshfreeFlowNet significantly outperforms existing baselines. Furthermore, we provide a large scale implementation of MeshfreeFlowNet and show that it efficiently scales across large clusters, achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of less than 4 minutes. Comment: Supplementary Video: https://youtu.be/mjqwPch9gDo. Accepted to SC2

arXiv.org e-Print Archive

Caltech Authors
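
To put the 96.80% scaling figure quoted in the abstract above in concrete terms, here is a small arithmetic sketch. It assumes the common definition of scaling efficiency as measured throughput divided by ideal linear throughput. The abstract does not state which baseline or definition the authors used, and both function names are illustrative.

```python
def scaling_efficiency(throughput_base, throughput_n, n_devices):
    """Measured throughput on n devices divided by ideal n * baseline throughput.

    Assumes the baseline was measured on a single device (an assumption,
    not something the abstract states).
    """
    return throughput_n / (n_devices * throughput_base)


def effective_speedup(efficiency, n_devices):
    """Speedup over the single-device baseline implied by a given efficiency."""
    return efficiency * n_devices
```

Under that assumed definition, `effective_speedup(0.968, 128)` evaluates to roughly 123.9, meaning 128 GPUs would do the work of about 124 ideally scaling ones.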