2,994 research outputs found
Developing Efficient Discrete Simulations on Multicore and GPU Architectures
In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores, and also an NVIDIA Tesla V100 GPU, both running on Amazon Web Server (AWS) Cloud; and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained in both platforms, and we extract some important issues that imply a performance degradation for them. We also found that current multicore CPUs with large core numbers can bring a performance very near to that of GPUs, and even identical in some cases. Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842
GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA
This work presents an updated and extended guide on methods of a proper
acceleration of the Monte Carlo integration of stochastic differential
equations with the commonly available NVIDIA Graphics Processing Units using
the CUDA programming environment. We outline the general aspects of the
scientific computing on graphics cards and demonstrate them with two models of
a well known phenomenon of the noise induced transport of Brownian motors in
periodic structures. As a source of fluctuations in the considered systems we
selected the three most commonly occurring noises: the Gaussian white noise,
the white Poissonian noise and the dichotomous process also known as a random
telegraph signal. The detailed discussion on various aspects of the applied
numerical schemes is also presented. The measured speedup can be of the
astonishing order of about 3000 when compared to a typical CPU. This number
significantly expands the range of problems solvable by use of stochastic
simulations, allowing even an interactive research in some cases. Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201
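The Monte Carlo integration described in this abstract can be illustrated with a minimal CPU-side sketch. It assumes an overdamped particle in a tilted periodic potential V(x) = sin(2πx) driven by Gaussian white noise and integrated with the Euler-Maruyama scheme; the potential, the parameter names and the function `simulate_final_positions` are illustrative assumptions, not the models or the code of the paper. Each trajectory is independent of the others, which is what makes this kind of problem a natural fit for one-trajectory-per-thread execution on a GPU.

```python
import math
import random


def simulate_final_positions(n_paths, n_steps, dt, force, noise_strength, seed):
    """Euler-Maruyama integration of dx = (-V'(x) + force) dt + sqrt(2 D) dW.

    Assumed potential (hypothetical): V(x) = sin(2*pi*x).
    Returns the final position of every independent trajectory.
    """
    rng = random.Random(seed)
    # Standard deviation of the Gaussian increment over one time step.
    sigma = math.sqrt(2.0 * noise_strength * dt)
    finals = []
    for _ in range(n_paths):          # on a GPU, one thread per trajectory
        x = 0.0
        for _ in range(n_steps):
            drift = -2.0 * math.pi * math.cos(2.0 * math.pi * x) + force
            x += drift * dt + sigma * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals


def mean_velocity(finals, n_steps, dt):
    """Average displacement per unit time over all trajectories."""
    return sum(finals) / (len(finals) * n_steps * dt)
```

A typical call would be `mean_velocity(simulate_final_positions(200, 2000, 1e-3, 0.5, 0.1, seed=1), 2000, 1e-3)`; the statistical error shrinks with the number of trajectories, which is why large ensembles benefit from parallel hardware.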
QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters
We describe a parallel hybrid symplectic integrator for planetary system
integration that runs on a graphics processing unit (GPU). The integrator
identifies close approaches between particles and switches from symplectic to
Hermite algorithms for particles that require higher resolution integrations.
The integrator is approximately as accurate as other hybrid symplectic
integrators but is GPU accelerated. Comment: 17 pages, 2 figure
Fat vs. thin threading approach on GPUs: application to stochastic simulation of chemical reactions
We explore two different threading approaches on a graphics processing unit (GPU) exploiting two different characteristics of the current GPU architecture. The fat thread approach tries to minimise data access time by relying on shared memory and registers potentially sacrificing parallelism. The thin thread approach maximises parallelism and tries to hide access latencies. We apply these two approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) by Gillespie (J. Phys. Chem, Vol. 81, p. 2340-2361, 1977). In these cases, the proposed thin thread approach shows comparable performance while eliminating the limitation of the reaction system's size.
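For readers unfamiliar with the SSA named in this abstract, the following is a minimal serial sketch of Gillespie's direct method. The function name `gillespie_direct`, the representation of a reaction as a `(propensity_function, state_change)` pair and the example reaction are assumptions made for illustration; they are not taken from the paper, and the sketch says nothing about how the fat or thin thread variants lay out their data.

```python
import math
import random


def gillespie_direct(state, reactions, t_end, rng):
    """Run one realisation of Gillespie's direct method up to time t_end.

    state     -- list of molecule counts per species
    reactions -- list of (propensity_fn, change) pairs (hypothetical layout):
                 propensity_fn(state) -> float, change -> per-species deltas
    rng       -- a random.Random instance
    """
    state = list(state)
    t = 0.0
    while True:
        props = [propensity(state) for propensity, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break                      # no reaction can fire any more
        # Exponentially distributed waiting time with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick reaction j with probability props[j] / total.
        threshold = rng.random() * total
        chosen = len(reactions) - 1    # fallback guards against rounding
        acc = 0.0
        for j, p in enumerate(props):
            acc += p
            if threshold < acc:
                chosen = j
                break
        for i, delta in enumerate(reactions[chosen][1]):
            state[i] += delta
    return state
```

As a usage example, an irreversible conversion A -> B with an assumed rate constant of 0.5 can be written as `gillespie_direct([50, 0], [(lambda s: 0.5 * s[0], (-1, 1))], t_end=10.0, rng=random.Random(0))`. Independent realisations share no state, so many of them can be run concurrently.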
Simulation of 1+1 dimensional surface growth and lattices gases using GPUs
Restricted solid on solid surface growth models can be mapped onto binary
lattice gases. We show that efficient simulation algorithms can be realized on
GPUs either by CUDA or by OpenCL programming. We consider a
deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1
dimensions related to the Asymmetric Simple Exclusion Process and show that for
sizes that fit into the shared memory of GPUs one can achieve the maximum
parallelization speedup (~100x for a Quadro FX 5800 graphics card with respect
to a single 2.67 GHz CPU). This permits us to study the effect of quenched
columnar disorder, requiring extremely long simulation times. We compare the
CUDA realization with an OpenCL implementation designed for processor clusters
via MPI. A two-lane traffic model with randomized turning points is also
realized and the dynamical behavior has been investigated. Comment: 20 pages 12 figures, 1 table, to appear in Comp. Phys. Com
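The mapping between a restricted solid on solid surface and a binary lattice gas mentioned in this abstract can be sketched as follows. The convention used here (a particle stands for a down step of the surface, a hole for an up step, so that a deposition at a local minimum becomes a particle hop to the right) is one possible choice and is an assumption, as are the function names. The two-sublattice sweep is a common way to expose parallelism in such models and is not necessarily the decomposition used by the authors.

```python
import random


def occupancy_to_heights(occ):
    """Rebuild surface heights from occupancies (1 = down step, 0 = up step)."""
    heights = [0]
    for n in occ[:-1]:
        heights.append(heights[-1] + (1 - 2 * n))
    return heights


def try_deposit(occ, i):
    """Attempt a deposition on the bond (i, i+1) of a periodic ring.

    A particle followed by a hole is a local minimum of the surface;
    swapping them raises that minimum by two height units.
    """
    j = (i + 1) % len(occ)
    if occ[i] == 1 and occ[j] == 0:
        occ[i], occ[j] = 0, 1
        return True
    return False


def sweep(occ, p, rng):
    """One update of all bonds, even bonds first and then odd bonds.

    Bonds of equal parity share no sites, so each half-sweep could be
    executed concurrently (assumed scheme, shown here serially).
    """
    n = len(occ)
    if n % 2 != 0:
        raise ValueError("the two-sublattice update needs an even ring size")
    for parity in (0, 1):
        for i in range(parity, n, 2):
            if rng.random() < p:
                try_deposit(occ, i)
```

Because every accepted move only swaps a particle with a neighbouring hole, the particle number is conserved, which mirrors the fact that the average slope of the surface is fixed by the boundary conditions.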
PyCOOL - a Cosmological Object-Oriented Lattice code written in Python
There are a number of different phenomena in the early universe that have to
be studied numerically with lattice simulations. This paper presents a graphics
processing unit (GPU) accelerated Python program called PyCOOL that solves the
evolution of scalar fields in a lattice with very precise symplectic
integrators. The program has been written with the intention to hit a sweet
spot of speed, accuracy and user friendliness. This has been achieved by using
the Python language with the PyCUDA interface to make a program that is easy to
adapt to different scalar field models. In this paper we derive the symplectic
dynamics that govern the evolution of the system and then present the
implementation of the program in Python and PyCUDA. The functionality of the
program is tested in a chaotic inflation preheating model, a single field
oscillon case and in a supersymmetric curvaton model which leads to Q-ball
production. We have also compared the performance of a consumer graphics card
to a professional Tesla compute card in these simulations. We find that the
program is not only accurate but also very fast. To further increase the
usefulness of the program we have equipped it with numerous post-processing
functions that provide useful information about the cosmological model. These
include various spectra and statistics of the fields. The program can be
additionally used to calculate the generated curvature perturbation. The
program is publicly available under GNU General Public License at
https://github.com/jtksai/PyCOOL . Some additional information can be found
from http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/ . Comment: 23 pages, 12 figures; some typos correcte