Simulation of 1+1 dimensional surface growth and lattice gases using GPUs
Restricted solid on solid surface growth models can be mapped onto binary
lattice gases. We show that efficient simulation algorithms can be realized on
GPUs either by CUDA or by OpenCL programming. We consider a
deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1
dimensions, related to the Asymmetric Simple Exclusion Process, and show that,
for sizes that fit into the shared memory of GPUs, one can achieve the maximum
parallelization speedup of roughly 100x on a Quadro FX 5800 graphics card with
respect to a single 2.67 GHz CPU. This permits us to study the effect of quenched
columnar disorder, requiring extremely long simulation times. We compare the
CUDA realization with an OpenCL implementation designed for processor clusters
via MPI. A two-lane traffic model with randomized turning points is also
realized, and its dynamical behavior is investigated.
Comment: 20 pages, 12 figures, 1 table; to appear in Comp. Phys. Comm.
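
The mapping described in the abstract (surface slopes stored as a binary lattice gas, with deposition/evaporation acting as ASEP-like particle exchanges) lends itself to a simple parallel update. The sketch below is a minimal, hypothetical CUDA kernel illustrating that scheme with an even/odd bond decomposition so that concurrent threads never touch the same site; the kernel and helper names (rsos_sweep, rng_init) are invented for illustration, and the shared-memory layout, periodic boundaries and columnar-disorder handling of the actual paper are omitted.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <curand_kernel.h>

// Hedged sketch (not the authors' code): one parallel update step of a
// 1+1-dimensional RSOS surface stored as binary slopes s[i] in {0,1}
// (1 = up step = "particle", 0 = down step = "hole"), i.e. an ASEP-like
// lattice gas.  Deposition turns a local minimum (0,1) into (1,0); evaporation
// does the reverse.  Updating only bonds (2*tid + parity, 2*tid + parity + 1)
// guarantees that no two threads write the same site.  The last bond is left
// open for simplicity (no periodic wrap).
__global__ void rsos_sweep(unsigned char* s, int L, int parity, float p, float q,
                           curandStatePhilox4_32_10_t* rng)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int i = 2 * tid + parity;                 // left site of the bond
    if (i + 1 >= L) return;

    curandStatePhilox4_32_10_t st = rng[tid];
    float r = curand_uniform(&st);
    unsigned char a = s[i], b = s[i + 1];

    if (a == 0 && b == 1 && r < p) {          // local minimum -> deposit
        s[i] = 1; s[i + 1] = 0;
    } else if (a == 1 && b == 0 && r < q) {   // local maximum -> evaporate
        s[i] = 0; s[i + 1] = 1;
    }
    rng[tid] = st;
}

__global__ void rng_init(curandStatePhilox4_32_10_t* rng, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, tid, 0, &rng[tid]);
}

int main()
{
    const int L = 1 << 16;                    // system size
    unsigned char* s;  cudaMallocManaged(&s, L);
    for (int i = 0; i < L; ++i) s[i] = i & 1; // flat initial surface

    curandStatePhilox4_32_10_t* rng;
    cudaMalloc(&rng, (L / 2) * sizeof(*rng));
    rng_init<<<L / 2 / 256, 256>>>(rng, 1234ULL);

    for (int t = 0; t < 1000; ++t)            // alternate bond parities
        rsos_sweep<<<L / 2 / 256, 256>>>(s, L, t & 1, 0.98f, 0.02f, rng);
    cudaDeviceSynchronize();

    // Reconstruct heights from slopes on the host and report the interface width.
    std::vector<double> h(L + 1, 0.0);
    for (int i = 0; i < L; ++i) h[i + 1] = h[i] + (2 * s[i] - 1);
    double mean = 0.0, var = 0.0;
    for (double v : h) mean += v;
    mean /= h.size();
    for (double v : h) var += (v - mean) * (v - mean);
    printf("interface width W = %g\n", sqrt(var / h.size()));
    return 0;
}
```
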
Sapporo2: A versatile direct N-body library
Astrophysical direct N-body methods have been one of the first production
algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost
seven years later, the GPU is the most used accelerator device in astronomy for
simulating stellar systems. In this paper we present the implementation of the
Sapporo2 N-body library, which allows researchers to use the GPU for N-body
simulations with little to no effort. The first version, released five years
ago, is actively used, but lacks advanced features and versatility in numerical
precision and support for higher order integrators. In this updated version we
have rebuilt the code from scratch and added support for OpenCL,
multi-precision and higher-order integrators. We show how to tune these codes
for different GPU architectures and how to keep utilizing the GPU optimally
even when only a small number of particles is integrated.
This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the
added options and double precision data loads. The code runs on a range of
NVIDIA and AMD GPUs in single and double precision accuracy. With the addition
of OpenCL support the library is also able to run on CPUs and other
accelerators that support OpenCL.
Comment: 15 pages, 7 figures. Accepted for publication in Computational
Astrophysics and Cosmology.
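
For readers unfamiliar with direct-summation codes, the following is a minimal sketch of the classic shared-memory tiled N-body force kernel (in the style popularized by GPU Gems 3). It is not Sapporo2's implementation: the library adds predictor data for higher-order integrators, multi-precision accumulation and device-specific tuning on top of this basic pattern, and the names used here (nbody_forces, eps2) are illustrative only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hedged sketch of a tiled direct-summation force kernel: each thread owns one
// particle i; the block cooperatively stages tiles of source particles in
// shared memory and every thread accumulates the softened 1/r^2 forces.
__global__ void nbody_forces(const float4* pos, float3* acc, int n, float eps2)
{
    extern __shared__ float4 tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 pi = (i < n) ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
    float3 ai = make_float3(0.f, 0.f, 0.f);

    for (int base = 0; base < n; base += blockDim.x) {
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();
        for (int k = 0; k < blockDim.x && base + k < n; ++k) {
            float4 pj = tile[k];
            float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
            float r2 = dx * dx + dy * dy + dz * dz + eps2;  // Plummer softening
            float inv_r = rsqrtf(r2);
            float w = pj.w * inv_r * inv_r * inv_r;         // m_j / r^3
            ai.x += w * dx; ai.y += w * dy; ai.z += w * dz;
        }
        __syncthreads();
    }
    if (i < n) acc[i] = ai;
}

int main()
{
    const int n = 4096, block = 256;
    std::vector<float4> h(n);
    for (int i = 0; i < n; ++i)
        h[i] = make_float4((float)(rand() % 100), (float)(rand() % 100),
                           (float)(rand() % 100), 1.0f);      // x, y, z, mass

    float4* d_pos; float3* d_acc;
    cudaMalloc(&d_pos, n * sizeof(float4));
    cudaMalloc(&d_acc, n * sizeof(float3));
    cudaMemcpy(d_pos, h.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    nbody_forces<<<(n + block - 1) / block, block, block * sizeof(float4)>>>(
        d_pos, d_acc, n, 1e-4f);
    cudaDeviceSynchronize();
    printf("forces computed for %d particles\n", n);
    return 0;
}
```
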
Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model
We show that efficient simulations of the Kardar-Parisi-Zhang interface
growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of
thermally activated diffusion can be realized both on GPUs and modern CPUs. In
this article we present results of different implementations on GPUs using CUDA
and OpenCL and also on CPUs using OpenCL and MPI. We investigate the runtime
and scaling behavior on different architectures to find optimal solutions for
solving current simulation problems in the field of statistical physics and
materials science.
Comment: 14 pages, 8 figures; to be published in a forthcoming EPJST special
issue on "Computer simulations on GPU".
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing
resources, which can be accessed conveniently through the CUDA programming
interface. We describe how GPUs can be used to considerably speed up molecular
dynamics (MD) simulations for system sizes ranging up to about 1 million
particles. Particular emphasis is put on the numerical long-time stability in
terms of energy and momentum conservation, and caveats on limited
floating-point precision are issued. Strict energy conservation over 10^8 MD
steps is obtained by double-single emulation of the floating-point arithmetic
in accuracy-critical parts of the algorithm. For the slow dynamics of a
supercooled binary Lennard-Jones mixture, we demonstrate that the use of
single floating-point precision may result in quantitatively and even
physically wrong results. For simulations of a Lennard-Jones fluid, the
described implementation shows speedup factors of up to 80 compared to a serial
implementation for the CPU, and a single GPU was found to be comparable to a
parallelised MD simulation using 64 distributed cores.
Comment: 12 pages, 7 figures; to appear in Comp. Phys. Comm. The HALMD package is
licensed under the GPL, see http://research.colberg.org/projects/halm
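
The "double-single emulation" mentioned in the abstract refers to carrying a value as an unevaluated sum of two single-precision floats, which recovers much of the accuracy of a double without native double-precision arithmetic. The sketch below shows the standard error-free transformations (Knuth two-sum followed by renormalization) in a hypothetical accumulation kernel; the struct and function names (dsfloat, ds_add, accumulate) are invented here, and HALMD's actual kernels apply the same idea only to the accuracy-critical parts of the integrator. Compile without --use_fast_math so the compensation terms are not optimized away.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A value represented as the unevaluated sum hi + lo of two floats.
struct dsfloat { float hi, lo; };

// Double-single addition: Knuth two-sum of the high words, fold in the low
// words, then renormalize with a quick two-sum.
__device__ __forceinline__ dsfloat ds_add(dsfloat a, dsfloat b)
{
    float s  = a.hi + b.hi;
    float v  = s - a.hi;
    float e  = (a.hi - (s - v)) + (b.hi - v);   // exact rounding error of s
    e += a.lo + b.lo;
    float hi = s + e;
    return { hi, e - (hi - s) };
}

__global__ void accumulate(const float* x, int n, dsfloat* out)
{
    // Serial accumulation by a single thread, purely to contrast the two schemes.
    dsfloat acc = { 0.f, 0.f };
    float naive = 0.f;
    for (int i = 0; i < n; ++i) {
        acc = ds_add(acc, { x[i], 0.f });
        naive += x[i];
    }
    *out = acc;
    printf("naive float sum: %.8f   double-single sum: %.8f\n",
           naive, (double)acc.hi + (double)acc.lo);
}

int main()
{
    const int n = 1 << 20;
    float* x; cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1e-4f;   // many small increments: plain float drifts visibly
    dsfloat* out; cudaMallocManaged(&out, sizeof(dsfloat));
    accumulate<<<1, 1>>>(x, n, out);
    cudaDeviceSynchronize();
    return 0;
}
```
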
High Performance Algorithms for Counting Collisions and Pairwise Interactions
The problem of counting collisions or interactions is common in areas such as
computer graphics and scientific simulations. Since it is a major bottleneck in
applications in these areas, a great deal of research has been carried out on this
subject, mainly focused on techniques that allow the calculations to be performed
within pruned sets of objects. This paper focuses on how interaction
calculations (such as collisions) within these sets can be done more efficiently
than with existing approaches. Two algorithms are proposed: a sequential algorithm
that has linear complexity at the cost of high memory usage; and a parallel
algorithm, mathematically proved to be correct, that manages to use GPU
resources more efficiently than existing approaches. The proposed and existing
algorithms were implemented, and experiments show a speedup of 21.7 for the
sequential algorithm (on small problem sizes) and of 1.12 for the parallel
proposal (on large problem sizes). By improving interaction calculation, this work
contributes to research areas that promote interconnection in the modern world,
such as computer graphics and robotics.
Comment: Accepted at ICCS 2019 and published in Springer's LNCS series.
Supplementary content at https://mjsaldanha.com/articles/1-hpc-ssp
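
As a point of reference for the counting problem, the sketch below shows a hypothetical brute-force GPU kernel that counts overlapping sphere pairs with a per-thread local counter and a single atomicAdd per thread. It is a baseline illustration only, not either of the algorithms proposed in the paper, which operate within pruned candidate sets; the names (count_collisions, obj) are invented for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Baseline sketch: each thread owns one object and tests it against all later
// objects, accumulating overlaps into a global counter.  Pruning schemes
// (uniform grids, sweep-and-prune) restrict the j-loop to candidate sets;
// the counting pattern itself stays the same.
__global__ void count_collisions(const float4* obj, int n, unsigned int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 a = obj[i];                       // x, y, z, radius
    unsigned int local = 0;
    for (int j = i + 1; j < n; ++j) {
        float4 b = obj[j];
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        float r = a.w + b.w;
        if (dx * dx + dy * dy + dz * dz < r * r) ++local;
    }
    if (local) atomicAdd(count, local);      // one atomic per thread, not per pair
}

int main()
{
    const int n = 1 << 14;
    float4* obj; cudaMallocManaged(&obj, n * sizeof(float4));
    for (int i = 0; i < n; ++i)
        obj[i] = make_float4((float)(rand() % 1000), (float)(rand() % 1000),
                             (float)(rand() % 1000), 1.0f);

    unsigned int* count; cudaMallocManaged(&count, sizeof(unsigned int));
    *count = 0;
    count_collisions<<<(n + 255) / 256, 256>>>(obj, n, count);
    cudaDeviceSynchronize();
    printf("colliding pairs: %u\n", *count);
    return 0;
}
```
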
Toward large-scale Hybrid Monte Carlo simulations of the Hubbard model on graphics processing units
The performance of the Hybrid Monte Carlo algorithm is determined by the
speed of sparse matrix-vector multiplication within the context of
preconditioned conjugate gradient iteration. We study these operations as
implemented for the fermion matrix of the Hubbard model in d+1 space-time
dimensions, and report a performance comparison between a 2.66 GHz Intel Xeon
E5430 CPU and an NVIDIA Tesla C1060 GPU using double-precision arithmetic. We
find speedup factors ranging from 30 to 350 for d = 1, and in excess of 40 for
d = 3. We argue that such speedups are of considerable impact for large-scale
simulational studies of quantum many-body systems.
Comment: 8 pages, 5 figures
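
The performance-critical operation named in the abstract is the sparse matrix-vector product inside the preconditioned conjugate gradient iteration. The following is a minimal double-precision CSR SpMV kernel (one thread per row), given purely as a hypothetical illustration of that operation; the paper's fermion-matrix kernels, and production solvers in general, would typically use a warp-per-row kernel or cuSPARSE instead of this scalar variant.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hedged sketch: y = A*x for a matrix A in CSR format, one thread per row.
// A CG solver calls a kernel like this once per iteration.
__global__ void spmv_csr(int nrows, const int* row_ptr, const int* col_idx,
                         const double* val, const double* x, double* y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;
    double sum = 0.0;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
        sum += val[k] * x[col_idx[k]];
    y[row] = sum;
}

int main()
{
    // Tiny 3x3 example matrix [[4,1,0],[1,4,1],[0,1,4]] in CSR form.
    int    h_row[] = {0, 2, 5, 7};
    int    h_col[] = {0, 1, 0, 1, 2, 1, 2};
    double h_val[] = {4, 1, 1, 4, 1, 1, 4};
    double h_x[]   = {1, 1, 1};

    int *row, *col; double *val, *x, *y;
    cudaMallocManaged(&row, sizeof(h_row)); cudaMallocManaged(&col, sizeof(h_col));
    cudaMallocManaged(&val, sizeof(h_val)); cudaMallocManaged(&x, sizeof(h_x));
    cudaMallocManaged(&y, 3 * sizeof(double));
    memcpy(row, h_row, sizeof(h_row)); memcpy(col, h_col, sizeof(h_col));
    memcpy(val, h_val, sizeof(h_val)); memcpy(x, h_x, sizeof(h_x));

    spmv_csr<<<1, 32>>>(3, row, col, val, x, y);
    cudaDeviceSynchronize();
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);   // expect 5 6 5
    return 0;
}
```
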