A pilgrimage to gravity on GPUs
In this short review we present the developments over the last 5 decades that
have led to the use of Graphics Processing Units (GPUs) for astrophysical
simulations. Since the introduction of NVIDIA's Compute Unified Device
Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body
simulations and is so popular these days that almost all papers about high
precision N-body simulations use methods that are accelerated by GPUs. With GPU
hardware becoming more advanced and being used for more sophisticated algorithms
such as gravitational tree-codes, we see a bright future for GPU-like hardware in
computational astrophysics.
Comment: To appear in: European Physical Journal Special Topics: "Computer
Simulations on Graphics Processing Units". 18 pages, 8 figures
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though the GPU hardware is
internally limited to single-precision arithmetic. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems running
in a production environment for such an extended period of time.
Comment: 13 pages, 9 figures, accepted to New Astronomy
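The double-precision emulation mentioned above is commonly realised with "double-single" (two-float) arithmetic, in which one value is carried as an unevaluated sum of two single-precision words. A minimal CPU-side sketch of the idea using NumPy float32 scalars (illustrative only; the function names are mine, and this is not Sapporo's actual kernel code):

```python
import numpy as np

def ds_split(x):
    """Represent a float64 as an unevaluated sum hi + lo of two float32s."""
    hi = np.float32(x)
    lo = np.float32(x - np.float64(hi))
    return hi, lo

def two_sum(a, b):
    """Knuth's exact two-sum: s + err == a + b, using only float32 ops."""
    s = np.float32(a + b)
    bb = np.float32(s - a)
    err = np.float32((a - (s - bb)) + (b - bb))
    return s, err

def ds_sub(x, y):
    """Double-single subtraction: (hi, lo) pairs in, (hi, lo) pair out."""
    s, e = two_sum(x[0], np.float32(-y[0]))
    e = np.float32(e + x[1] - y[1])
    return two_sum(s, e)  # renormalise so |lo| stays small

# A particle separation that plain float32 resolves poorly:
a, b = 1000.0001, 1000.0
exact = a - b
naive = float(np.float32(a) - np.float32(b))
hi, lo = ds_sub(ds_split(a), ds_split(b))
ds = float(np.float64(hi) + np.float64(lo))
```

The naive float32 subtraction of two nearly equal coordinates loses most of the significant digits of the separation, while the two-float version recovers it to near double precision, which is why the emulation is applied exactly where the abstract says: the inter-particle distance.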
Direct N-body code on low-power embedded ARM GPUs
This work arises in the context of the ExaNeSt project, which aims at the design
and development of an exascale-ready supercomputer with a low energy-consumption
profile that is nevertheless able to support the most demanding scientific and
technical applications. The ExaNeSt compute unit consists of densely packed
low-power 64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards
are heterogeneous architectures in which computing power is supplied by both
CPUs and GPUs, and they are emerging as a possible low-power and low-cost
alternative to clusters based on traditional CPUs. A state-of-the-art direct
N-body code suitable for astrophysical simulations has been re-engineered to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that they are also promising because of
their energy efficiency, a crucial design consideration for future exascale
platforms.
Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceedings
FROST: a momentum-conserving CUDA implementation of a hierarchical fourth-order forward symplectic integrator
We present a novel hierarchical formulation of the fourth-order forward
symplectic integrator and its numerical implementation in the GPU-accelerated
direct-summation N-body code FROST. The new integrator is especially suitable
for simulations with a large dynamical range due to its hierarchical nature.
The strictly positive integrator sub-steps in a fourth-order symplectic
integrator are made possible by computing an additional gradient term in
addition to the Newtonian accelerations. All force calculations and kick
operations are synchronous, so the integration algorithm is manifestly
momentum-conserving. We also employ a time-step symmetrisation procedure to
approximately restore the time-reversibility with adaptive individual
time-steps. We demonstrate in a series of binary, few-body and million-body
simulations that FROST conserves energy to a level of while errors in linear and angular momentum are practically
negligible. For typical star cluster simulations, we find that FROST scales
well up to GPUs, making direct
summation N-body simulations beyond particles possible on systems with
several hundred and more GPUs. Due to the nature of hierarchical integration
the inclusion of a Kepler solver or a regularised integrator with
post-Newtonian corrections for close encounters and binaries in the code is
straightforward.
Comment: 18 pages, 7 figures. Accepted for publication in MNRAS
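For readers unfamiliar with forward symplectic integrators: the family named here goes back to Chin's fourth-order schemes, in which all sub-steps are positive and the mid-step kick uses a gradient-corrected force. A single-particle, fixed-step sketch for a Kepler potential, where the gradient term has a closed form, might look as follows (an illustration of the classic Chin 4A composition, not FROST's hierarchical, momentum-conserving block-step implementation):

```python
import numpy as np

GM = 1.0  # gravitational parameter of the central point mass

def acc(x):
    """Newtonian acceleration for a test particle around a point mass."""
    r2 = x @ x
    return -GM * x / r2**1.5

def acc_grad(x, h):
    """Gradient-corrected acceleration a + (h^2/12) GM^2 x / r^6, obtained
    from the modified potential V + (h^2/48) |grad V|^2 for the Kepler case."""
    r2 = x @ x
    return -GM * x / r2**1.5 + (h * h / 12.0) * GM * GM * x / r2**3

def chin4a_step(x, v, h):
    """One fourth-order forward symplectic step: all sub-steps positive."""
    v = v + (h / 6.0) * acc(x)
    x = x + (h / 2.0) * v
    v = v + (2.0 * h / 3.0) * acc_grad(x, h)  # gradient kick
    x = x + (h / 2.0) * v
    v = v + (h / 6.0) * acc(x)
    return x, v

def energy(x, v):
    return 0.5 * (v @ v) - GM / np.sqrt(x @ x)
```

On a circular orbit integrated for one period the energy error stays at the fourth-order level, far below what a plain second-order leapfrog with the same step would give; the extra gradient evaluation is the price of keeping every sub-step strictly positive.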
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA
We present the results of gravitational direct N-body simulations using the
Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed
for gaming computers. The force evaluation of the N-body problem is
implemented in "Compute Unified Device Architecture" (CUDA) using the GPU to
speed-up the calculations. We tested the implementation on three different
N-body codes: two direct N-body integration codes, using the 4th-order
predictor-corrector Hermite integrator with block time-steps, and one
Barnes-Hut treecode, which uses a 2nd-order leapfrog integration scheme. The
integration of the equations of motion for all codes is performed on the host
CPU.
We find that for particles the GPU outperforms the GRAPE-6Af, if
some softening in the force calculation is accepted. Without softening and for
very small integration time steps the GRAPE still outperforms the GPU. We
conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special
purpose hardware. Using the same time-step criterion, the total energy of the
N-body system was conserved better than one in on the GPU, only
about an order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5
the 8800GTX outperforms the host CPU by a factor of about 100 and runs at
about the same speed as the GRAPE-6Af.
Comment: Accepted for publication in New Astronomy
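The 4th-order predictor-corrector Hermite integrator used by the direct codes above can be sketched compactly for a test particle around a point mass (a fixed shared-time-step toy version, not the block time-step production scheme; helper names are mine):

```python
import numpy as np

GM = 1.0  # gravitational parameter of the central point mass

def acc_jerk(x, v):
    """Acceleration and its time derivative (jerk) for a Kepler field."""
    r2 = x @ x
    r3 = r2**1.5
    a = -GM * x / r3
    j = -GM * (v / r3 - 3.0 * (x @ v) * x / (r2 * r3))
    return a, j

def hermite_step(x, v, h):
    """One fourth-order Hermite predictor-corrector step."""
    a0, j0 = acc_jerk(x, v)
    # Predictor: third-order Taylor expansion of position and velocity.
    xp = x + h * v + 0.5 * h * h * a0 + h**3 * j0 / 6.0
    vp = v + h * a0 + 0.5 * h * h * j0
    # Evaluate force and jerk at the predicted point, then correct.
    a1, j1 = acc_jerk(xp, vp)
    vc = v + 0.5 * h * (a0 + a1) + h * h * (j0 - j1) / 12.0
    xc = x + 0.5 * h * (v + vc) + h * h * (a0 - a1) / 12.0
    return xc, vc

def energy(x, v):
    return 0.5 * (v @ v) - GM / np.sqrt(x @ x)
```

The expensive part on real systems is the pairwise sum inside the force evaluation; that is exactly the piece moved onto the GPU (or GRAPE), while the predict-and-correct bookkeeping shown here stays on the host, as the abstract describes.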
StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations
We present the multi-GPU realization of the StePS (Stereographically
Projected Cosmological Simulations) algorithm with MPI-OpenMP-CUDA hybrid
parallelization and nearly ideal scale-out to multiple compute nodes. Our new
zoom-in cosmological direct N-body simulation method simulates the infinite
universe with unprecedented dynamic range for a given amount of memory and, in
contrast to traditional periodic simulations, its fundamental geometry and
topology match observations. By using a spherical geometry instead of periodic
boundary conditions, and gradually decreasing the mass resolution with radius,
our code is capable of running simulations with a few gigaparsecs in diameter
and with a mass resolution of in the center in four days
on three compute nodes with four GTX 1080Ti GPUs in each. The code can also be
used to run extremely fast simulations with reasonable resolution for fitting
cosmological parameters. These simulations are useful for the prediction needs
of large surveys. The StePS code is publicly available to the research community.
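The stereographic projection in the code's name maps flat comoving space onto a compact sphere, which is what compactifies the geometry, with volume elements shrinking towards the antipodal point so that resolution naturally decreases with radius. As a purely geometric illustration (my own minimal version, not code taken from StePS), projecting R^3 onto the unit 3-sphere in R^4 and back:

```python
import numpy as np

def project(x):
    """Stereographic projection of a point x in R^3 onto the unit 3-sphere
    in R^4, projecting from the pole (0, 0, 0, 1)."""
    r2 = x @ x
    return np.append(2.0 * x, r2 - 1.0) / (r2 + 1.0)

def unproject(s):
    """Inverse projection back to R^3 (undefined at the pole itself)."""
    return s[:3] / (1.0 - s[3])
```

Every finite point of R^3 lands on the sphere (the pole corresponds to spatial infinity), and the map is exactly invertible, so an infinite simulation volume can be represented in a finite, boundary-free domain instead of a periodic box.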
Three-dimensional shapelets and an automated classification scheme for dark matter haloes
We extend the two-dimensional Cartesian shapelet formalism to d-dimensions.
Concentrating on the three-dimensional case, we derive shapelet-based equations
for the mass, centroid, root-mean-square radius, and components of the
quadrupole moment and moment of inertia tensors. Using cosmological N-body
simulations as an application domain, we show that three-dimensional shapelets
can be used to replicate the complex sub-structure of dark matter halos and
demonstrate the basis of an automated classification scheme for halo shapes. We
investigate the shapelet decomposition process from an algorithmic viewpoint,
and consider opportunities for accelerating the computation of shapelet-based
representations using graphics processing units (GPUs).
Comment: 19 pages, 11 figures, accepted for publication in MNRAS
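Cartesian shapelets are Gauss-weighted Hermite polynomials, and the d-dimensional basis is a separable product of 1D functions, which is what makes the extension to three dimensions natural. A small sketch of the basis (my own illustrative code with the standard orthonormal normalisation and a scale parameter beta; not the authors' implementation):

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial.hermite import hermval  # physicists' Hermite H_n

def shapelet1d(n, x, beta=1.0):
    """1D Cartesian shapelet phi_n(x; beta): H_n times a Gaussian,
    normalised so that the basis is orthonormal over the real line."""
    u = np.asarray(x) / beta
    norm = (2.0**n * factorial(n) * np.sqrt(pi) * beta) ** -0.5
    coef = np.zeros(n + 1)
    coef[n] = 1.0  # select the single polynomial H_n
    return norm * hermval(u, coef) * np.exp(-0.5 * u * u)

def shapelet3d(nvec, x, beta=1.0):
    """Separable 3D shapelet: product of 1D basis functions per axis."""
    return (shapelet1d(nvec[0], x[..., 0], beta)
            * shapelet1d(nvec[1], x[..., 1], beta)
            * shapelet1d(nvec[2], x[..., 2], beta))
```

Because the basis is orthonormal, decomposing a halo density field reduces to inner products of the field with each basis function, and quantities such as the mass and quadrupole moments become linear combinations of the resulting coefficients.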
Accelerating NBODY6 with Graphics Processing Units
We describe the use of Graphics Processing Units (GPUs) for speeding up the
code NBODY6, which is widely used for direct N-body simulations. Over the
years, the nature of the direct force calculation has proved a barrier
for extending the particle number. Following an early introduction of force
polynomials and individual time-steps, the calculation cost was first reduced
by the introduction of a neighbour scheme. After a decade of GRAPE computers
which speeded up the force calculation further, we are now in the era of GPUs
where relatively small hardware systems are highly cost-effective. A
significant gain in efficiency is achieved by employing the GPU to obtain the
so-called regular force which typically involves some 99 percent of the
particles, while the remaining local forces are evaluated on the host. However,
the latter operation is performed up to 20 times more frequently and may still
account for a significant cost. This effort is reduced by parallel SSE/AVX
procedures where each interaction term is calculated using mainly single
precision. We also discuss further strategies connected with coordinate and
velocity prediction required by the integration scheme. This leaves hard
binaries and multiple close encounters which are treated by several
regularization methods. The present nbody6-GPU code is well balanced for
simulations in the particle range for a dual GPU system
attached to a standard PC.
Comment: 8 pages, 3 figures, 2 tables, MNRAS accepted
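The regular/irregular force split described above (the Ahmad-Cohen neighbour scheme) can be illustrated in a few lines: the frequent, cheap sum over neighbours inside a radius and the infrequent, expensive sum over everything else add up to the full direct force. Function and parameter names below are mine, not NBODY6's:

```python
import numpy as np

def split_accel(pos, mass, i, r_nb):
    """Acceleration on particle i, split into an 'irregular' part from
    neighbours inside radius r_nb and a 'regular' part from the rest
    (an illustrative neighbour-scheme split, not NBODY6's implementation)."""
    d = pos - pos[i]
    r = np.linalg.norm(d, axis=1)
    others = np.arange(len(pos)) != i   # exclude self-interaction
    near = others & (r < r_nb)
    far = others & (r >= r_nb)

    def accel(mask):
        if not mask.any():
            return np.zeros(3)
        return np.sum(mass[mask, None] * d[mask] / r[mask, None] ** 3, axis=0)

    # In NBODY6 terms: the 'irregular' neighbour force is recomputed often,
    # the smooth 'regular' force (the bulk of the work) much less often.
    return accel(near), accel(far)
```

In the production code the large regular sum is what gets shipped to the GPU, while the small neighbour sum is evaluated on the host with SSE/AVX, matching the division of labour the abstract describes.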
A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection
Recently, alternative approaches in cosmology have sought to explain the nature
of dark matter as a direct result of non-linear spacetime curvature due to
different types of deformation potentials. In this context, a key test for this
hypothesis is to examine the effects of deformation on the evolution of
large-scale structures. An important requirement for the fine analysis of this
pure gravitational signature (without dark matter elements) is to characterize
the position of a galaxy along its trajectory towards the gravitational collapse
of superclusters at low redshifts. In this context, each element in a
gravitational N-body simulation behaves as a tracer of collapse governed by the
process known as chaotic advection (or Lagrangian turbulence). In order to
develop a detailed study of this new approach, we developed the COsmic
LAgrangian TUrbulence Simulator (COLATUS) to perform gravitational N-body
simulations based on the Compute Unified Device Architecture (CUDA) for graphics
processing units (GPUs). In this paper we report the first robust results
obtained with COLATUS.
Comment: Proceedings of the Sixth International School on Field Theory and
Gravitation-2012, by the American Institute of Physics