6,304 research outputs found
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units
We present the results of gravitational direct -body simulations using the
commercial graphics processing units (GPU) NVIDIA Quadro FX1400 and GeForce
8800GTX, and compare the results with GRAPE-6Af special purpose hardware. The
force evaluation of the -body problem was implemented in Cg using the GPU
directly to speed-up the calculations. The integration of the equations of
motions were, running on the host computer, implemented in C using the 4th
order predictor-corrector Hermite integrator with block time steps. We find
that for a large number of particles (N \apgt 10^4) modern graphics
processing units offer an attractive low cost alternative to GRAPE special
purpose hardware. A modern GPU continues to give a relatively flat scaling with
the number of particles, comparable to that of the GRAPE. Using the same time
step criterion the total energy of the -body system was conserved better
than to one in on the GPU, which is only about an order of magnitude
worse than obtained with GRAPE. For N\apgt 10^6 the GeForce 8800GTX was about
20 times faster than the host computer. Though still about an order of
magnitude slower than GRAPE, modern GPU's outperform GRAPE in their low cost,
long mean time between failure and the much larger onboard memory; the
GRAPE-6Af holds at most 256k particles whereas the GeForce 8800GTF can hold 9
million particles in memory.Comment: Submitted to New Astronom
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA
We present the results of gravitational direct -body simulations using the
Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed
for gaming computers. The force evaluation of the -body problem is
implemented in ``Compute Unified Device Architecture'' (CUDA) using the GPU to
speed-up the calculations. We tested the implementation on three different
-body codes: two direct -body integration codes, using the 4th order
predictor-corrector Hermite integrator with block time-steps, and one
Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The
integration of the equations of motions for all codes is performed on the host
CPU.
We find that for particles the GPU outperforms the GRAPE-6Af, if
some softening in the force calculation is accepted. Without softening and for
very small integration time steps the GRAPE still outperforms the GPU. We
conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special
purpose hardware. Using the same time-step criterion, the total energy of the
-body system was conserved better than to one in on the GPU, only
about an order of magnitude worse than obtained with GRAPE-6Af. For N \apgt
10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at
about the same speed as the GRAPE-6Af.Comment: Accepted for publication in New Astronom
A pilgrimage to gravity on GPUs
In this short review we present the developments over the last 5 decades that
have led to the use of Graphics Processing Units (GPUs) for astrophysical
simulations. Since the introduction of NVIDIA's Compute Unified Device
Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body
simulations and is so popular these days that almost all papers about high
precision N-body simulations use methods that are accelerated by GPUs. With the
GPU hardware becoming more advanced and being used for more advanced algorithms
like gravitational tree-codes we see a bright future for GPU like hardware in
computational astrophysics.Comment: To appear in: European Physical Journal "Special Topics" : "Computer
Simulations on Graphics Processing Units" . 18 pages, 8 figure
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though internally the GPU
hardware is limited to single precision arithmetics. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems from
running in a production environment for such an extended period of time.Comment: 13 pages, 9 figures, accepted to New Astronom
Sapporo2: A versatile direct -body library
Astrophysical direct -body methods have been one of the first production
algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost
seven years later, the GPU is the most used accelerator device in astronomy for
simulating stellar systems. In this paper we present the implementation of the
Sapporo2 -body library, which allows researchers to use the GPU for -body
simulations with little to no effort. The first version, released five years
ago, is actively used, but lacks advanced features and versatility in numerical
precision and support for higher order integrators. In this updated version we
have rebuilt the code from scratch and added support for OpenCL,
multi-precision and higher order integrators. We show how to tune these codes
for different GPU architectures and present how to continue utilizing the GPU
optimal even when only a small number of particles () is integrated.
This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the
added options and double precision data loads. The code runs on a range of
NVIDIA and AMD GPUs in single and double precision accuracy. With the addition
of OpenCL support the library is also able to run on CPUs and other
accelerators that support OpenCL.Comment: 15 pages, 7 figures. Accepted for publication in Computational
Astrophysics and Cosmolog
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which completely runs on the
GPU.(The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traverse algorithms are portable to many-core devices which have support for
CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during the tree-construction and shows a performance improvement
of more than a factor 20 overall, resulting in a processing rate of more than
2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single colum
Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects
We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs
- …