
    A pilgrimage to gravity on GPUs

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so popular that almost all papers about high-precision N-body simulations use methods accelerated by GPUs. With GPU hardware becoming more advanced and being used for more sophisticated algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics. Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures.

    SAPPORO: A way to turn your graphics cards into a GRAPE-6

    We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphics Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running 10^6 particles on a PC with four commercial G92-architecture GPUs (two GeForce 9800GX2 cards). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPU did not show any problems from running in a production environment for such an extended period of time. Comment: 13 pages, 9 figures, accepted to New Astronomy.
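The double-precision emulation mentioned in the abstract is commonly done with a "double-single" representation: each coordinate is stored as a pair of single-precision numbers whose sum carries roughly double-precision information. The sketch below illustrates the idea in Python with NumPy float32 arithmetic; it is an illustration of the general technique, not Sapporo's actual CUDA code.

```python
import numpy as np

f32 = np.float32

def ds_split(x):
    """Split a float64 into a double-single pair (hi, lo) of float32."""
    hi = f32(x)
    lo = f32(x - np.float64(hi))
    return hi, lo

def ds_sub(a, b):
    """Double-single subtraction (a_hi, a_lo) - (b_hi, b_lo), float32 ops only."""
    ahi, alo = a
    bhi, blo = b
    s = f32(ahi - bhi)                     # leading difference
    bb = f32(s - ahi)                      # two-sum error extraction
    err = f32(f32(ahi - f32(s - bb)) + f32(-bhi - bb))
    err = f32(err + f32(alo - blo))        # fold in the low words
    hi = f32(s + err)                      # renormalise the pair
    lo = f32(err - f32(hi - s))
    return hi, lo

# Distance-like difference between two nearly equal coordinates
x1, x2 = 1000.0001, 1000.0
true = np.float64(x1) - np.float64(x2)
naive = f32(x1) - f32(x2)                       # plain single precision
hi, lo = ds_sub(ds_split(x1), ds_split(x2))     # emulated double precision
emulated = np.float64(hi) + np.float64(lo)
```

In this example plain float32 loses most significant digits of the separation through cancellation, while the double-single pair recovers it to near float32-residual accuracy, which is why the technique matters for close particle pairs.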

    Direct N-body code on low-power embedded ARM GPUs

    This work arises in the context of the ExaNeSt project, which aims at the design and development of an exascale-ready supercomputer with a low energy-consumption profile that is nevertheless able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely packed low-power 64-bit ARM processors embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architectures where computing power is supplied by both CPUs and GPUs, and they are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct N-body code suitable for astrophysical simulations has been re-engineered to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be effectively used to accelerate real-life scientific calculations, and that they are also promising because of their energy efficiency, a crucial design constraint for future exascale platforms. Comment: 16 pages, 7 figures, 1 table, accepted for publication in the Computing Conference 2019 proceedings.

    FROST: a momentum-conserving CUDA implementation of a hierarchical fourth-order forward symplectic integrator

    We present a novel hierarchical formulation of the fourth-order forward symplectic integrator and its numerical implementation in the GPU-accelerated direct-summation N-body code FROST. The new integrator is especially suitable for simulations with a large dynamical range due to its hierarchical nature. The strictly positive integrator sub-steps in a fourth-order symplectic integrator are made possible by computing an additional gradient term in addition to the Newtonian accelerations. All force calculations and kick operations are synchronous, so the integration algorithm is manifestly momentum-conserving. We also employ a time-step symmetrisation procedure to approximately restore time-reversibility with adaptive individual time-steps. We demonstrate in a series of binary, few-body and million-body simulations that FROST conserves energy to a level of |ΔE/E| ~ 10^-10 while errors in linear and angular momentum are practically negligible. For typical star cluster simulations, we find that FROST scales well up to N_GPU^max ~ 4×N/10^5 GPUs, making direct-summation N-body simulations beyond N = 10^6 particles possible on systems with several hundred or more GPUs. Due to the nature of hierarchical integration, the inclusion in the code of a Kepler solver or a regularised integrator with post-Newtonian corrections for close encounters and binaries is straightforward. Comment: 18 pages, 7 figures. Accepted for publication in MNRAS.
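The "strictly positive sub-steps plus gradient term" idea can be illustrated with a Chin-type forward fourth-order integrator on a one-dimensional toy problem. The sketch below is not FROST's implementation: the harmonic-oscillator force and the gradient expression grad(|a|^2) are illustrative stand-ins for the Newtonian accelerations and the N-body gradient force.

```python
def forward_step(x, v, dt, acc, grad_a2):
    """One step of a fourth-order forward symplectic integrator:
    kick(dt/6) - drift(dt/2) - gradient kick(2dt/3) - drift(dt/2) - kick(dt/6).
    All sub-steps are strictly positive; the middle kick uses the corrected
    acceleration a + (dt^2/48) * grad(|a|^2)."""
    v += (dt / 6.0) * acc(x)
    x += (dt / 2.0) * v
    v += (2.0 * dt / 3.0) * (acc(x) + (dt * dt / 48.0) * grad_a2(x))
    x += (dt / 2.0) * v
    v += (dt / 6.0) * acc(x)
    return x, v

# Toy problem: 1D harmonic oscillator, a(x) = -x, so |a|^2 = x^2
# and grad(|a|^2) = 2x (stand-ins for the N-body force terms).
acc = lambda x: -x
grad = lambda x: 2.0 * x

x, v, dt = 1.0, 0.0, 0.1
e0 = 0.5 * (v * v + x * x)          # initial energy
for _ in range(1000):
    x, v = forward_step(x, v, dt, acc, grad)
e1 = 0.5 * (v * v + x * x)          # energy after 100 time units
```

Because the scheme is symplectic, the energy error stays bounded at the O(dt^4) level over long integrations instead of drifting secularly.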

    High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA

    We present the results of gravitational direct N-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the N-body problem is implemented in "Compute Unified Device Architecture" (CUDA), using the GPU to speed up the calculations. We tested the implementation on three different N-body codes: two direct N-body integration codes, using the fourth-order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a second-order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for N > 512 particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time-steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special-purpose hardware. Using the same time-step criterion, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af. Comment: Accepted for publication in New Astronomy.
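The force evaluation offloaded to the GPU in codes like this is the O(N^2) direct sum with Plummer softening. A minimal NumPy sketch of that kernel (CPU-side, for illustration only; the paper's version is a CUDA kernel) looks like this:

```python
import numpy as np

def accelerations(pos, mass, eps):
    """Direct-summation softened gravitational accelerations, G = 1:
    a_i = sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2).
    This O(N^2) loop is the part offloaded to the GPU in such codes."""
    dr = pos[None, :, :] - pos[:, None, :]          # (N, N, 3) separations
    r2 = (dr ** 2).sum(-1) + eps ** 2               # softened squared distance
    np.fill_diagonal(r2, 1.0)                       # placeholder on diagonal
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # no self-force
    return (dr * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

rng = np.random.default_rng(1)
pos = rng.standard_normal((64, 3))
mass = np.full(64, 1.0 / 64)
acc = accelerations(pos, mass, eps=1e-2)
```

Since every pairwise interaction appears with both signs, the total momentum change sums to zero up to roundoff, which is a handy correctness check for any GPU port of the kernel.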

    StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations

    We present the multi-GPU realization of the StePS (Stereographically Projected Cosmological Simulations) algorithm with MPI-OpenMP-CUDA hybrid parallelization and nearly ideal scale-out to multiple compute nodes. Our new zoom-in cosmological direct N-body simulation method simulates the infinite universe with unprecedented dynamic range for a given amount of memory and, in contrast to traditional periodic simulations, its fundamental geometry and topology match observations. By using a spherical geometry instead of periodic boundary conditions, and gradually decreasing the mass resolution with radius, our code is capable of running simulations a few gigaparsecs in diameter, with a mass resolution of ~10^9 M_⊙ in the center, in four days on three compute nodes with four GTX 1080Ti GPUs each. The code can also be used to run extremely fast simulations with reasonable resolution for fitting cosmological parameters. These simulations are useful for the prediction needs of large surveys. The StePS code is publicly available to the research community.
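The compactification idea behind the method, mapping infinite space onto a compact sphere so that distant regions occupy finite memory, can be illustrated with a textbook stereographic projection of R^3 onto the unit 3-sphere. This is an illustrative formula only; the actual StePS mapping and resolution scheme may differ in detail.

```python
import numpy as np

def to_sphere(x):
    """Inverse stereographic projection of R^3 onto the unit 3-sphere in R^4;
    distant regions of space are compressed towards the pole w = +1."""
    r2 = (x ** 2).sum(axis=-1, keepdims=True)
    u = 2.0 * x / (1.0 + r2)
    w = (r2 - 1.0) / (r2 + 1.0)
    return np.concatenate([u, w], axis=-1)

def from_sphere(p):
    """Inverse map: stereographic projection from the pole back to R^3."""
    u, w = p[..., :3], p[..., 3:]
    return u / (1.0 - w)

# Round-trip a cloud of points spread over a large volume
pts = np.random.default_rng(0).standard_normal((100, 3)) * 50.0
sph = to_sphere(pts)
back = from_sphere(sph)
```

The forward map sends the whole infinite volume to the sphere (points at infinity accumulate at w = 1), which is what allows a gradually coarsening mass resolution with radius to cover an effectively infinite universe.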

    Three-dimensional shapelets and an automated classification scheme for dark matter haloes

    We extend the two-dimensional Cartesian shapelet formalism to d dimensions. Concentrating on the three-dimensional case, we derive shapelet-based equations for the mass, centroid, root-mean-square radius, and components of the quadrupole-moment and moment-of-inertia tensors. Using cosmological N-body simulations as an application domain, we show that three-dimensional shapelets can be used to replicate the complex sub-structure of dark matter haloes, and we demonstrate the basis of an automated classification scheme for halo shapes. We investigate the shapelet decomposition process from an algorithmic viewpoint, and consider opportunities for accelerating the computation of shapelet-based representations using graphics processing units (GPUs). Comment: 19 pages, 11 figures, accepted for publication in MNRAS.
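Cartesian shapelets are Gauss-Hermite basis functions, and the d-dimensional extension is a per-axis product of 1D shapelets. A minimal sketch of that construction (with the scale parameter named beta by convention; this is the standard Cartesian shapelet basis, not code from the paper):

```python
import math
import numpy as np

def shapelet_1d(n, x, beta=1.0):
    """Dimensional 1D Cartesian shapelet B_n(x; beta): normalised Hermite
    function H_n(x/beta) exp(-x^2 / (2 beta^2)), with H_n evaluated via the
    stable recurrence H_{k+1} = 2u H_k - 2k H_{k-1}."""
    u = np.asarray(x, dtype=float) / beta
    h_prev, h = np.zeros_like(u), np.ones_like(u)
    for k in range(n):
        h_prev, h = h, 2.0 * u * h - 2.0 * k * h_prev
    norm = (beta * math.sqrt(math.pi) * (2.0 ** n) * math.factorial(n)) ** -0.5
    return norm * h * np.exp(-0.5 * u * u)

def shapelet_3d(nx, ny, nz, x, y, z, beta=1.0):
    """A 3D Cartesian shapelet is a product of 1D shapelets, one per axis."""
    return (shapelet_1d(nx, x, beta) * shapelet_1d(ny, y, beta)
            * shapelet_1d(nz, z, beta))
```

Orthonormality of the 1D basis (and hence, by separability, of the 3D basis) makes decomposition coefficients simple overlap integrals of the density field with each basis function.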

    Accelerating NBODY6 with Graphics Processing Units

    We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6, which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier to extending the particle number. Following an early introduction of force polynomials and individual time-steps, the calculation cost was first reduced by the introduction of a neighbour scheme. After a decade of GRAPE computers, which sped up the force calculation further, we are now in the era of GPUs, where relatively small hardware systems are highly cost-effective. A significant gain in efficiency is achieved by employing the GPU to obtain the so-called regular force, which typically involves some 99 percent of the particles, while the remaining local forces are evaluated on the host. However, the latter operation is performed up to 20 times more frequently and may still account for a significant cost. This effort is reduced by parallel SSE/AVX procedures in which each interaction term is calculated using mainly single precision. We also discuss further strategies connected with the coordinate and velocity prediction required by the integration scheme. This leaves hard binaries and multiple close encounters, which are treated by several regularization methods. The present NBODY6-GPU code is well balanced for simulations in the particle range 10^4 - 2×10^5 on a dual-GPU system attached to a standard PC. Comment: 8 pages, 3 figures, 2 tables, accepted by MNRAS.
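The regular/irregular force split described above (the Ahmad-Cohen neighbour scheme) can be sketched as follows. This is a toy unsoftened illustration with a hypothetical fixed neighbour radius r_nb, not NBODY6's actual implementation, which uses per-particle neighbour lists and separate time-steps for the two components.

```python
import numpy as np

def split_forces(pos, mass, i, r_nb):
    """Split the acceleration on particle i into an 'irregular' part from
    neighbours inside radius r_nb (updated frequently, on the host) and a
    'regular' part from the ~99 percent of distant particles (updated
    rarely, on the GPU). G = 1, no softening."""
    dr = pos - pos[i]
    r = np.sqrt((dr ** 2).sum(axis=1))
    r[i] = np.inf                                   # exclude self-interaction
    w = mass / r ** 3
    near = r < r_nb
    f_irr = (dr[near] * w[near, None]).sum(axis=0)      # neighbour force
    f_reg = (dr[~near] * w[~near, None]).sum(axis=0)    # distant force
    return f_irr, f_reg

rng = np.random.default_rng(2)
pos = rng.standard_normal((256, 3))
mass = np.full(256, 1.0 / 256)
f_irr, f_reg = split_forces(pos, mass, i=0, r_nb=0.5)
```

The point of the split is that f_irr + f_reg reproduces the full direct sum exactly, while only the cheap neighbour part needs recomputing at every small time-step.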

    A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection

    Recently, alternative approaches in cosmology have sought to explain the nature of dark matter as a direct result of non-linear spacetime curvature due to different types of deformation potentials. In this context, a key test for this hypothesis is to examine the effects of deformation on the evolution of large-scale structures. An important requirement for the fine analysis of this purely gravitational signature (without dark matter elements) is to characterize the position of a galaxy along its trajectory towards the gravitational collapse of superclusters at low redshifts. In this context, each element in a gravitational N-body simulation behaves as a tracer of collapse governed by the process known as chaotic advection (or Lagrangian turbulence). In order to develop a detailed study of this new approach, we developed the COsmic LAgrangian TUrbulence Simulator (COLATUS), which performs gravitational N-body simulations based on the Compute Unified Device Architecture (CUDA) for graphics processing units (GPUs). In this paper we report the first robust results obtained from COLATUS. Comment: Proceedings of the Sixth International School on Field Theory and Gravitation 2012, by the American Institute of Physics.
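The defining feature of chaotic advection, which is that initially nearby tracers separate exponentially fast, can be demonstrated on any chaotic system. The toy below uses the Chirikov standard map purely as an illustration (it has no connection to COLATUS or to cosmological dynamics): two tracers starting 10^-9 apart become macroscopically separated within a few dozen iterations.

```python
import math

def standard_map(theta, p, k):
    """One iteration of the Chirikov standard map, a textbook chaotic system:
    p' = p + k sin(theta), theta' = theta + p'."""
    p = p + k * math.sin(theta)
    theta = theta + p
    return theta, p

k = 6.0                                 # strongly chaotic regime
a = (1.0, 0.5)                          # tracer 1
b = (1.0 + 1e-9, 0.5)                   # tracer 2, displaced by 1e-9
for _ in range(30):
    a = standard_map(*a, k)
    b = standard_map(*b, k)
separation = math.hypot(a[0] - b[0], a[1] - b[1])
```

This exponential sensitivity is exactly why individual simulation particles act as fine-grained tracers of the collapse dynamics: their trajectories encode the stretching and folding of the underlying flow.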