13,007 research outputs found

    A parallel gravitational N-body kernel

    We describe source-code-level parallelization for the kira direct gravitational N-body integrator, the workhorse of the starlab production environment for simulating dense stellar systems. The parallelization strategy, called "j-parallelization", partitions the computational domain by distributing all particles in the system among the available processors. Partial forces on the particles to be advanced are calculated in parallel by their parent processors and are then summed in a final global operation. Once total forces are obtained, the computing elements proceed to the computation of their particle trajectories. We report the results of timing measurements on four different parallel computers and compare them with theoretical predictions. The computers either employ a high-speed interconnect or a NUMA architecture to minimize communication overhead, or are distributed in a grid. The code scales well in the domain tested, which ranges from 1024 to 65536 stars on 1 to 128 processors, providing satisfactory speedup. Running the production environment on a grid becomes inefficient for more than 60 processors distributed across three sites. Comment: 21 pages, New Astronomy (in press).
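
    A minimal Python sketch of the "j-parallelization" idea described above, assuming a plain direct-summation force kernel: the particle set is split into per-processor blocks, each block computes partial forces on the particles being advanced, and the partial results are summed at the end (the step an MPI reduction performs in the real code). All names and parameters here are illustrative, not taken from the kira/starlab sources.

```python
import numpy as np

G = 1.0  # gravitational constant in N-body units (assumption)

def partial_accel(pos_i, pos_j, mass_j, eps2=1e-4):
    """Acceleration on the i-particles due to one block of j-particles,
    i.e. the work a single processor would do on its local particles."""
    acc = np.zeros_like(pos_i)
    for xj, mj in zip(pos_j, mass_j):
        d = xj - pos_i                      # displacement vectors
        r2 = np.sum(d * d, axis=1) + eps2   # softened squared distances
        acc += G * mj * d / r2[:, None]**1.5
    return acc

def total_accel(pos_i, pos_all, mass_all, n_proc=4):
    """'j-parallelization': split the full particle set into n_proc blocks,
    compute partial forces per block, then sum them in a global reduction."""
    blocks = np.array_split(np.arange(len(pos_all)), n_proc)
    partials = [partial_accel(pos_i, pos_all[b], mass_all[b]) for b in blocks]
    return np.sum(partials, axis=0)         # stands in for the MPI sum

# toy usage: 256 particles, advance the first 32
rng = np.random.default_rng(0)
pos = rng.normal(size=(256, 3))
mass = np.full(256, 1.0 / 256)
acc = total_accel(pos[:32], pos, mass)
```

    Note that the softening term also suppresses the self-interaction (the displacement is zero there), so particles in the advanced set contribute nothing to their own acceleration.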

    4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem

    As an entry for the 2012 Gordon Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of the K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon Bell prize winners that used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, which achieves a similar level of accuracy: the short-range force is calculated by the tree algorithm, and the long-range force is solved by the particle-mesh algorithm. We developed a highly tuned gravity kernel for the short-range forces and a novel communication algorithm for the long-range forces. The average performances on 24576 and 82944 nodes of the K computer are 1.53 and 4.45 Pflops, which correspond to 49% and 42% of the peak speed. Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012 (http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional information: http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
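
    As a sketch of the force splitting the abstract describes, the snippet below uses the standard erfc-based TreePM split of the pairwise Newtonian force into a short-range part (handled by the tree walk) and a long-range remainder (solved on a mesh with FFTs in a real code). The splitting scale r_s and the direct evaluation of the long-range residual are illustrative assumptions, not the kernel or communication scheme of the paper.

```python
import numpy as np
from scipy.special import erfc

G = 1.0          # N-body units (assumption)
r_s = 0.1        # splitting scale; an illustrative value, not the paper's

def force_short(r, m=1.0):
    """Short-range part of the pairwise force, handled by the tree walk."""
    u = r / (2.0 * r_s)
    return G * m / r**2 * (erfc(u) + r * np.exp(-u**2) / (r_s * np.sqrt(np.pi)))

def force_long(r, m=1.0):
    """Long-range remainder; in a TreePM code this piece is smooth and is
    solved on the particle-mesh grid with FFTs."""
    return G * m / r**2 - force_short(r, m)

# the two pieces always reassemble the full Newtonian force
r = np.linspace(0.01, 1.0, 50)
assert np.allclose(force_short(r) + force_long(r), G / r**2)
```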

    The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration

    We describe an open-source GPU implementation of a hybrid symplectic N-body integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to integrate planet and planetesimal dynamics in the late stage of planet formation and to perform stability analyses of planetary systems. GENGA uses a hybrid symplectic integrator to handle close encounters with very good energy conservation, which is essential in long-term planetary system integration. We extended the second-order hybrid integration scheme to higher orders. The GENGA code supports three simulation modes: integration of up to 2048 massive bodies, integration with up to a million test particles, or parallel integration of a large number of individual planetary systems. We compare the results of GENGA with those of Mercury and pkdgrav2 with respect to energy conservation and performance, and find that the energy conservation of GENGA is comparable to that of Mercury and around two orders of magnitude better than that of pkdgrav2. GENGA runs up to 30 times faster than Mercury and up to eight times faster than pkdgrav2. GENGA is written in CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0. Comment: Accepted by ApJ. 18 pages, 17 figures, 4 tables.
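
    For orientation, here is a minimal second-order kick-drift-kick step in plain Python, the baseline on which hybrid symplectic schemes such as the one described above build. GENGA's actual integrator splits the Hamiltonian differently (Keplerian drifts plus interaction kicks, with a smooth changeover to a direct integrator during close encounters), so this is only the structural skeleton, with all names chosen for illustration.

```python
import numpy as np

def accel(pos, mass, eps2=1e-8):
    """Direct-sum gravitational acceleration (softened, G = 1); stands in
    for the split Kepler + interaction forces a real hybrid scheme uses."""
    acc = np.zeros_like(pos)
    for j, (xj, mj) in enumerate(zip(pos, mass)):
        d = xj - pos
        r2 = np.sum(d * d, axis=1) + eps2
        w = mj / r2**1.5
        w[j] = 0.0                          # no self-interaction
        acc += w[:, None] * d
    return acc

def kdk_step(pos, vel, mass, dt):
    """One second-order kick-drift-kick step."""
    vel = vel + 0.5 * dt * accel(pos, mass)   # half kick
    pos = pos + dt * vel                      # drift
    vel = vel + 0.5 * dt * accel(pos, mass)   # half kick
    return pos, vel
```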

    Sapporo2: A versatile direct N-body library

    Astrophysical direct N-body methods have been among the first production algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 N-body library, which allows researchers to use the GPU for N-body simulations with little to no effort. The first version, released five years ago, is actively used but lacks advanced features and versatility in numerical precision and support for higher-order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher-order integrators. We show how to tune these codes for different GPU architectures and how to keep utilizing the GPU optimally even when only a small number of particles (N < 100) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double-precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single- and double-precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL. Comment: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology.
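
    One common way to keep a GPU busy at small N, in the spirit of the tuning discussed above, is to split the j-loop of every i-particle over several blocks and reduce the partial forces afterwards, so there are n_i * n_jblocks independent work items instead of only n_i. The sketch below shows that decomposition serially in Python; whether Sapporo2 implements exactly this scheme is an assumption here, and all names are illustrative.

```python
import numpy as np

def forces_small_n(pos, mass, n_jblocks=8, eps2=1e-9):
    """Split each i-particle's j-loop into n_jblocks pieces, accumulate a
    partial force per (i, j-block) pair, then reduce over the blocks.
    On a GPU each block would map to its own threads; here it is a loop."""
    n = len(pos)
    jblocks = np.array_split(np.arange(n), n_jblocks)
    partial = np.zeros((n_jblocks, n, 3))
    for b, idx in enumerate(jblocks):
        for j in idx:
            d = pos[j] - pos
            r2 = np.sum(d * d, axis=1) + eps2
            w = mass[j] / r2**1.5
            w[j] = 0.0                      # skip self-interaction
            partial[b] += w[:, None] * d
    return partial.sum(axis=0)              # reduction over the j-blocks
```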

    SAPPORO: A way to turn your graphics cards into a GRAPE-6

    We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision for calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running with 10^6 particles on a PC with four commercial G92-architecture GPUs (two GeForce 9800GX2 cards). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPU did not show any problems from running in a production environment for such an extended period of time. Comment: 13 pages, 9 figures, accepted to New Astronomy.
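
    A minimal sketch of the "two-float" (double-single) idea the abstract refers to: a coordinate is stored as a pair of single-precision numbers whose sum approximates the double-precision value, so inter-particle differences retain most of their significant digits even though only single-precision operations are used. The real GPU kernel uses a more careful splitting and compensated arithmetic; the function names below are illustrative.

```python
import numpy as np

def to_double_single(x):
    """Split a float64 into a (hi, lo) pair of float32 values whose sum
    approximates x to roughly double precision."""
    hi = np.float32(x)
    lo = np.float32(x - np.float64(hi))
    return hi, lo

def ds_diff(a, b):
    """Approximate double-precision difference using only float32 operations,
    the trick used for the inter-particle distances."""
    a_hi, a_lo = a
    b_hi, b_lo = b
    return np.float32(np.float32(a_hi - b_hi) + np.float32(a_lo - b_lo))

# two nearby positions whose separation plain float32 cannot resolve well
x1 = to_double_single(1000.0001234)
x2 = to_double_single(1000.0)
print(ds_diff(x1, x2))                                # ~1.234e-4, separation recovered
print(np.float32(1000.0001234) - np.float32(1000.0))  # ~1.221e-4, low digits lost on input rounding
```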

    Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects

    We report a novel application of a graphics processing unit (GPU) to accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. An overall 16-fold speed-up has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
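
    The dominant cost in such pipelines is typically the frequency-domain matched filter: correlating the data stream against an inspiral template, weighted by the inverse noise power spectral density. A simplified numpy sketch is given below, with the normalisation conventions reduced to the essentials and a flat PSD in the toy example; the actual pipeline offloads these FFTs and the template bank to the GPU.

```python
import numpy as np

def matched_filter_snr(data, template, psd):
    """Inverse-PSD-weighted correlation of data with a template, returned as
    a time series (simplified signal-to-noise conventions)."""
    n = len(data)
    d_f = np.fft.rfft(data)
    h_f = np.fft.rfft(template)
    corr = np.fft.irfft(d_f * np.conj(h_f) / psd, n)   # correlation vs. lag
    sigma2 = np.sum(np.abs(h_f) ** 2 / psd) / n        # template normalisation
    return corr / np.sqrt(sigma2)

# toy usage: white noise (flat PSD) plus a shifted, windowed sinusoidal "template"
rng = np.random.default_rng(1)
n = 4096
template = np.sin(2 * np.pi * 50 * np.arange(n) / n) * np.hanning(n)
data = 0.1 * rng.normal(size=n) + np.roll(template, 1000)
psd = np.ones(n // 2 + 1)                              # flat noise spectrum
snr = matched_filter_snr(data, template, psd)
print(np.argmax(np.abs(snr)))                          # peaks near the injected lag of 1000
```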

    EvoL: The new Padova T-SPH parallel code for cosmological simulations - I. Basic code: gravity and hydrodynamics

    We present EvoL, the new release of the Padova N-body code for cosmological simulations of galaxy formation and evolution. In this paper the basic Tree + SPH code is presented and analysed, together with an overview of the software architecture. EvoL is a flexible parallel Fortran 95 code, specifically designed for simulations of cosmological structure formation on cluster, galactic and sub-galactic scales. EvoL is a fully Lagrangian, self-adaptive code, based on the classical oct-tree and on the Smoothed Particle Hydrodynamics (SPH) algorithm. It includes special features such as adaptive softening lengths with correcting extra terms, and modern formulations of SPH and artificial viscosity. It is designed to run in parallel on multiple CPUs to optimize performance and save computational time. We describe the code in detail and present the results of a number of standard hydrodynamical tests. Comment: 33 pages, 49 figures, accepted by A&A.
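
    As a pointer to what the SPH part of such a Tree + SPH code computes, the sketch below evaluates the smoothed density estimate with the standard 3D cubic-spline kernel. The kernel normalisation, the fixed smoothing length and the brute-force neighbour loop are simplifications for illustration; EvoL adapts the smoothing lengths, adds the correcting extra terms mentioned above, and finds neighbours with the tree.

```python
import numpy as np

def w_cubic(r, h):
    """Standard 3D cubic-spline SPH kernel with support radius h
    (Springel-style normalisation); the paper's exact kernel may differ."""
    q = r / h
    w = np.zeros_like(q)
    inner = q <= 0.5
    outer = (q > 0.5) & (q < 1.0)
    w[inner] = 1.0 - 6.0 * q[inner]**2 + 6.0 * q[inner]**3
    w[outer] = 2.0 * (1.0 - q[outer])**3
    return 8.0 / (np.pi * h**3) * w

def sph_density(pos, mass, h):
    """Smoothed density rho_i = sum_j m_j W(|r_i - r_j|, h), by brute force."""
    rho = np.empty(len(pos))
    for i, xi in enumerate(pos):
        r = np.linalg.norm(pos - xi, axis=1)
        rho[i] = np.sum(mass * w_cubic(r, h))
    return rho
```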