Sapporo2: A versatile direct N-body library
Astrophysical direct N-body methods were among the first production
algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost
seven years later, the GPU is the most used accelerator device in astronomy for
simulating stellar systems. In this paper we present the implementation of the
Sapporo2 N-body library, which allows researchers to use the GPU for N-body
simulations with little to no effort. The first version, released five years
ago, is actively used, but lacks advanced features and versatility in numerical
precision and support for higher order integrators. In this updated version we
have rebuilt the code from scratch and added support for OpenCL,
multi-precision and higher order integrators. We show how to tune these codes
for different GPU architectures and how to keep the GPU optimally utilized
even when only a small number of particles is integrated.
This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the
added options and double precision data loads. The code runs on a range of
NVIDIA and AMD GPUs in single and double precision accuracy. With the addition
of OpenCL support the library is also able to run on CPUs and other
accelerators that support OpenCL.
Comment: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology.
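To give a concrete picture of the kind of computation such a library offloads, the sketch below shows a shared-memory-tiled direct-summation kernel in CUDA. It is only an illustration of the general technique, not Sapporo2's actual code: the float4 (x, y, z, mass) layout, the softening constant EPS2 and all names are our own assumptions.

#include <cuda_runtime.h>

#define EPS2 1.0e-6f   // assumed softening length squared

// One thread per i-particle; j-particles are staged through shared memory in
// tiles of blockDim.x. Launch with a dynamic shared-memory size of
// blockDim.x * sizeof(float4).
__global__ void gravity_kernel(int n, const float4 *pos, float4 *acc)
{
    extern __shared__ float4 sh[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 pi = (i < n) ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
    float ax = 0.f, ay = 0.f, az = 0.f;

    for (int tile = 0; tile < n; tile += blockDim.x) {
        int j = tile + threadIdx.x;
        // Stage one tile of j-particles; pad with massless particles past n.
        sh[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();

        for (int k = 0; k < blockDim.x; ++k) {
            float dx = sh[k].x - pi.x;
            float dy = sh[k].y - pi.y;
            float dz = sh[k].z - pi.z;
            float r2 = dx*dx + dy*dy + dz*dz + EPS2;     // softening also kills i == j
            float rinv  = rsqrtf(r2);
            float rinv3 = sh[k].w * rinv * rinv * rinv;  // m_j / r^3
            ax += dx * rinv3;
            ay += dy * rinv3;
            az += dz * rinv3;
        }
        __syncthreads();
    }
    if (i < n) acc[i] = make_float4(ax, ay, az, 0.f);
}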
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though internally the GPU
hardware is limited to single precision arithmetic. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems from
running in a production environment for such an extended period of time.
Comment: 13 pages, 9 figures, accepted to New Astronomy.
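The double-precision emulation mentioned in the abstract is commonly implemented with a "double-single" (two-float) representation of the coordinates, so that the cancellation in the position difference is compensated on single-precision hardware. The following is a minimal sketch of that idea; the struct and function names are ours, not SAPPORO's internals.

#include <cuda_runtime.h>

struct ds { float hi, lo; };   // value = hi + lo, with |lo| much smaller than |hi|

// Split a double into a double-single pair (host side, when uploading data).
ds to_ds(double x)
{
    ds r;
    r.hi = (float)x;
    r.lo = (float)(x - (double)r.hi);
    return r;
}

// Error-compensated subtraction (two-sum applied to a - b).
__device__ float ds_sub(ds a, ds b)
{
    float t1 = a.hi - b.hi;
    float e  = t1 - a.hi;
    float t2 = ((-b.hi - e) + (a.hi - (t1 - e))) + a.lo - b.lo;
    // Only the leading part of the difference is needed for the force loop;
    // the compensation term has already removed the hi - hi cancellation error.
    return t1 + t2;
}

// Softened inverse distance between two particles stored in double-single form.
__device__ float softened_inv_dist(ds ax, ds ay, ds az,
                                   ds bx, ds by, ds bz, float eps2)
{
    float dx = ds_sub(ax, bx);
    float dy = ds_sub(ay, by);
    float dz = ds_sub(az, bz);
    return rsqrtf(dx*dx + dy*dy + dz*dz + eps2);
}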
Accelerating Dust Temperature Calculations with Graphics Processing Units
When calculating the infrared spectral energy distributions (SEDs) of
galaxies in radiation-transfer models, the calculation of dust grain
temperatures is generally the most time-consuming part of the calculation.
Because of its highly parallel nature, this calculation is perfectly suited for
massively parallel general-purpose Graphics Processing Units (GPUs). This paper
presents an implementation of the calculation of dust grain equilibrium
temperatures on GPUs in the Monte-Carlo radiation transfer code Sunrise, using
the CUDA API. The GPU can perform this calculation 69 times faster than the 8
CPU cores, showing great potential for accelerating calculations of galaxy
SEDs.
Comment: 7 pages, 2 figures, accepted to New Astronomy. Minor updates to text and performance based on feedback from referee.
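The structure that makes this calculation GPU-friendly is that each cell (or grain species) can be treated independently: the equilibrium temperature is the value at which emitted power balances absorbed power. A hedged sketch of that per-cell inversion is shown below, using a precomputed, monotonically increasing table of emitted power versus temperature; the table layout and names are illustrative assumptions, not the Sunrise implementation.

#include <cuda_runtime.h>

// One thread per cell: binary-search the bracketing interval in the
// power-versus-temperature table and interpolate the temperature.
__global__ void equilibrium_temperature(int ncells,
                                        const float *absorbed,  // absorbed power per cell
                                        const float *table_T,   // temperature grid [ntab]
                                        const float *table_P,   // emitted power at table_T [ntab]
                                        int ntab,
                                        float *temp_out)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= ncells) return;

    float p = absorbed[c];

    int lo = 0, hi = ntab - 1;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (table_P[mid] <= p) lo = mid; else hi = mid;
    }

    float f = (p - table_P[lo]) / (table_P[hi] - table_P[lo]);
    temp_out[c] = table_T[lo] + f * (table_T[hi] - table_T[lo]);
}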
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphics processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, by which hundreds of patches can be advanced in parallel.
The computational overhead associated with data transfer between CPU and
GPU is carefully reduced by utilizing asynchronous memory copies on the GPU,
and the time spent computing ghost-zone values for each patch is hidden by
overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.
Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
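The overlap of transfers and computation described above is, in CUDA terms, a matter of issuing copies and kernels in separate streams. The sketch below shows the general pattern only; the buffers, sizes and the placeholder solver kernel are our own assumptions, not GAMER's API.

#include <cuda_runtime.h>

__global__ void hydro_solver(float *patch, int n) { /* placeholder for the real solver */ }

// h_patch[] must be page-locked (pinned) host buffers for the copies to be
// truly asynchronous; d_patch[] are the matching device buffers.
void advance_patches(float *h_patch[], float *d_patch[],
                     int npatch, size_t bytes, int n)
{
    cudaStream_t copy_s, exec_s;
    cudaStreamCreate(&copy_s);
    cudaStreamCreate(&exec_s);

    // Preload the first patch.
    cudaMemcpyAsync(d_patch[0], h_patch[0], bytes, cudaMemcpyHostToDevice, copy_s);
    cudaStreamSynchronize(copy_s);

    for (int p = 0; p < npatch; ++p) {
        // Advance patch p on the GPU ...
        hydro_solver<<<128, 256, 0, exec_s>>>(d_patch[p], n);

        // ... while the upload of patch p+1 and the download of patch p-1
        // proceed concurrently in the copy stream.
        if (p + 1 < npatch)
            cudaMemcpyAsync(d_patch[p + 1], h_patch[p + 1], bytes,
                            cudaMemcpyHostToDevice, copy_s);
        if (p > 0)
            cudaMemcpyAsync(h_patch[p - 1], d_patch[p - 1], bytes,
                            cudaMemcpyDeviceToHost, copy_s);

        cudaStreamSynchronize(copy_s);
        cudaStreamSynchronize(exec_s);
    }

    cudaStreamDestroy(copy_s);
    cudaStreamDestroy(exec_s);
}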
Performance Analysis and Optimizations Techniques for Legacy Code Numerical Simulations
Numerical simulations used today by scientists in various disciplines are frequently based on implementations created when the predominant computing hardware was sequential by design. In these simulations, new features are added or updated when new discoveries are made, but the computational implementation remains unchanged and does not take advantage of modern hardware architectures. These "legacy code" case studies present the opportunity to create a set of techniques and tools oriented towards performing optimizations from a computational and software-engineering point of view. As an example, in conjunction with an astrophysics research group, the optimization of a real-world numerical integrator is presented, where these techniques were applied and the results obtained are shown.
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems
We present Swarm-NG, a C++ library for the efficient direct integration of
many n-body systems using highly parallel Graphics Processing Units (GPUs), such
as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated
the benefit of GPUs for n-body simulations with thousands to millions of
bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems
with 3...15 bodies each, as is typical for the study of planetary systems.
Swarm-NG parallelizes the simulation, including both the numerical integration
of the equations of motion and the evaluation of forces using NVIDIA's "Compute
Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized
implementations of 4th order time-symmetrized Hermite integration and mixed
variable symplectic integration, as well as several sample codes for other
algorithms to illustrate how non-CUDA-savvy users may themselves introduce
customized integrators into the Swarm-NG framework. To optimize performance, we
analyze the effect of GPU-specific parameters on performance under double
precision.
Applications of Swarm-NG include studying the late stages of planet
formation, testing the stability of planetary systems and evaluating the
goodness-of-fit between many planetary system models and observations of
extrasolar planet host stars (e.g., radial velocity, astrometry, transit
timing). While Swarm-NG focuses on the parallel integration of many planetary
systems, the underlying integrators could be applied to a wide variety of
problems that require repeatedly integrating a set of ordinary differential
equations many times using different initial conditions and/or parameter
values.
Comment: Submitted to New Astronomy.
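The layout that makes many small systems efficient on a GPU is to let every system be integrated by its own thread block, with one thread per body. The sketch below shows that layout with a plain kick-drift step standing in for Swarm-NG's Hermite and mixed-variable symplectic integrators; the flat (nsys * nbody) data layout and all names are illustrative assumptions.

#include <cuda_runtime.h>

// Launch as: integrate_systems<<<nsys, nbody, nbody * sizeof(double3)>>>(...)
// so that each block holds exactly one system and each thread one body.
__global__ void integrate_systems(int nbody, double dt, int nsteps,
                                  double3 *pos, double3 *vel, const double *mass)
{
    int sys  = blockIdx.x;            // one system per block
    int i    = threadIdx.x;           // one body per thread
    int base = sys * nbody;

    extern __shared__ double3 sh_pos[];   // this system's positions
    sh_pos[i] = pos[base + i];
    double3 v = vel[base + i];
    __syncthreads();

    for (int step = 0; step < nsteps; ++step) {
        // Acceleration on body i from the other bodies of the same system.
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < nbody; ++j) {
            if (j == i) continue;
            double dx = sh_pos[j].x - sh_pos[i].x;
            double dy = sh_pos[j].y - sh_pos[i].y;
            double dz = sh_pos[j].z - sh_pos[i].z;
            double r2 = dx*dx + dy*dy + dz*dz;
            double rinv3 = mass[base + j] * rsqrt(r2) / r2;
            ax += dx * rinv3;  ay += dy * rinv3;  az += dz * rinv3;
        }
        // Kick then drift (symplectic Euler, standing in for the real schemes).
        v.x += ax * dt;  v.y += ay * dt;  v.z += az * dt;
        __syncthreads();
        sh_pos[i].x += v.x * dt;  sh_pos[i].y += v.y * dt;  sh_pos[i].z += v.z * dt;
        __syncthreads();
    }

    pos[base + i] = sh_pos[i];
    vel[base + i] = v;
}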
A fully parallel, high precision, N-body code running on hybrid computing platforms
We present a new implementation of the numerical integration of the
classical, gravitational N-body problem based on a high order Hermite
integration scheme with block time steps and a direct evaluation of the
particle-particle forces. The main innovation of this code (called HiGPUs) is
its full parallelization, exploiting both OpenMP and MPI in the use of the
multicore Central Processing Units as well as either Compute Unified Device
Architecture (CUDA) or OpenCL for the hosted Graphic Processing Units. We
tested both performance and accuracy of the code using up to 256 GPUs in the
supercomputer IBM iDataPlex DX360M3 Linux Infiniband Cluster provided by the
Italian supercomputing consortium CINECA, for values of N up to 8 million. We
were able to follow the evolution of a system of 8 million bodies for a few
crossing times, a task previously unreached by direct summation codes. The code
is freely available to the scientific community.
Comment: Paper submitted to the Journal of Computational Physics, consisting of 28 pages and 9 figures. The previously submitted version was lacking the bibliography, due to a TeX problem.
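The block time-step machinery the abstract refers to is what lets many particles share a force evaluation on the GPU: each particle's ideal step is rounded down to a power-of-two fraction of a maximum step, and all particles on the same level are advanced together. The host-side sketch below illustrates the idea; the function names, the max_level clamp and the simplified accuracy criterion are our own assumptions, not HiGPUs code.

// Round dt_ideal down to dt_max / 2^k so that particles with the same k form
// one block that can be advanced (and its forces evaluated) in a single GPU call.
double block_time_step(double dt_ideal, double dt_max, int max_level)
{
    double dt = dt_max;
    int k = 0;
    while (dt > dt_ideal && k < max_level) { dt *= 0.5; ++k; }
    return dt;
}

// Simplified Aarseth-style estimate of the ideal step from the magnitudes of
// the acceleration and its time derivative (higher derivatives omitted here).
double ideal_step(double eta, double a_mod, double adot_mod)
{
    return eta * a_mod / adot_mod;
}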