
    A pilgrimage to gravity on GPUs

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so popular that almost all papers about high-precision N-body simulations use methods accelerated by GPUs. With GPU hardware becoming more advanced and being used for more sophisticated algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics. Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures.

    SAPPORO: A way to turn your graphics cards into a GRAPE-6

    We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphics Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running 10^6 particles on a PC with four commercial G92-architecture GPUs (two GeForce 9800GX2 cards). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPU did not show any problems from running in a production environment for such an extended period of time. Comment: 13 pages, 9 figures, accepted to New Astronomy.
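The double-precision emulation mentioned in the abstract is commonly done with a "double-single" representation: each coordinate is stored as a pair of single-precision numbers whose sum carries roughly double-precision information. The sketch below illustrates the idea in Python with NumPy float32 arithmetic; it is an illustration of the general technique, not Sapporo's actual CUDA code.

```python
import numpy as np

f32 = np.float32

def ds_split(x):
    """Split a float64 into a double-single pair (hi, lo) of float32."""
    hi = f32(x)
    lo = f32(x - np.float64(hi))
    return hi, lo

def ds_sub(a, b):
    """Double-single subtraction (a_hi, a_lo) - (b_hi, b_lo), float32 ops only."""
    ahi, alo = a
    bhi, blo = b
    s = f32(ahi - bhi)                     # leading difference
    bb = f32(s - ahi)                      # two-sum error extraction
    err = f32(f32(ahi - f32(s - bb)) + f32(-bhi - bb))
    err = f32(err + f32(alo - blo))        # fold in the low words
    hi = f32(s + err)                      # renormalise the pair
    lo = f32(err - f32(hi - s))
    return hi, lo

# Distance-like difference between two nearly equal coordinates
x1, x2 = 1000.0001, 1000.0
true = np.float64(x1) - np.float64(x2)
naive = f32(x1) - f32(x2)                       # plain single precision
hi, lo = ds_sub(ds_split(x1), ds_split(x2))     # emulated double precision
emulated = np.float64(hi) + np.float64(lo)
```

In this example plain float32 loses most significant digits of the separation through cancellation, while the double-single pair recovers it to near float32-residual accuracy, which is why the technique matters for close particle pairs.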

    Direct N-body code on low-power embedded ARM GPUs

    This work arises in the context of the ExaNeSt project, which aims at the design and development of an exascale-ready supercomputer with a low energy-consumption profile that is nevertheless able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely packed low-power 64-bit ARM processors embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architectures where computing power is supplied by both CPUs and GPUs, and they are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct N-body code suitable for astrophysical simulations has been re-engineered to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be effectively used to accelerate real-life scientific calculations, and that they are also promising because of their energy efficiency, a crucial design constraint for future exascale platforms. Comment: 16 pages, 7 figures, 1 table, accepted for publication in the Computing Conference 2019 proceedings.

    FROST: a momentum-conserving CUDA implementation of a hierarchical fourth-order forward symplectic integrator

    We present a novel hierarchical formulation of the fourth-order forward symplectic integrator and its numerical implementation in the GPU-accelerated direct-summation N-body code FROST. The new integrator is especially suitable for simulations with a large dynamical range due to its hierarchical nature. The strictly positive integrator sub-steps in a fourth-order symplectic integrator are made possible by computing an additional gradient term in addition to the Newtonian accelerations. All force calculations and kick operations are synchronous, so the integration algorithm is manifestly momentum-conserving. We also employ a time-step symmetrisation procedure to approximately restore time-reversibility with adaptive individual time-steps. We demonstrate in a series of binary, few-body and million-body simulations that FROST conserves energy to a level of |ΔE/E| ~ 10^-10 while errors in linear and angular momentum are practically negligible. For typical star cluster simulations, we find that FROST scales well up to N_GPU^max ~ 4×N/10^5 GPUs, making direct-summation N-body simulations beyond N = 10^6 particles possible on systems with several hundred or more GPUs. Due to the nature of hierarchical integration, the inclusion in the code of a Kepler solver or a regularised integrator with post-Newtonian corrections for close encounters and binaries is straightforward. Comment: 18 pages, 7 figures. Accepted for publication in MNRAS.
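The "strictly positive sub-steps plus gradient term" idea can be illustrated with a Chin-type forward fourth-order integrator on a one-dimensional toy problem. The sketch below is not FROST's implementation: the harmonic-oscillator force and the gradient expression grad(|a|^2) are illustrative stand-ins for the Newtonian accelerations and the N-body gradient force.

```python
def forward_step(x, v, dt, acc, grad_a2):
    """One step of a fourth-order forward symplectic integrator:
    kick(dt/6) - drift(dt/2) - gradient kick(2dt/3) - drift(dt/2) - kick(dt/6).
    All sub-steps are strictly positive; the middle kick uses the corrected
    acceleration a + (dt^2/48) * grad(|a|^2)."""
    v += (dt / 6.0) * acc(x)
    x += (dt / 2.0) * v
    v += (2.0 * dt / 3.0) * (acc(x) + (dt * dt / 48.0) * grad_a2(x))
    x += (dt / 2.0) * v
    v += (dt / 6.0) * acc(x)
    return x, v

# Toy problem: 1D harmonic oscillator, a(x) = -x, so |a|^2 = x^2
# and grad(|a|^2) = 2x (stand-ins for the N-body force terms).
acc = lambda x: -x
grad = lambda x: 2.0 * x

x, v, dt = 1.0, 0.0, 0.1
e0 = 0.5 * (v * v + x * x)          # initial energy
for _ in range(1000):
    x, v = forward_step(x, v, dt, acc, grad)
e1 = 0.5 * (v * v + x * x)          # energy after 100 time units
```

Because the scheme is symplectic, the energy error stays bounded at the O(dt^4) level over long integrations instead of drifting secularly.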

    High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA

    We present the results of gravitational direct N-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the N-body problem is implemented in "Compute Unified Device Architecture" (CUDA), using the GPU to speed up the calculations. We tested the implementation on three different N-body codes: two direct N-body integration codes, using the fourth-order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a second-order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for N > 512 particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time-steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special-purpose hardware. Using the same time-step criterion, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af. Comment: Accepted for publication in New Astronomy.
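The force evaluation offloaded to the GPU in codes like this is the O(N^2) direct sum with Plummer softening. A minimal NumPy sketch of that kernel (CPU-side, for illustration only; the paper's version is a CUDA kernel) looks like this:

```python
import numpy as np

def accelerations(pos, mass, eps):
    """Direct-summation softened gravitational accelerations, G = 1:
    a_i = sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2).
    This O(N^2) loop is the part offloaded to the GPU in such codes."""
    dr = pos[None, :, :] - pos[:, None, :]          # (N, N, 3) separations
    r2 = (dr ** 2).sum(-1) + eps ** 2               # softened squared distance
    np.fill_diagonal(r2, 1.0)                       # placeholder on diagonal
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # no self-force
    return (dr * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

rng = np.random.default_rng(1)
pos = rng.standard_normal((64, 3))
mass = np.full(64, 1.0 / 64)
acc = accelerations(pos, mass, eps=1e-2)
```

Since every pairwise interaction appears with both signs, the total momentum change sums to zero up to roundoff, which is a handy correctness check for any GPU port of the kernel.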

    StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations

    We present the multi-GPU realization of the StePS (Stereographically Projected Cosmological Simulations) algorithm with MPI-OpenMP-CUDA hybrid parallelization and nearly ideal scale-out to multiple compute nodes. Our new zoom-in cosmological direct N-body simulation method simulates the infinite universe with unprecedented dynamic range for a given amount of memory and, in contrast to traditional periodic simulations, its fundamental geometry and topology match observations. By using a spherical geometry instead of periodic boundary conditions, and gradually decreasing the mass resolution with radius, our code is capable of running simulations a few gigaparsecs in diameter, with a mass resolution of ~10^9 M_⊙ in the center, in four days on three compute nodes with four GTX 1080Ti GPUs each. The code can also be used to run extremely fast simulations with reasonable resolution for fitting cosmological parameters. These simulations are useful for the prediction needs of large surveys. The StePS code is publicly available to the research community.
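The compactification idea behind the method, mapping infinite space onto a compact sphere so that distant regions occupy finite memory, can be illustrated with a textbook stereographic projection of R^3 onto the unit 3-sphere. This is an illustrative formula only; the actual StePS mapping and resolution scheme may differ in detail.

```python
import numpy as np

def to_sphere(x):
    """Inverse stereographic projection of R^3 onto the unit 3-sphere in R^4;
    distant regions of space are compressed towards the pole w = +1."""
    r2 = (x ** 2).sum(axis=-1, keepdims=True)
    u = 2.0 * x / (1.0 + r2)
    w = (r2 - 1.0) / (r2 + 1.0)
    return np.concatenate([u, w], axis=-1)

def from_sphere(p):
    """Inverse map: stereographic projection from the pole back to R^3."""
    u, w = p[..., :3], p[..., 3:]
    return u / (1.0 - w)

# Round-trip a cloud of points spread over a large volume
pts = np.random.default_rng(0).standard_normal((100, 3)) * 50.0
sph = to_sphere(pts)
back = from_sphere(sph)
```

The forward map sends the whole infinite volume to the sphere (points at infinity accumulate at w = 1), which is what allows a gradually coarsening mass resolution with radius to cover an effectively infinite universe.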

    Three-dimensional shapelets and an automated classification scheme for dark matter haloes

    We extend the two-dimensional Cartesian shapelet formalism to d dimensions. Concentrating on the three-dimensional case, we derive shapelet-based equations for the mass, centroid, root-mean-square radius, and components of the quadrupole-moment and moment-of-inertia tensors. Using cosmological N-body simulations as an application domain, we show that three-dimensional shapelets can be used to replicate the complex sub-structure of dark matter haloes, and we demonstrate the basis of an automated classification scheme for halo shapes. We investigate the shapelet decomposition process from an algorithmic viewpoint, and consider opportunities for accelerating the computation of shapelet-based representations using graphics processing units (GPUs). Comment: 19 pages, 11 figures, accepted for publication in MNRAS.
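Cartesian shapelets are Gauss-Hermite basis functions, and the d-dimensional extension is a per-axis product of 1D shapelets. A minimal sketch of that construction (with the scale parameter named beta by convention; this is the standard Cartesian shapelet basis, not code from the paper):

```python
import math
import numpy as np

def shapelet_1d(n, x, beta=1.0):
    """Dimensional 1D Cartesian shapelet B_n(x; beta): normalised Hermite
    function H_n(x/beta) exp(-x^2 / (2 beta^2)), with H_n evaluated via the
    stable recurrence H_{k+1} = 2u H_k - 2k H_{k-1}."""
    u = np.asarray(x, dtype=float) / beta
    h_prev, h = np.zeros_like(u), np.ones_like(u)
    for k in range(n):
        h_prev, h = h, 2.0 * u * h - 2.0 * k * h_prev
    norm = (beta * math.sqrt(math.pi) * (2.0 ** n) * math.factorial(n)) ** -0.5
    return norm * h * np.exp(-0.5 * u * u)

def shapelet_3d(nx, ny, nz, x, y, z, beta=1.0):
    """A 3D Cartesian shapelet is a product of 1D shapelets, one per axis."""
    return (shapelet_1d(nx, x, beta) * shapelet_1d(ny, y, beta)
            * shapelet_1d(nz, z, beta))
```

Orthonormality of the 1D basis (and hence, by separability, of the 3D basis) makes decomposition coefficients simple overlap integrals of the density field with each basis function.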

    Accelerating NBODY6 with Graphics Processing Units

    We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6, which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier to extending the particle number. Following an early introduction of force polynomials and individual time-steps, the calculation cost was first reduced by the introduction of a neighbour scheme. After a decade of GRAPE computers, which sped up the force calculation further, we are now in the era of GPUs, where relatively small hardware systems are highly cost-effective. A significant gain in efficiency is achieved by employing the GPU to obtain the so-called regular force, which typically involves some 99 percent of the particles, while the remaining local forces are evaluated on the host. However, the latter operation is performed up to 20 times more frequently and may still account for a significant cost. This effort is reduced by parallel SSE/AVX procedures in which each interaction term is calculated using mainly single precision. We also discuss further strategies connected with the coordinate and velocity prediction required by the integration scheme. This leaves hard binaries and multiple close encounters, which are treated by several regularization methods. The present NBODY6-GPU code is well balanced for simulations in the particle range 10^4 - 2×10^5 on a dual-GPU system attached to a standard PC. Comment: 8 pages, 3 figures, 2 tables, accepted by MNRAS.
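The regular/irregular force split described above (the Ahmad-Cohen neighbour scheme) can be sketched as follows. This is a toy unsoftened illustration with a hypothetical fixed neighbour radius r_nb, not NBODY6's actual implementation, which uses per-particle neighbour lists and separate time-steps for the two components.

```python
import numpy as np

def split_forces(pos, mass, i, r_nb):
    """Split the acceleration on particle i into an 'irregular' part from
    neighbours inside radius r_nb (updated frequently, on the host) and a
    'regular' part from the ~99 percent of distant particles (updated
    rarely, on the GPU). G = 1, no softening."""
    dr = pos - pos[i]
    r = np.sqrt((dr ** 2).sum(axis=1))
    r[i] = np.inf                                   # exclude self-interaction
    w = mass / r ** 3
    near = r < r_nb
    f_irr = (dr[near] * w[near, None]).sum(axis=0)      # neighbour force
    f_reg = (dr[~near] * w[~near, None]).sum(axis=0)    # distant force
    return f_irr, f_reg

rng = np.random.default_rng(2)
pos = rng.standard_normal((256, 3))
mass = np.full(256, 1.0 / 256)
f_irr, f_reg = split_forces(pos, mass, i=0, r_nb=0.5)
```

The point of the split is that f_irr + f_reg reproduces the full direct sum exactly, while only the cheap neighbour part needs recomputing at every small time-step.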

    A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection

    Recently, alternative approaches in cosmology have sought to explain the nature of dark matter as a direct result of non-linear spacetime curvature due to different types of deformation potentials. In this context, a key test for this hypothesis is to examine the effects of deformation on the evolution of large-scale structures. An important requirement for the fine analysis of this purely gravitational signature (without dark matter elements) is to characterize the position of a galaxy along its trajectory towards the gravitational collapse of superclusters at low redshifts. In this context, each element in a gravitational N-body simulation behaves as a tracer of collapse governed by the process known as chaotic advection (or Lagrangian turbulence). In order to develop a detailed study of this new approach, we developed the COsmic LAgrangian TUrbulence Simulator (COLATUS), which performs gravitational N-body simulations based on the Compute Unified Device Architecture (CUDA) for graphics processing units (GPUs). In this paper we report the first robust results obtained from COLATUS. Comment: Proceedings of the Sixth International School on Field Theory and Gravitation 2012, by the American Institute of Physics.
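The defining feature of chaotic advection, which is that initially nearby tracers separate exponentially fast, can be demonstrated on any chaotic system. The toy below uses the Chirikov standard map purely as an illustration (it has no connection to COLATUS or to cosmological dynamics): two tracers starting 10^-9 apart become macroscopically separated within a few dozen iterations.

```python
import math

def standard_map(theta, p, k):
    """One iteration of the Chirikov standard map, a textbook chaotic system:
    p' = p + k sin(theta), theta' = theta + p'."""
    p = p + k * math.sin(theta)
    theta = theta + p
    return theta, p

k = 6.0                                 # strongly chaotic regime
a = (1.0, 0.5)                          # tracer 1
b = (1.0 + 1e-9, 0.5)                   # tracer 2, displaced by 1e-9
for _ in range(30):
    a = standard_map(*a, k)
    b = standard_map(*b, k)
separation = math.hypot(a[0] - b[0], a[1] - b[1])
```

This exponential sensitivity is exactly why individual simulation particles act as fine-grained tracers of the collapse dynamics: their trajectories encode the stretching and folding of the underlying flow.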