904 research outputs found

Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects

Author: Blair David
Cannon Kipp
Chung Shin Kee
Datta Amitava
Wen Linqing
Publication venue: 'AIP Publishing'
Publication date: 07/07/2010

We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A total speed-up of 16-fold has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.

Caltech Authors

A sparse octree gravitational N-body code that runs entirely on the GPU processor

Author: Jeroen Bédorf
Evghenii Gaburov
Simon Portegies Zwart
Publication venue: 'Elsevier BV'
Publication date: 01/04/2012

We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which runs entirely on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traversal algorithms are portable to many-core devices that support the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum
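
The abstract above names sort and scan as the building blocks of the octree construction. The following serial Python sketch illustrates one common way those two primitives can be combined: particles are mapped to Morton (Z-order) keys, sorted, and an exclusive prefix sum over per-cell counts yields each cell's offset in the sorted array. The use of Morton keys, the 10-bit grid resolution, and every function name here are assumptions made for illustration; this is not the authors' CUDA implementation.

```python
from itertools import accumulate

def spread_bits(v, bits=10):
    """Move bit i of v to bit 3*i, leaving room for two other coordinates."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def morton_key(ix, iy, iz, bits=10):
    """Interleave three integer grid coordinates into one Z-order key."""
    return (spread_bits(ix, bits) << 2) | (spread_bits(iy, bits) << 1) | spread_bits(iz, bits)

def quantize(pos, lo, hi, bits=10):
    """Map a coordinate in [lo, hi] to an integer grid index."""
    cells = 1 << bits
    idx = int((pos - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)

def sorted_keys(points, lo, hi, bits=10):
    """Sort step: compute one key per particle and sort the keys."""
    keys = [morton_key(*(quantize(c, lo, hi, bits) for c in p), bits=bits)
            for p in points]
    return sorted(keys)

def cell_offsets(keys, level, bits=10):
    """Scan step: (cell id, start offset, particle count) per occupied cell.

    Particles sharing the top 3*level key bits lie in the same cell, so an
    exclusive prefix sum over the cell sizes gives each cell's start index.
    """
    shift = 3 * (bits - level)
    counts = {}
    for k in keys:
        counts[k >> shift] = counts.get(k >> shift, 0) + 1
    cells = sorted(counts)
    sizes = [counts[c] for c in cells]
    offsets = [0] + list(accumulate(sizes))[:-1]  # exclusive scan
    return list(zip(cells, offsets, sizes))
```

Only occupied cells appear in the result, which is what makes the tree sparse. On a GPU both the sort and the scan would be replaced by their data-parallel counterparts.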

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

Author: Amr H. Hassan
Benjamin R. Barsdell
Christopher J. Fluke
David G. Barnes
Publication venue: 'CSIRO Publishing'
Publication date: 26/08/2010

General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits. Comment: 13 pages, 5 figures, accepted for publication in PAS

arXiv.org e-Print Archive

Crossref

Swinburne Research Bank

The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration

Author: Grimm Simon L.
Stadel Joachim G.
Publication venue: 'IOP Publishing'
Publication date: 01/01/2014

We describe an open source GPU implementation of a hybrid symplectic N-body integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to integrate planet and planetesimal dynamics in the late stage of planet formation and stability analyses of planetary systems. GENGA uses a hybrid symplectic integrator to handle close encounters with very good energy conservation, which is essential in long-term planetary system integration. We extended the second order hybrid integration scheme to higher orders. The GENGA code supports three simulation modes: integration of up to 2048 massive bodies, integration with up to a million test particles, or parallel integration of a large number of individual planetary systems. We compare the results of GENGA to Mercury and pkdgrav2 with respect to energy conservation and performance, and find that the energy conservation of GENGA is comparable to Mercury and around two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster than Mercury and up to eight times faster than pkdgrav2. GENGA is written in CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0. Comment: Accepted by ApJ. 18 pages, 17 figures, 4 table
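
To illustrate why a symplectic scheme helps with long-term energy conservation, here is a minimal Python sketch of the plain second-order kick-drift-kick leapfrog integrator for a single test body orbiting a fixed central mass. It is a deliberately simpler technique than GENGA's hybrid scheme: it has no close-encounter handling and is not taken from the GENGA code. The units (GM = 1) and all function names are assumptions.

```python
import math

def acceleration(x, y, gm=1.0):
    """Newtonian acceleration from a point mass fixed at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -gm * x / r3, -gm * y / r3

def leapfrog_step(state, dt, gm=1.0):
    """Advance (x, y, vx, vy) by one kick-drift-kick step of size dt."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y, gm)
    vx += 0.5 * dt * ax          # half kick
    vy += 0.5 * dt * ay
    x += dt * vx                 # full drift
    y += dt * vy
    ax, ay = acceleration(x, y, gm)
    vx += 0.5 * dt * ax          # half kick
    vy += 0.5 * dt * ay
    return (x, y, vx, vy)

def energy(state, gm=1.0):
    """Specific orbital energy: kinetic plus potential per unit mass."""
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) - gm / math.hypot(x, y)

def integrate(state, dt, steps, gm=1.0):
    """Apply leapfrog_step repeatedly and return the final state."""
    for _ in range(steps):
        state = leapfrog_step(state, dt, gm)
    return state
```

Starting from a circular orbit, for example `integrate((1.0, 0.0, 0.0, 1.0), 0.01, 10000)`, the energy error stays bounded rather than drifting, which is the property the abstract calls essential for long integrations.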

arXiv.org e-Print Archive

ZORA

Bern Open Repository and Information System (BORIS)

Parallelized Inference for Gravitational-Wave Astronomy

Author: Poole Gregory B.
Smith Rory
Talbot Colm
Thrane Eric
Publication venue: 'American Physical Society (APS)'
Publication date: 01/08/2019

Bayesian inference is the workhorse of gravitational-wave astronomy, for example, determining the mass and spins of merging black holes, revealing the neutron star equation of state, and unveiling the population properties of compact binaries. The science enabled by these inferences comes with a computational cost that can limit the questions we are able to answer. This cost is expected to grow. As detectors improve, the detection rate will go up, allowing less time to analyze each event. Improvement in low-frequency sensitivity will yield longer signals, increasing the number of computations per event. The growing number of entries in the transient catalog will drive up the cost of population studies. While Bayesian inference calculations are not entirely parallelizable, key components are embarrassingly parallel: calculating the gravitational waveform and evaluating the likelihood function. Graphical processor units (GPUs) are adept at such parallel calculations. We report on progress porting gravitational-wave inference calculations to GPUs. Using a single code, which takes advantage of GPU architecture if it is available, we compare computation times using modern GPUs (NVIDIA P100) and CPUs (Intel Gold 6140). We demonstrate speed-ups of $\sim 50\times$ for compact binary coalescence gravitational waveform generation and likelihood evaluation and more than $100\times$ for population inference within the lifetime of current detectors. Further improvement is likely with continued development. Our python-based code is publicly available and can be used without familiarity with the parallel computing platform, CUDA. Comment: 5 pages, 4 figures, submitted to PRD, code can be found at https://github.com/ColmTalbot/gwpopulation https://github.com/ColmTalbot/GPUCBC https://github.com/ADACS-Australia/ADACS-SS18A-RSmith Add demonstration of improvement in BNS spi

arXiv.org e-Print Archive

Monash University Research Portal

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Author: Blazewicz Marek
Brandt Steven R.
Ciznicki Milosz
Hinder Ian
Kierzynka Michal
Koppelman David M.
Löffler Frank
Schnetter Erik
Tao Jian
Publication venue: 'IOS Press'
Publication date: 01/01/2013

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. Comment: 18 pages, 4 figures, accepted for publication in Scientific Programmin

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Louisiana State University

MPG.PuRe

Parallel Algorithm for Solving Kepler's Equation on Graphics Processing Units: Application to Analysis of Doppler Exoplanet Searches

Author: Eric B. Ford
Publication venue: 'Elsevier BV'
Publication date: 16/12/2008

[Abridged] We present the results of a highly parallel Kepler equation solver using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX and the "Compute Unified Device Architecture" programming environment. We apply this to evaluate a goodness-of-fit statistic (e.g., chi^2) for Doppler observations of stars potentially harboring multiple planetary companions (assuming negligible planet-planet interactions). We tested multiple implementations using single precision, double precision, pairs of single precision, and mixed precision arithmetic. We find that the vast majority of computations can be performed using single precision arithmetic, with selective use of compensated summation for increased precision. However, standard single precision is not adequate for calculating the mean anomaly from the time of observation and orbital period when evaluating the goodness-of-fit for real planetary systems and observational data sets. Using all double precision, our GPU code outperforms a similar code using a modern CPU by a factor of over 60. Using mixed precision, our GPU code provides a speed-up factor of over 600, when evaluating N_sys > 1024 model planetary systems each containing N_pl = 4 planets and assuming N_obs = 256 observations of each system. We conclude that modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's equation and a goodness-of-fit statistic for orbital models when presented with a large parameter space. Comment: 19 pages, to appear in New Astronom
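
For readers unfamiliar with the underlying computation, the sketch below solves Kepler's equation, E - e sin(E) = M, for the eccentric anomaly E with Newton's method and evaluates a chi^2 statistic. It is a serial, double-precision Python illustration under assumed choices (Newton iteration, the starting guess, the tolerance, and all function names); the paper's GPU solver and its mixed-precision strategy are not reproduced here.

```python
import math

def solve_kepler(mean_anomaly, eccentricity, tol=1e-12, max_iter=50):
    """Return the eccentric anomaly E satisfying E - e*sin(E) = M (0 <= e < 1)."""
    # Reduce M to [0, 2*pi) so the iteration starts near the root.
    m = math.fmod(mean_anomaly, 2.0 * math.pi)
    if m < 0.0:
        m += 2.0 * math.pi
    # Starting guess: M for moderate eccentricity, pi for high eccentricity.
    e_anom = m if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        f = e_anom - eccentricity * math.sin(e_anom) - m
        f_prime = 1.0 - eccentricity * math.cos(e_anom)  # >= 1 - e > 0
        step = f / f_prime
        e_anom -= step
        if abs(step) < tol:
            break
    return e_anom

def chi_squared(observed, predicted, sigma):
    """Sum of squared, uncertainty-weighted residuals."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))
```

Each call to `solve_kepler` is independent of every other, so evaluating it across many observation times and many candidate orbits is the kind of workload that maps naturally onto one GPU thread per evaluation.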

arXiv.org e-Print Archive

Crossref