904 research outputs found
Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects
We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which runs entirely on the
GPU. (The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices which have support for
CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during the tree-construction and shows a performance improvement
of more than a factor 20 overall, resulting in a processing rate of more than
2.8 million particles per second.
Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single colum
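The sort-and-scan idea in the abstract above can be illustrated with a minimal serial sketch. It is not the authors' CUDA implementation: the choice of a Morton (Z-order) key, the function names, and the 10-bit grid resolution are all assumptions made for illustration. Sorting particles by a space-filling-curve key places particles of the same octree cell next to each other, and an exclusive scan over the per-cell counts gives each non-empty cell its offset into the sorted particle array.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three integer grid coordinates (assumed Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def exclusive_scan(values):
    """Serial stand-in for a parallel exclusive prefix sum."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def group_by_cell(positions, level, bits=10):
    """Group particles in the unit cube [0, 1) by octree cell at a given level.

    Returns (cell_ids, counts, offsets) for the non-empty cells only,
    which is what makes the tree sparse.
    """
    n = 1 << bits
    keys = sorted(
        morton_key(int(x * n), int(y * n), int(z * n), bits)
        for x, y, z in positions
    )
    shift = 3 * (bits - level)  # drop the bits below the requested level
    cell_ids, counts = [], []
    for key in keys:
        cell = key >> shift
        if cell_ids and cell_ids[-1] == cell:
            counts[-1] += 1
        else:
            cell_ids.append(cell)
            counts.append(1)
    return cell_ids, counts, exclusive_scan(counts)
```

On a GPU the sort, the run-length counting, and the scan would each be a data-parallel primitive; the loops here only show what those primitives compute.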
Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
General purpose computing on graphics processing units (GPGPU) is
dramatically changing the landscape of high performance computing in astronomy.
In this paper, we identify and investigate several key decision areas, with a
goal of simplifying the early adoption of GPGPU in astronomy. We consider the
merits of OpenCL as an open standard in order to reduce risks associated with
coding in a native, vendor-specific programming environment, and present a GPU
programming philosophy based on using brute force solutions. We assert that
effective use of new GPU-based supercomputing facilities will require a change
in approach from astronomers. This will likely include improved programming
training, an increased need for software development best-practice through the
use of profiling and related optimisation tools, and a greater reliance on
third-party code libraries. As with any new technology, those willing to take
the risks, and make the investment of time and effort to become early adopters
of GPGPU in astronomy, stand to reap great benefits.
Comment: 13 pages, 5 figures, accepted for publication in PAS
The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration
We describe an open source GPU implementation of a hybrid symplectic N-body
integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to
integrate planet and planetesimal dynamics in the late stage of planet
formation and stability analyses of planetary systems. GENGA uses a hybrid
symplectic integrator to handle close encounters with very good energy
conservation, which is essential in long-term planetary system integration. We
extended the second order hybrid integration scheme to higher orders. The GENGA
code supports three simulation modes: Integration of up to 2048 massive bodies,
integration with up to a million test particles, or parallel integration of a
large number of individual planetary systems. We compare the results of GENGA
to Mercury and pkdgrav2 in respect of energy conservation and performance, and
find that the energy conservation of GENGA is comparable to Mercury and around
two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster
than Mercury and up to eight times faster than pkdgrav2. GENGA is written in
CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0.
Comment: Accepted by ApJ. 18 pages, 17 figures, 4 table
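Why a symplectic scheme matters for energy conservation can be shown with a small sketch. This is not GENGA's hybrid integrator: it is a plain second-order kick-drift-kick leapfrog for one test particle around a central body, in units where GM = 1, and all function names are hypothetical. The point it illustrates is that the energy error of such a scheme stays bounded and shrinks roughly as the square of the time step.

```python
import math


def acceleration(x, y, gm=1.0):
    """Point-mass gravitational acceleration at (x, y)."""
    r3 = (x * x + y * y) ** 1.5
    return -gm * x / r3, -gm * y / r3


def leapfrog_step(state, dt, gm=1.0):
    """One kick-drift-kick step; state is (x, y, vx, vy)."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y, gm)
    vx += 0.5 * dt * ax  # half kick
    vy += 0.5 * dt * ay
    x += dt * vx  # full drift
    y += dt * vy
    ax, ay = acceleration(x, y, gm)
    vx += 0.5 * dt * ax  # half kick
    vy += 0.5 * dt * ay
    return (x, y, vx, vy)


def energy(state, gm=1.0):
    """Specific orbital energy (kinetic plus potential)."""
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) - gm / math.hypot(x, y)


def max_relative_energy_error(state, dt, n_steps, gm=1.0):
    """Largest |dE/E| seen while integrating n_steps steps of size dt."""
    e0 = energy(state, gm)
    worst = 0.0
    for _ in range(n_steps):
        state = leapfrog_step(state, dt, gm)
        worst = max(worst, abs((energy(state, gm) - e0) / e0))
    return worst
```

For example, calling `max_relative_energy_error((1.0, 0.0, 0.0, 1.1), 0.01, 2000)` and then repeating with `dt=0.005` and `n_steps=4000` covers the same time span and should show the smaller step giving the smaller error.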
Parallelized Inference for Gravitational-Wave Astronomy
Bayesian inference is the workhorse of gravitational-wave astronomy, for
example, determining the mass and spins of merging black holes, revealing the
neutron star equation of state, and unveiling the population properties of
compact binaries. The science enabled by these inferences comes with a
computational cost that can limit the questions we are able to answer. This
cost is expected to grow. As detectors improve, the detection rate will go up,
allowing less time to analyze each event. Improvement in low-frequency
sensitivity will yield longer signals, increasing the number of computations
per event. The growing number of entries in the transient catalog will drive up
the cost of population studies. While Bayesian inference calculations are not
entirely parallelizable, key components are embarrassingly parallel:
calculating the gravitational waveform and evaluating the likelihood function.
Graphical processor units (GPUs) are adept at such parallel calculations. We
report on progress porting gravitational-wave inference calculations to GPUs.
Using a single code - which takes advantage of GPU architecture if it is
available - we compare computation times using modern GPUs (NVIDIA P100) and
CPUs (Intel Gold 6140). We demonstrate speed-ups of for
compact binary coalescence gravitational waveform generation and likelihood
evaluation and more than for population inference within the
lifetime of current detectors. Further improvement is likely with continued
development. Our python-based code is publicly available and can be used
without familiarity with the parallel computing platform, CUDA.
Comment: 5 pages, 4 figures, submitted to PRD, code can be found at
https://github.com/ColmTalbot/gwpopulation
https://github.com/ColmTalbot/GPUCBC
https://github.com/ADACS-Australia/ADACS-SS18A-RSmith Add demonstration of
improvement in BNS spi
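The "embarrassingly parallel" structure mentioned in the abstract above can be sketched as follows. This is a toy under stated assumptions, not the authors' code: the likelihood is taken to be independent Gaussian noise with a known variance per frequency bin (normalisation constant dropped), and `toy_template` is a hypothetical stand-in for a real waveform model. Each bin contributes an independent term, and each parameter value can be evaluated independently, so both loops map naturally onto parallel hardware.

```python
import math


def log_likelihood(data, template, noise_var):
    """Gaussian log-likelihood up to a constant (assumed form).

    Each frequency bin contributes one independent term; the terms are
    then reduced with a sum.
    """
    terms = (
        -(abs(d - h) ** 2) / (2.0 * s2)
        for d, h, s2 in zip(data, template, noise_var)
    )
    return math.fsum(terms)


def toy_template(amplitude, n_bins):
    """Hypothetical one-parameter template, falling off as 1/(k+1)."""
    return [amplitude / (k + 1) for k in range(n_bins)]


def evaluate_grid(data, noise_var, amplitudes):
    """Evaluate the likelihood for many parameter values independently."""
    return [
        log_likelihood(data, toy_template(a, len(data)), noise_var)
        for a in amplitudes
    ]
```

In this serial version the per-bin terms and the per-amplitude evaluations are plain Python loops; on a GPU they would become array operations over bins and parameter samples.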
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programmin
Parallel Algorithm for Solving Kepler's Equation on Graphics Processing Units: Application to Analysis of Doppler Exoplanet Searches
[Abridged] We present the results of a highly parallel Kepler equation solver
using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX
and the "Compute Unified Device Architecture" programming environment. We apply
this to evaluate a goodness-of-fit statistic (e.g., chi^2) for Doppler
observations of stars potentially harboring multiple planetary companions
(assuming negligible planet-planet interactions). We tested multiple
implementations using single precision, double precision, pairs of single
precision, and mixed precision arithmetic. We find that the vast majority of
computations can be performed using single precision arithmetic, with selective
use of compensated summation for increased precision. However, standard single
precision is not adequate for calculating the mean anomaly from the time of
observation and orbital period when evaluating the goodness-of-fit for real
planetary systems and observational data sets. Using all double precision, our
GPU code outperforms a similar code using a modern CPU by a factor of over 60.
Using mixed-precision, our GPU code provides a speed-up factor of over 600,
when evaluating N_sys > 1024 model planetary systems each containing N_pl = 4
planets and assuming N_obs = 256 observations of each system. We conclude that
modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's
equation and a goodness-of-fit statistic for orbital models when presented with
a large parameter space.
Comment: 19 pages, to appear in New Astronom
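The calculation described in the abstract above can be sketched in a few lines. This is a serial illustration under stated assumptions, not the authors' GPU code: the abstract does not say which iteration scheme they use, so Newton-Raphson is assumed here, and the function names and the parameter tuple layout are hypothetical. Python floats are double precision, so the sketch does not reproduce the single-precision issue the abstract raises for the mean anomaly. It shows the three ingredients: solving Kepler's equation E - e sin E = M, summing independent Keplerian signals (negligible planet-planet interactions), and accumulating chi^2 with compensated (Kahan) summation.

```python
import math

TWO_PI = 2.0 * math.pi


def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve E - ecc*sin(E) = M for E by Newton-Raphson (assumed scheme)."""
    m = math.fmod(mean_anomaly, TWO_PI)
    if m < 0.0:
        m += TWO_PI
    # Starting at pi is a common safe choice for high eccentricity.
    e_anom = m if ecc < 0.8 else math.pi
    for _ in range(max_iter):
        delta = (e_anom - ecc * math.sin(e_anom) - m) / (1.0 - ecc * math.cos(e_anom))
        e_anom -= delta
        if abs(delta) < tol:
            break
    return e_anom


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition."""
    total, comp = 0.0, 0.0
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def radial_velocity(t, period, ecc, semi_amp, omega, t_peri):
    """Keplerian radial velocity of the star due to one planet."""
    e_anom = solve_kepler(TWO_PI * (t - t_peri) / period, ecc)
    true_anom = 2.0 * math.atan2(
        math.sqrt(1.0 + ecc) * math.sin(0.5 * e_anom),
        math.sqrt(1.0 - ecc) * math.cos(0.5 * e_anom),
    )
    return semi_amp * (math.cos(true_anom + omega) + ecc * math.cos(omega))


def chi_squared(times, velocities, sigmas, planets):
    """chi^2 of a multi-planet model; planets is a list of
    (period, ecc, semi_amp, omega, t_peri) tuples (hypothetical layout)."""
    terms = []
    for t, v, s in zip(times, velocities, sigmas):
        model = sum(radial_velocity(t, *p) for p in planets)
        terms.append(((v - model) / s) ** 2)
    return kahan_sum(terms)
```

Each model system and each observation is independent of the others, which is why the evaluation parallelises well across a large parameter space.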