904 research outputs found

    Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects

    Get PDF
    We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs

    A sparse octree gravitational N-body code that runs entirely on the GPU processor

    Get PDF
    We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum

    Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    Full text link
    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplyfing the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.Comment: 13 pages, 5 figures, accepted for publication in PAS

    The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration

    Full text link
    We describe an open source GPU implementation of a hybrid symplectic N-body integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to integrate planet and planetesimal dynamics in the late stage of planet formation and stability analyses of planetary systems. GENGA uses a hybrid symplectic integrator to handle close encounters with very good energy conservation, which is essential in long-term planetary system integration. We extended the second order hybrid integration scheme to higher orders. The GENGA code supports three simulation modes: Integration of up to 2048 massive bodies, integration with up to a million test particles, or parallel integration of a large number of individual planetary systems. We compare the results of GENGA to Mercury and pkdgrav2 in respect of energy conservation and performance, and find that the energy conservation of GENGA is comparable to Mercury and around two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster than Mercury and up to eight times faster than pkdgrav2. GENGA is written in CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0.Comment: Accepted by ApJ. 18 pages, 17 figures, 4 table

    Parallelized Inference for Gravitational-Wave Astronomy

    Full text link
    Bayesian inference is the workhorse of gravitational-wave astronomy, for example, determining the mass and spins of merging black holes, revealing the neutron star equation of state, and unveiling the population properties of compact binaries. The science enabled by these inferences comes with a computational cost that can limit the questions we are able to answer. This cost is expected to grow. As detectors improve, the detection rate will go up, allowing less time to analyze each event. Improvement in low-frequency sensitivity will yield longer signals, increasing the number of computations per event. The growing number of entries in the transient catalog will drive up the cost of population studies. While Bayesian inference calculations are not entirely parallelizable, key components are embarrassingly parallel: calculating the gravitational waveform and evaluating the likelihood function. Graphical processor units (GPUs) are adept at such parallel calculations. We report on progress porting gravitational-wave inference calculations to GPUs. Using a single code - which takes advantage of GPU architecture if it is available - we compare computation times using modern GPUs (NVIDIA P100) and CPUs (Intel Gold 6140). We demonstrate speed-ups of 50×\sim 50 \times for compact binary coalescence gravitational waveform generation and likelihood evaluation and more than 100×100\times for population inference within the lifetime of current detectors. Further improvement is likely with continued development. Our python-based code is publicly available and can be used without familiarity with the parallel computing platform, CUDA.Comment: 5 pages, 4 figures, submitted to PRD, code can be found at https://github.com/ColmTalbot/gwpopulation https://github.com/ColmTalbot/GPUCBC https://github.com/ADACS-Australia/ADACS-SS18A-RSmith Add demonstration of improvement in BNS spi

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Full text link
    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific Programmin

    Parallel Algorithm for Solving Kepler's Equation on Graphics Processing Units: Application to Analysis of Doppler Exoplanet Searches

    Full text link
    [Abridged] We present the results of a highly parallel Kepler equation solver using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX and the "Compute Unified Device Architecture" programming environment. We apply this to evaluate a goodness-of-fit statistic (e.g., chi^2) for Doppler observations of stars potentially harboring multiple planetary companions (assuming negligible planet-planet interactions). We tested multiple implementations using single precision, double precision, pairs of single precision, and mixed precision arithmetic. We find that the vast majority of computations can be performed using single precision arithmetic, with selective use of compensated summation for increased precision. However, standard single precision is not adequate for calculating the mean anomaly from the time of observation and orbital period when evaluating the goodness-of-fit for real planetary systems and observational data sets. Using all double precision, our GPU code outperforms a similar code using a modern CPU by a factor of over 60. Using mixed-precision, our GPU code provides a speed-up factor of over 600, when evaluating N_sys > 1024 models planetary systems each containing N_pl = 4 planets and assuming N_obs = 256 observations of each system. We conclude that modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's equation and a goodness-of-fit statistic for orbital models when presented with a large parameter space.Comment: 19 pages, to appear in New Astronom
    corecore