GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are the production of high-quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experiences in re-designing Splotch to exploit emerging HPC architectures,
which are increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
those relating to race conditions and workload balancing, towards achieving
optimal performance. Our implementation uses the CUDA programming paradigm.
Our strategy is founded on novel schemes achieving optimized data organisation
and classification of particles. We use a reference simulation to present
performance results on acceleration gains and scalability. We finally outline
our vision for future developments, including possibilities for further
optimisations and exploitation of emerging technologies. Comment: 25 pages, 9 figures. Astronomy and Computing (2014)
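The race conditions mentioned above arise because many particles deposit light into the same image pixels. A minimal NumPy sketch of that accumulation, assuming plain isotropic Gaussian splats with additive blending (Splotch's actual kernel, colour mapping and emission/absorption model are not reproduced here); the function `splat_gaussians` and its parameters are hypothetical.

```python
import numpy as np


def splat_gaussians(xs, ys, radii, intensities, width, height):
    """Deposit isotropic 2D Gaussian splats onto a (height, width) image.

    Hypothetical helper: each particle is a Gaussian with standard deviation
    `radius` pixels, truncated at 3 radii and normalised so its full stamp
    sums to the particle's intensity. Parts outside the image are dropped.
    Radii are assumed to be positive.
    """
    image = np.zeros((height, width))
    rows, cols, vals = [], [], []
    for x, y, r, w in zip(xs, ys, radii, intensities):
        half = int(np.ceil(3 * r))
        cx, cy = int(round(x)), int(round(y))
        px, py = np.meshgrid(np.arange(cx - half, cx + half + 1),
                             np.arange(cy - half, cy + half + 1))
        stamp = np.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * r * r))
        stamp *= w / stamp.sum()
        inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
        rows.append(py[inside])
        cols.append(px[inside])
        vals.append(stamp[inside])
    if not rows:
        return image
    # Scatter all particles in one unbuffered operation. Overlapping stamps
    # hit the same pixels and np.add.at sums every contribution, whereas a
    # plain `image[r, c] += v` would apply only one write per repeated pixel.
    # On a GPU this is the write conflict that atomics or per-tile ownership
    # of pixels must resolve.
    np.add.at(image, (np.concatenate(rows), np.concatenate(cols)),
              np.concatenate(vals))
    return image


img = splat_gaussians([16.0, 16.0], [16.0, 16.0], [1.5, 1.5], [1.0, 2.0], 32, 32)
```

Both particles land on the same pixels here, and the total deposited intensity is still the sum of the two, which is the property a parallel implementation has to preserve.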
Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions
We present a general method for accelerating, by more than an order of
magnitude, the convolution of pixelated functions on the sphere with a
radially symmetric kernel. Our method splits the kernel into a compact
real-space component and a compact spherical harmonic space component. These
components can then be convolved in parallel using an inexpensive commodity GPU
and a CPU. We provide models for the computational cost of both real-space and
Fourier space convolutions and an estimate for the approximation error. Using
these models we can determine the optimum split that minimizes the wall clock
time for the convolution while satisfying the desired error bounds. We apply
this technique to the problem of simulating a cosmic microwave background (CMB)
anisotropy sky map at the resolution typical of the high resolution maps
produced by the Planck mission. For the main Planck CMB science channels we
achieve a speedup of over a factor of ten, assuming an acceptable fractional
rms error of order 1e-5 in the power spectrum of the output map. Comment: 9 pages, 11 figures, 1 table, accepted by Astronomy & Computing with
minor revisions. arXiv admin note: substantial text overlap with
arXiv:1211.355
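Choosing the optimum split can be pictured as a small search: because the two components are convolved concurrently, a split's wall-clock time is the slower of the two, and the best split minimises that maximum among the splits meeting the error bound. A minimal Python sketch, assuming the real-space part runs on the GPU and the harmonic part on the CPU; `best_split`, the split multipole `ell`, the cost constants `A`, `B`, `K`, the support radius `K / ell` and the error model are all hypothetical stand-ins, not the models derived in the paper.

```python
import math


def best_split(ells, real_cost, harm_cost, error, tol):
    """Return (ell, wall_time) for the fastest split meeting the error bound.

    The real-space and harmonic parts run concurrently, so the wall-clock
    time of a split is the slower of the two. Returns None if no candidate
    satisfies error(ell) <= tol.
    """
    def wall(ell):
        return max(real_cost(ell), harm_cost(ell))

    feasible = [ell for ell in ells if error(ell) <= tol]
    if not feasible:
        return None
    best = min(feasible, key=wall)
    return best, wall(best)


# Hypothetical models in arbitrary time units, for illustration only.
NPIX = 12 * 2048 ** 2      # pixel count of a HEALPix map with Nside = 2048
A, B, K = 5e-9, 1e-9, 3.0  # made-up cost constants and support-radius factor


def real_cost(ell):
    # Real-space part: every pixel sums over neighbours inside a cap of
    # angular radius theta = K / ell, covering sin^2(theta / 2) of the sphere.
    theta = min(K / ell, math.pi)
    return A * NPIX * NPIX * math.sin(theta / 2.0) ** 2


def harm_cost(ell):
    # Harmonic part: spherical-harmonic transforms scale roughly as ell^3.
    return B * ell ** 3


def error(ell):
    # Made-up error model that shrinks as the split multipole grows.
    return 1.0 / ell ** 2


result = best_split(range(100, 4001, 10), real_cost, harm_cost, error, tol=1e-6)
```

Raising `ell` shifts work from the GPU to the CPU, so the unconstrained optimum sits where the two costs cross; a tighter tolerance can push the feasible region past that crossover, in which case the smallest feasible `ell` wins.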
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which adopts a novel approach to improving the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
through the use of the graphics processing unit (GPU). The AMR implementation
is based on a hierarchy of grid patches with an oct-tree data structure. We
adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, so that hundreds of patches can be advanced in
parallel. The computational overhead associated with data transfer between the
CPU and GPU is carefully reduced by exploiting asynchronous memory copies on
the GPU, and the time spent computing the ghost-zone values of each patch is
reduced by overlapping it with the GPU computations. We demonstrate the
accuracy of the code with several standard test problems in astrophysics.
GAMER is a parallel code that can run on a multi-GPU cluster system. We
measure the performance of the code by performing purely baryonic cosmological
simulations on different hardware configurations, in which detailed timing
analyses compare the computations with and without GPU acceleration. Maximum
speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3
effective resolution and 16 GPUs with 8192^3 effective resolution,
respectively. Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
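Overlapping CPU work and memory copies with GPU computation behaves like a pipeline. A minimal Python timing model, assuming each stage (CPU ghost-zone preparation, host-to-device copy, GPU solver, device-to-host copy) has its own resource and batches of patches flow through in order; the stage times and the functions `pipelined_time` and `serial_time` are hypothetical, and this is not GAMER's actual scheduler. In CUDA such overlap is typically obtained with multiple streams and `cudaMemcpyAsync` from pinned host memory.

```python
def pipelined_time(stage_times, n_batches):
    """Finish time of n_batches streamed through stages with one resource each.

    Stage k of a batch starts once that batch has finished stage k - 1 and
    the previous batch has released stage k's resource (in order, no
    preemption).
    """
    stage_free = [0.0] * len(stage_times)  # when each stage's resource frees up
    for _ in range(n_batches):
        ready = 0.0  # time this batch finished its previous stage
        for k, t in enumerate(stage_times):
            start = max(stage_free[k], ready)
            stage_free[k] = start + t
            ready = stage_free[k]
    return stage_free[-1] if stage_times else 0.0


def serial_time(stage_times, n_batches):
    """Finish time when every stage of every batch runs back to back."""
    return n_batches * sum(stage_times)


# Made-up per-batch times: CPU ghost-zone fill, host-to-device copy,
# GPU solver, device-to-host copy.
stages = [1.0, 2.0, 4.0, 2.0]
print(serial_time(stages, 8), pipelined_time(stages, 8))  # prints 72.0 37.0
```

For identical batches the pipelined time reduces to sum(t) + (n - 1) * max(t), so the slowest stage (here the GPU solver) sets the throughput. On GPUs with a single copy engine the two copy stages share one resource, so the real gain is smaller than this idealised model suggests.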
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up a wide variety of algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics and on quantum simulations for
electronic structure calculations using density functional theory, wave
function techniques, and quantum field theory. Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 2012
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected demand at
the 2025 timescale is at least two orders of magnitude greater than what is
currently available, and in some cases more. 2) The growth rate of data
produced by simulations is overwhelming the current ability of both facilities
and researchers to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will greatly aid in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources, the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) the ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) the ability to
transition codes to the next-generation HPC platforms that will be available at
ASCR facilities, and e) the ability to build up and train a workforce capable
of developing and using simulations and analysis to support HEP scientific
research on next-generation systems. Comment: 77 pages, 13 figures; draft report, subject to further revision