NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large-particle-number simulations on supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is 400-2000 times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including 5% primordial binaries require
about an hour per half-mass crossing time.
Comment: 13 pages, 9 figures, 3 tables
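As a point of reference for what such hybrid parallelization accelerates, the sketch below shows the O(N^2) direct-summation force loop in C++ with an OpenMP directive. It is a minimal illustration only: NBODY6 and NBODY6++ actually use a 4th-order Hermite integrator with block time steps and a neighbor scheme, and every name below is invented for the example, not taken from the code.

```cpp
// Minimal direct-summation gravity sketch (not the Hermite/block-time-step
// integrator NBODY6 actually uses). All names are illustrative.
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double x, y, z, m; };

// Accumulate softened pairwise accelerations: the O(N^2) hot loop that
// MPI+GPU+OpenMP+SIMD hybrid codes distribute and vectorize.
// ax, ay, az must be pre-sized to b.size() by the caller.
void accelerations(const std::vector<Body>& b, std::vector<double>& ax,
                   std::vector<double>& ay, std::vector<double>& az,
                   double eps2) {
    const std::size_t n = b.size();
#pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        double fx = 0.0, fy = 0.0, fz = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dx = b[j].x - b[i].x;
            const double dy = b[j].y - b[i].y;
            const double dz = b[j].z - b[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz + eps2;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            fx += b[j].m * dx * inv_r3;  // G = 1 in N-body units
            fy += b[j].m * dy * inv_r3;
            fz += b[j].m * dz * inv_r3;
        }
        ax[i] = fx; ay[i] = fy; az[i] = fz;
    }
}
```

In a hybrid scheme like the one described above, this kind of loop is what gets split across MPI ranks by particle range, with the inner summation offloaded to GPUs and vectorized with AVX/SSE on the remaining CPU cores.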
Gauge Field Generation on Large-Scale GPU-Enabled Systems
Over the past few years, GPUs have been successfully applied to the task of
inverting the fermion matrix in lattice QCD calculations. Even strong scaling
to capability-level supercomputers, corresponding to O(100) GPUs or more, has
been achieved. However, strong scaling a whole gauge field generation algorithm
to this regime requires significantly more functionality than just having the
matrix inverter utilize the GPUs, and has not yet been accomplished. This
contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled
parallel systems, to help strong-scale the whole Hybrid Monte-Carlo algorithm to this
regime. Initial results are shown for gauge field generation with Chroma
simulating pure Wilson fermions on OLCF TitanDev.
Comment: The 30th International Symposium on Lattice Field Theory, June 24-29, 2012, Cairns, Australia (acknowledgment and citation added)
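To make the structure of the algorithm being scaled concrete, here is a toy Hybrid Monte-Carlo step in C++ for a one-dimensional action. It only illustrates the momentum-refresh, leapfrog-trajectory, Metropolis-test skeleton; a production lattice QCD HMC instead evolves SU(3) gauge links and calls the fermion-matrix inverter inside every force evaluation, which is why the inverter was the natural first piece to move to GPUs.

```cpp
// Toy Hybrid Monte-Carlo step for a 1-D harmonic action S(q) = q^2 / 2.
// Illustrates the trajectory structure only; real gauge-field HMC evolves
// SU(3) links and inverts the fermion matrix at every force evaluation.
#include <cmath>
#include <random>

double action(double q) { return 0.5 * q * q; }
double force(double q)  { return -q; }  // -dS/dq

// One HMC update: momentum refresh, leapfrog trajectory, Metropolis test.
double hmc_step(double q, int n_steps, double dt, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    double p = gauss(rng);                            // fresh momentum
    const double h_old = 0.5 * p * p + action(q);

    double q_new = q;
    p += 0.5 * dt * force(q_new);                     // initial half kick
    for (int s = 0; s < n_steps; ++s) {
        q_new += dt * p;                              // drift
        p += (s == n_steps - 1 ? 0.5 : 1.0) * dt * force(q_new);  // kick
    }

    const double h_new = 0.5 * p * p + action(q_new);
    // Accept with probability min(1, exp(-dH)); reject keeps the old state.
    return (unif(rng) < std::exp(h_old - h_new)) ? q_new : q;
}
```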
OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection
Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS), such as finding map overlays or spatial joins over polygonal data, require solving segment intersections. The plane sweep paradigm is used for finding geometric intersections in an efficient manner; however, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementations using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs.
We chose a compiler-directive-based approach for the implementation because of the simplicity of parallelizing sequential code with directives. Using an Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for the line segment intersection problem on 40K and 80K data sets compared to the sequential CGAL library.
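The paper's fine-grained parallel plane sweep is not reproduced here, but the directive-based style it relies on can be illustrated with the standard orientation-based segment intersection predicate evaluated over all pairs in a parallel loop; under OpenACC the pragma would be `#pragma acc parallel loop` instead. All names in this C++ sketch are invented for the example.

```cpp
// Orientation-based segment intersection test, evaluated over many pairs in a
// directive-parallel loop. This is the brute-force kernel, not the paper's
// fine-grained plane sweep; all names are illustrative.
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
struct Seg { Pt a, b; };

// Sign of the cross product (p->q) x (p->r): orientation of the triple.
static int orient(Pt p, Pt q, Pt r) {
    double v = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (v > 0.0) - (v < 0.0);
}

// Proper intersection: each segment's endpoints straddle the other's line.
// Degenerate collinear cases need extra care (CGAL uses exact predicates).
static bool intersects(const Seg& s, const Seg& t) {
    return orient(s.a, s.b, t.a) != orient(s.a, s.b, t.b) &&
           orient(t.a, t.b, s.a) != orient(t.a, t.b, s.b);
}

// Count intersecting pairs; dynamic scheduling balances the uneven
// inner-loop trip counts across threads.
std::size_t count_intersections(const std::vector<Seg>& segs) {
    const std::size_t n = segs.size();
    std::size_t count = 0;
#pragma omp parallel for reduction(+ : count) schedule(dynamic)
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            count += intersects(segs[i], segs[j]) ? 1 : 0;
    return count;
}
```

Note that this brute-force pairing is O(n^2), which is exactly the cost the plane sweep paradigm avoids; the sketch shows only how compiler directives parallelize such a geometry kernel.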
Real-Time Dedispersion for Fast Radio Transient Surveys, using Auto Tuning on Many-Core Accelerators
Dedispersion, the removal of deleterious smearing of impulsive signals by the
interstellar matter, is one of the most intensive processing steps in any radio
survey for pulsars and fast transients. We here present a study of the
parallelization of this algorithm on many-core accelerators, including GPUs
from AMD and NVIDIA, and the Intel Xeon Phi. We find that dedispersion is
inherently memory-bound. Even in a perfect scenario, hardware limitations keep
the arithmetic intensity low, thus limiting performance. We next exploit
auto-tuning to adapt dedispersion to different accelerators, observations, and
even telescopes. We demonstrate that the optimal settings differ between
observational setups, and that auto-tuning significantly improves performance.
This impacts time-domain surveys from Apertif to SKA.
Comment: 8 pages, accepted for publication in Astronomy and Computing
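A minimal sketch of brute-force incoherent dedispersion, assuming a precomputed per-channel delay table (in a real pipeline the delay scales with the DM times the difference of inverse squared channel frequencies), makes the memory-bound character visible: each input sample is loaded once and contributes a single addition. The function and data layout below are illustrative, not the paper's implementation.

```cpp
// Brute-force incoherent dedispersion sketch. For each dispersion-measure
// (DM) trial, samples are summed across frequency channels after shifting
// each channel by its precomputed dispersion delay. The caller must pad
// each input channel with at least max(delay) samples beyond n_out.
#include <cstddef>
#include <vector>

void dedisperse(const std::vector<std::vector<float>>& in,    // [chan][time]
                const std::vector<std::vector<int>>& delay,   // [dm][chan]
                std::vector<std::vector<float>>& out,         // [dm][time]
                std::size_t n_out) {
    const std::size_t n_dm = delay.size();
    const std::size_t n_chan = in.size();
#pragma omp parallel for collapse(2) schedule(static)
    for (std::size_t dm = 0; dm < n_dm; ++dm)
        for (std::size_t t = 0; t < n_out; ++t) {
            float sum = 0.0f;
            // One load and one add per input sample: the arithmetic
            // intensity stays low, hence the memory-bound behavior.
            for (std::size_t c = 0; c < n_chan; ++c)
                sum += in[c][t + static_cast<std::size_t>(delay[dm][c])];
            out[dm][t] = sum;
        }
}
```

Auto-tuning as described above would then search over parameters such as thread-block shape and per-thread work, which this portable sketch leaves to the compiler.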