NBODY6++GPU: Ready for the gravitational million-body problem

Author: Aarseth Sverre
Berczik Peter
Kouwenhoven M. B. N.
Naab Thorsten
Nitadori Keigo
Spurzem Rainer
Wang Long
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

Accurate direct $N$-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a well-known direct $N$-body code for star clusters, and NBODY6++ is the extended version designed for large particle number simulations by supercomputers. We present NBODY6++GPU, an optimized version of NBODY6++ with hybrid parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large direct $N$-body simulations, and in particular to solve the million-body problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as well as the first results from a simulation of a realistic globular cluster initially containing a million particles. For million-body simulations, NBODY6++GPU is 400-2000 times faster than NBODY6 with 320 CPU cores and 32 NVIDIA K20X GPUs. With this computing cluster specification, the simulations of million-body globular clusters including 5% primordial binaries require about an hour per half-mass crossing time. Comment: 13 pages, 9 figures, 3 table

arXiv.org e-Print Archive

MPG.PuRe

Ambisonic audio system optimization using a HPC cluster

Author: Mair Quentin
Moore David
Wakefield Jonathan
Publication date: 01/01/2011

ResearchOnline@GCU

Performance analysis of parallel gravitational $N$-body codes on large GPU cluster

Author: Berczik Peter
Huang Siyi
Spurzem Rainer
Publication venue: 'IOP Publishing'
Publication date: 11/08/2015

We compare the performance of two very different parallel gravitational $N$-body codes for astrophysical simulations on large GPU clusters, both pioneer in their own fields as well as in certain mutual scales - NBODY6++ and Bonsai. We carry out the benchmark of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200-300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of two codes, finding that in the same cases Bonsai adopts larger time steps as well as relative energy errors than NBODY6++, typically ranging from 10-50 times larger, depending on the chosen parameters of the codes. While the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scale, and thus allowing the user to choose from either one with finetuned parameters accordingly. Comment: 15 pages, 7 figures, 3 tables, accepted for publication in Research in Astronomy and Astrophysics (RAA)

arXiv.org e-Print Archive

GLB: Lifeline-based Global Load Balancing library in X10

Author: Grove David
Herta Benjamin
Kamada Tomio
Saraswat Vijay
Takeuchi Mikio
Tardieu Olivier
Zhang Wei
Publication date: 19/12/2013

We present GLB, a programming model and an associated implementation that can handle a wide range of irregular parallel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intricate synchronizations (e.g., inter-node communication, initialization and startup, load balancing, termination and result collection) from the users. GLB internally uses a version of the lifeline graph based work-stealing algorithm proposed by Saraswat et al. Users of GLB are simply required to write several pieces of sequential code that comply with the GLB interface. GLB then schedules and orchestrates the parallel execution of the code correctly and efficiently at scale. We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be statically load-balanced whereas UTS cannot. In either case, GLB scales well, achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K), up to 16K cores.

arXiv.org e-Print Archive

CiteSeerX