Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is 400−2000 times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including 5% primordial binaries require
about an hour per half-mass crossing time.Comment: 13 pages, 9 figures, 3 table