27,683 research outputs found

    NBODY6++GPU: Ready for the gravitational million-body problem

    Full text link
    Accurate direct NN-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a well-known direct NN-body code for star clusters, and NBODY6++ is the extended version designed for large particle number simulations by supercomputers. We present NBODY6++GPU, an optimized version of NBODY6++ with hybrid parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large direct NN-body simulations, and in particular to solve the million-body problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as well as the first results from a simulation of a realistic globular cluster initially containing a million particles. For million-body simulations, NBODY6++GPU is 400−2000400-2000 times faster than NBODY6 with 320 CPU cores and 32 NVIDIA K20X GPUs. With this computing cluster specification, the simulations of million-body globular clusters including 5%5\% primordial binaries require about an hour per half-mass crossing time.Comment: 13 pages, 9 figures, 3 table

    Performance analysis of parallel gravitational NN-body codes on large GPU cluster

    Full text link
    We compare the performance of two very different parallel gravitational NN-body codes for astrophysical simulations on large GPU clusters, both pioneer in their own fields as well as in certain mutual scales - NBODY6++ and Bonsai. We carry out the benchmark of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200−300200-300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of two codes, finding that in the same cases Bonsai adopts larger time steps as well as relative energy errors than NBODY6++, typically ranging from 10−5010-50 times larger, depending on the chosen parameters of the codes. While the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scale, and thus allowing the user to choose from either one with finetuned parameters accordingly.Comment: 15 pages, 7 figures, 3 tables, accepted for publication in Research in Astronomy and Astrophysics (RAA

    GLB: Lifeline-based Global Load Balancing library in X10

    Full text link
    We present GLB, a programming model and an associated implementation that can handle a wide range of irregular paral- lel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intricate syn- chronizations (e.g., inter-node communication, initialization and startup, load balancing, termination and result collection) from the users. GLB internally uses a version of the lifeline graph based work-stealing algorithm proposed by Saraswat et al. Users of GLB are simply required to write several pieces of sequential code that comply with the GLB interface. GLB then schedules and orchestrates the parallel execution of the code correctly and efficiently at scale. We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be statically load-balanced whereas UTS cannot. In either case, GLB scales well-- achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K) -- up to 16K cores
    • …
    corecore