
    A Comparison of Preconditioning Techniques for Parallelized PCG Solvers for the Cell-Centered Finite-Difference Problem

    This paper reports on a parallelization of the preconditioned conjugate gradient algorithm for sparse, symmetric matrices. Parallelization is based on partitioning the domain into non-overlapping subdomains; the resulting parallelized algorithm is briefly described. Comparisons are made between three block preconditioners commonly used in parallelizations of the preconditioned conjugate gradient method: Jacobi, incomplete Cholesky, and Gauss-Seidel. Basic timing and iteration results for these preconditioners are presented; these results tentatively indicate that the simpler block Jacobi algorithm is as efficient as the more complex block incomplete Cholesky and block Gauss-Seidel preconditioners.
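
    To make the comparison concrete, the following is a minimal single-process sketch of conjugate gradient with a block-Jacobi preconditioner, where each diagonal block stands in for one non-overlapping subdomain. It assumes SciPy, uses a 1-D Laplacian as an illustrative stand-in for a cell-centered finite-difference matrix, and is not the paper's parallel implementation; all names (block_jacobi_pcg, block_size) are illustrative.

        # Sketch of block-Jacobi preconditioned CG for a sparse SPD matrix.
        # Each diagonal block corresponds to one (hypothetical) subdomain and
        # is factorized independently, which is what makes the preconditioner
        # easy to parallelize.
        import numpy as np
        from scipy.sparse import diags
        from scipy.linalg import cho_factor, cho_solve

        def block_jacobi_pcg(A, b, block_size, tol=1e-8, max_iter=500):
            n = A.shape[0]
            # Factorize each diagonal block once; in the parallel setting each
            # subdomain would own and factorize its own block.
            blocks = []
            for start in range(0, n, block_size):
                end = min(start + block_size, n)
                blocks.append((start, end,
                               cho_factor(A[start:end, start:end].toarray())))

            def apply_M_inv(r):
                # Block-Jacobi preconditioner: solve each block independently.
                z = np.empty_like(r)
                for start, end, factor in blocks:
                    z[start:end] = cho_solve(factor, r[start:end])
                return z

            x = np.zeros(n)
            r = b - A @ x
            z = apply_M_inv(r)
            p = z.copy()
            rz = r @ z
            for _ in range(max_iter):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                z = apply_M_inv(r)
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        # Example: 1-D Laplacian as a stand-in for a finite-difference matrix.
        n = 64
        A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
        b = np.ones(n)
        x = block_jacobi_pcg(A, b, block_size=16)

    Swapping apply_M_inv for a block incomplete-Cholesky or block Gauss-Seidel solve changes only the per-block work, which is why the three preconditioners can be compared within the same parallel framework.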

    NBODY6++GPU: Ready for the gravitational million-body problem

    Accurate direct N-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a well-known direct N-body code for star clusters, and NBODY6++ is the extended version designed for large particle-number simulations on supercomputers. We present NBODY6++GPU, an optimized version of NBODY6++ with hybrid parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large direct N-body simulations, and in particular to solve the million-body problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as well as the first results from a simulation of a realistic globular cluster initially containing a million particles. For million-body simulations, NBODY6++GPU is 400-2000 times faster than NBODY6 with 320 CPU cores and 32 NVIDIA K20X GPUs. With this computing cluster specification, the simulations of million-body globular clusters including 5% primordial binaries require about an hour per half-mass crossing time. Comment: 13 pages, 9 figures, 3 tables
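
    The expensive core of any direct N-body code is the O(N^2) pairwise force summation, which codes like NBODY6++GPU distribute across MPI ranks, GPUs, and SIMD lanes. The sketch below is a minimal single-node NumPy illustration of that kernel only; it is not the NBODY6++GPU implementation, and the softening length eps and the function name are assumptions for the example.

        # Direct-summation gravitational accelerations (G = 1) for N particles.
        import numpy as np

        def direct_accelerations(pos, mass, eps=1e-4):
            """pos: (N, 3) positions, mass: (N,) masses."""
            diff = pos[None, :, :] - pos[:, None, :]       # (N, N, 3) separations r_j - r_i
            dist2 = np.sum(diff**2, axis=-1) + eps**2      # softened squared distances
            inv_r3 = dist2**-1.5
            np.fill_diagonal(inv_r3, 0.0)                  # exclude self-interaction
            # a_i = sum_j m_j (r_j - r_i) / |r_j - r_i|^3
            return np.einsum("ij,j,ijk->ik", inv_r3, mass, diff)

        # Example: accelerations for a small random cluster of equal-mass stars.
        rng = np.random.default_rng(0)
        N = 1024
        pos = rng.normal(size=(N, 3))
        mass = np.full(N, 1.0 / N)
        acc = direct_accelerations(pos, mass)

    Because every pair must be evaluated, the cost grows quadratically with particle number, which is why hybrid parallelization is essential once N approaches a million.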

    Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems

    Shared-memory parallelization (SMP) strategies for density matrix renormalization group (DMRG) algorithms enable the treatment of complex systems in solid state physics. We present two different approaches by which parallelization of the standard DMRG algorithm can be accomplished in an efficient way. The methods are illustrated with DMRG calculations of the two-dimensional Hubbard model and the one-dimensional Holstein-Hubbard model on contemporary SMP architectures. The parallelized code shows good scalability up to at least eight processors and allows us to solve problems which exceed the capability of sequential DMRG calculations. Comment: 18 pages, 9 figures
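
    The abstract does not detail the two approaches, so the following is only a generic illustration of one common shared-memory pattern for the DMRG bottleneck: the superblock Hamiltonian, written as a sum of system (x) environment operator pairs H = sum_k A_k (x) B_k, is applied to the wavefunction by farming the independent terms out to threads. All names and the term decomposition are assumptions for the example, not the paper's method; NumPy's BLAS-backed matrix products release the GIL, so plain threads suffice.

        # Thread-parallel application of H = sum_k A_k (x) B_k to a superblock
        # wavefunction stored as a (m_sys, m_env) matrix.
        from concurrent.futures import ThreadPoolExecutor
        import numpy as np

        def apply_superblock_H(psi, terms, n_threads=8):
            """psi: (m_sys, m_env) matrix; terms: list of (A_k, B_k) pairs."""
            def one_term(pair):
                A, B = pair
                return A @ psi @ B.T      # (A (x) B)|psi> in matrix form
            with ThreadPoolExecutor(max_workers=n_threads) as pool:
                partials = list(pool.map(one_term, terms))
            return sum(partials)

        # Example with random symmetric operator pairs.
        rng = np.random.default_rng(1)
        m = 256
        psi = rng.normal(size=(m, m))
        terms = [(s + s.T, e + e.T)
                 for s, e in (rng.normal(size=(2, m, m)) for _ in range(16))]
        H_psi = apply_superblock_H(psi, terms)

    Since this matrix-vector product dominates the cost of the sweep's iterative eigensolver, parallelizing it is typically where shared-memory DMRG implementations gain most of their speedup.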