A Comparison of Preconditioning Techniques for Parallelized PCG Solvers for the Cell-Centered Finite-Difference Problem
Abstract This paper reports on a parallelization of the preconditioned conjugate gradient algorithm for sparse, symmetric matrices. Parallelization is based on domain partitioning into non-overlapping subdomains; the resulting parallelized algorithm is briefly described. Comparisons are made between three block preconditioners commonly used in parallelizations of the preconditioned conjugate gradient method: Jacobi, incomplete Cholesky, and Gauss-Seidel. Basic timing and iteration results for these preconditioners are presented; these results tentatively indicate that the simpler block Jacobi algorithm is as efficient as the more complex block incomplete Cholesky and block Gauss-Seidel preconditioners.
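As a concrete illustration of what is being compared, the following is a minimal sketch of a preconditioned conjugate gradient iteration with a block Jacobi preconditioner. It is hypothetical code, not the authors'; the toy 1-D finite-difference matrix, the block layout, and the OpenMP threading are assumptions made for brevity. The point it shows is that each subdomain's diagonal block is solved independently, which is what makes block Jacobi easy to parallelize across subdomains.

// Minimal sketch (hypothetical, not from the paper): PCG with a
// block-Jacobi preconditioner on a 1-D Laplacian test matrix.
#include <cmath>
#include <cstdio>
#include <vector>

// y = A*x for the 1-D stencil [-1 2 -1], standing in for the
// cell-centered finite-difference matrix (sparse, symmetric).
void apply_A(const std::vector<double>& x, std::vector<double>& y) {
    int n = (int)x.size();
    for (int i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < n - 1) y[i] -= x[i + 1];
    }
}

// Block-Jacobi preconditioner z = M^{-1} r: couplings between
// subdomains are dropped, so each tridiagonal diagonal block is
// solved independently (Thomas algorithm), one subdomain per thread.
void apply_block_jacobi(const std::vector<double>& r,
                        std::vector<double>& z, int nblocks) {
    int n = (int)r.size(), bs = n / nblocks;
    #pragma omp parallel for
    for (int b = 0; b < nblocks; ++b) {
        int lo = b * bs, hi = (b == nblocks - 1) ? n : lo + bs, m = hi - lo;
        std::vector<double> c(m), d(m);          // forward elimination
        c[0] = -1.0 / 2.0;
        d[0] = r[lo] / 2.0;
        for (int i = 1; i < m; ++i) {
            double denom = 2.0 + c[i - 1];
            c[i] = -1.0 / denom;
            d[i] = (r[lo + i] + d[i - 1]) / denom;
        }
        z[hi - 1] = d[m - 1];                    // back substitution
        for (int i = m - 2; i >= 0; --i)
            z[lo + i] = d[i] - c[i] * z[lo + i + 1];
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

int main() {
    const int n = 1024, nblocks = 8;             // toy problem size
    std::vector<double> x(n, 0.0), b(n, 1.0), r(b), z(n), p(n), Ap(n);
    apply_block_jacobi(r, z, nblocks);           // z = M^{-1} r
    p = z;
    double rz = dot(r, z);
    for (int it = 0; it < 2000; ++it) {
        apply_A(p, Ap);
        double alpha = rz / dot(p, Ap);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        if (std::sqrt(dot(r, r)) < 1e-10) {
            std::printf("converged in %d iterations\n", it + 1);
            break;
        }
        apply_block_jacobi(r, z, nblocks);
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return 0;
}

Swapping apply_block_jacobi for a block incomplete-Cholesky or block Gauss-Seidel solve changes only the preconditioner step; the CG loop itself is unchanged, which is why the three preconditioners can be compared head-to-head on timing and iteration counts.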
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large-particle-number simulations on supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is many times faster than NBODY6 when run with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time. Comment: 13 pages, 9 figures, 3 tables
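The dominant cost in such simulations is the pairwise force summation. The sketch below shows the O(N^2) direct-summation kernel with thread-level parallelism over target particles; it is only a schematic stand-in for one layer of the paper's hybrid scheme (OpenMP), since NBODY6++GPU additionally distributes work across MPI ranks, offloads the regular forces to GPUs, vectorizes the neighbor forces with AVX/SSE, and treats close encounters by regularization rather than the softening length eps2 assumed here.

// Sketch (hypothetical, not NBODY6++GPU source): O(N^2)
// direct-summation gravitational accelerations, threaded with OpenMP.
#include <cmath>
#include <cstdio>
#include <vector>

struct Particle { double x, y, z, mass; };

void compute_accelerations(const std::vector<Particle>& p,
                           std::vector<double>& ax, std::vector<double>& ay,
                           std::vector<double>& az, double eps2) {
    int n = (int)p.size();
    #pragma omp parallel for schedule(static)    // each thread owns a slice of i
    for (int i = 0; i < n; ++i) {
        double axi = 0.0, ayi = 0.0, azi = 0.0;
        for (int j = 0; j < n; ++j) {            // inner loop is SIMD-friendly
            if (j == i) continue;
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            double r2 = dx * dx + dy * dy + dz * dz + eps2;  // softened distance
            double inv_r = 1.0 / std::sqrt(r2);
            double f = p[j].mass * inv_r * inv_r * inv_r;    // G=1: m_j / r^3
            axi += f * dx; ayi += f * dy; azi += f * dz;
        }
        ax[i] = axi; ay[i] = ayi; az[i] = azi;
    }
}

int main() {
    const int n = 2048;                          // toy particle count
    std::vector<Particle> p(n);
    for (int i = 0; i < n; ++i)                  // arbitrary test configuration
        p[i] = {std::cos(0.1 * i), std::sin(0.1 * i), 0.001 * i, 1.0 / n};
    std::vector<double> ax(n), ay(n), az(n);
    compute_accelerations(p, ax, ay, az, 1e-8);
    std::printf("a[0] = (%g, %g, %g)\n", ax[0], ay[0], az[0]);
    return 0;
}

Because every particle pair is visited, the kernel scales as N^2 per step, which is why a million-body run needs the full MPI + GPU + SIMD stack rather than threads alone.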
Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems
Shared-memory parallelization (SMP) strategies for density matrix
renormalization group (DMRG) algorithms enable the treatment of complex systems
in solid state physics. We present two different approaches by which
parallelization of the standard DMRG algorithm can be accomplished in an
efficient way. The methods are illustrated with DMRG calculations of the
two-dimensional Hubbard model and the one-dimensional Holstein-Hubbard model on
contemporary SMP architectures. The parallelized code shows good scalability up
to at least eight processors and allows us to solve problems which exceed the
capability of sequential DMRG calculations. Comment: 18 pages, 9 figures
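The step that typically dominates a DMRG sweep is applying the superblock Hamiltonian to the wavefunction inside the sparse eigensolver. The sketch below is a toy stand-in rather than the paper's implementation: with the wavefunction reshaped into a matrix Psi and the Hamiltonian written as a sum of Kronecker products sum_k A_k (x) B_k, the product becomes sum_k A_k * Psi * B_k^T, and the independent terms k can be farmed out to threads. The matrix sizes, names, and OpenMP scheme here are assumptions for illustration.

// Sketch (hypothetical): shared-memory superblock matrix-vector
// product H*Psi = sum_k A_k * Psi * B_k^T, parallelized over k.
#include <cstdio>
#include <vector>

// Dense row-major matrices; tiny here, but standing in for the
// m x m block-spin bases of a real DMRG code.
struct Mat {
    int rows, cols;
    std::vector<double> a;
    Mat(int r, int c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& at(int i, int j) { return a[i * cols + j]; }
    double at(int i, int j) const { return a[i * cols + j]; }
};

// C += A * Psi * B^T, the action of one term A_k (x) B_k on Psi.
void accumulate_term(const Mat& A, const Mat& Psi, const Mat& B, Mat& C) {
    Mat tmp(A.rows, Psi.cols);                   // tmp = A * Psi
    for (int i = 0; i < A.rows; ++i)
        for (int k = 0; k < A.cols; ++k)
            for (int j = 0; j < Psi.cols; ++j)
                tmp.at(i, j) += A.at(i, k) * Psi.at(k, j);
    for (int i = 0; i < tmp.rows; ++i)           // C += tmp * B^T
        for (int j = 0; j < B.rows; ++j)
            for (int k = 0; k < tmp.cols; ++k)
                C.at(i, j) += tmp.at(i, k) * B.at(j, k);
}

// The sum over Hamiltonian terms k is distributed across threads;
// each thread accumulates into a private buffer that is merged at
// the end, avoiding write conflicts on the result.
Mat apply_superblock(const std::vector<Mat>& A, const std::vector<Mat>& B,
                     const Mat& Psi) {
    Mat result(Psi.rows, Psi.cols);
    #pragma omp parallel
    {
        Mat local(Psi.rows, Psi.cols);
        #pragma omp for nowait
        for (int k = 0; k < (int)A.size(); ++k)
            accumulate_term(A[k], Psi, B[k], local);
        #pragma omp critical                     // merge thread-private sums
        for (size_t i = 0; i < result.a.size(); ++i) result.a[i] += local.a[i];
    }
    return result;
}

int main() {
    const int m = 4;                             // toy basis size
    std::vector<Mat> A(3, Mat(m, m)), B(3, Mat(m, m));
    Mat Psi(m, m);
    for (int k = 0; k < 3; ++k)
        for (int i = 0; i < m; ++i) { A[k].at(i, i) = 1.0; B[k].at(i, i) = 1.0; }
    for (int i = 0; i < m; ++i) Psi.at(i, i) = 1.0;
    Mat HPsi = apply_superblock(A, B, Psi);
    std::printf("HPsi(0,0) = %g\n", HPsi.at(0, 0));  // 3.0: three identity terms
    return 0;
}

The complementary strategy is to keep the loop over k serial and parallelize the dense matrix multiplies themselves, which pays off when the individual blocks are large; which of the two wins depends on the number of Hamiltonian terms relative to the block dimension.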