Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the preconditioner's applicability and
robustness. Furthermore, it is possible to control the number of iterations via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and the tuning of the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
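To illustrate how an accuracy threshold trades memory for approximation quality, here is a minimal sketch in Python with NumPy. It is not the paper's hierarchical matrix format: a dense truncated SVD stands in for the low-rank compression, and the function name `compress_block`, the kernel, and the point clusters are all assumptions chosen for illustration.

```python
import numpy as np

def compress_block(block, eps):
    """Return factors (U, V) with block ~ U @ V, truncated at relative accuracy eps.

    Hypothetical helper: a dense SVD stands in for hierarchical low-rank compression.
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    # Keep the singular values that exceed eps relative to the largest one.
    rank = max(1, int(np.sum(s > eps * s[0])))
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Interaction between two well-separated point clusters: the kernel is smooth
# there, so the block is numerically low rank (an illustrative assumption).
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(2.0, 3.0, 200)
block = 1.0 / np.abs(x[:, None] - y[None, :])

for eps in (1e-2, 1e-6, 1e-10):
    U, V = compress_block(block, eps)
    err = np.linalg.norm(block - U @ V, 2) / np.linalg.norm(block, 2)
    print(f"eps={eps:.0e}  rank={U.shape[1]}  relative error={err:.1e}")
```

A tighter threshold keeps more singular values, so the stored factors grow while the approximation error shrinks. This is the kind of trade-off the abstract describes when it mentions tuning memory requirements against iteration counts.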
A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements
A parallel fast direct solver for rank-compressible block tridiagonal linear
systems is presented. Algorithmic synergies between Cyclic Reduction and
Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a
baseline for performance and applicability by comparing with well known
implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid
with a parallel implementation that leverages the concurrency features of the
method. Numerical experiments reveal that this method is comparable with other
fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and
that it can tackle problems where algebraic multigrid fails to converge.
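The elimination pattern behind cyclic reduction on a block tridiagonal system can be sketched as follows. This is an illustrative sketch only: dense NumPy blocks replace the hierarchical matrix arithmetic the abstract refers to, there is no parallelism, and the names `block_cyclic_reduction` and `right_divide` are hypothetical. Block row i is assumed to read `A[i] x[i-1] + B[i] x[i] + C[i] x[i+1] = F[i]`, with `A[0]` and `C[-1]` ignored.

```python
import numpy as np

def right_divide(X, Y):
    """Return X @ inv(Y) without forming the inverse explicitly."""
    return np.linalg.solve(Y.T, X.T).T

def block_cyclic_reduction(A, B, C, F):
    """Solve a block tridiagonal system by recursive odd-even elimination.

    A, B, C are lists of square blocks (sub-, main and super-diagonal);
    F is a list of right-hand-side vectors. Returns the list of solution blocks.
    """
    n = len(B)
    if n == 1:
        return [np.linalg.solve(B[0], F[0])]
    zero = np.zeros_like(B[0])
    A = [zero] + list(A[1:])    # no coupling to the left of the first row
    C = list(C[:-1]) + [zero]   # no coupling to the right of the last row
    rA, rB, rC, rF = [], [], [], []
    odd = range(1, n, 2)
    for i in odd:
        # Eliminate x[i-1] using block row i-1 (a Schur complement update).
        L = right_divide(A[i], B[i - 1])
        newA = -L @ A[i - 1]
        newB = B[i] - L @ C[i - 1]
        newF = F[i] - L @ F[i - 1]
        newC = zero
        if i + 1 < n:
            # Eliminate x[i+1] using block row i+1.
            U = right_divide(C[i], B[i + 1])
            newB = newB - U @ A[i + 1]
            newF = newF - U @ F[i + 1]
            newC = -U @ C[i + 1]
        rA.append(newA); rB.append(newB); rC.append(newC); rF.append(newF)
    # The reduced system couples only the odd-indexed unknowns.
    x = [None] * n
    for k, i in enumerate(odd):
        pass
    x_odd = block_cyclic_reduction(rA, rB, rC, rF)
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    # Back-substitute for the even-indexed unknowns.
    for i in range(0, n, 2):
        r = F[i]
        if i > 0:
            r = r - A[i] @ x[i - 1]
        if i + 1 < n:
            r = r - C[i] @ x[i + 1]
        x[i] = np.linalg.solve(B[i], r)
    return x
```

Each level halves the number of block rows, and the eliminations within a level are independent of one another, which is the source of the method's concurrency. The reduced diagonal blocks are Schur complements. In this dense sketch they are stored in full, whereas the solver in the abstract is described as keeping them in compressed hierarchical form.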
Parallel Factorizations in Numerical Analysis
In this paper we review the parallel solution of sparse linear systems,
usually arising from the discretization of ODE-IVPs or ODE-BVPs. The approach is
based on the concept of parallel factorization of a (block) tridiagonal matrix.
This allows one to obtain efficient parallel extensions of many known matrix
factorizations, and to derive, as a by-product, a unifying approach to the
parallel solution of ODEs. Comment: 15 pages, 5 figures
Alternating-Direction Line-Relaxation Methods on Multicomputers
We study the multicomputer performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison.
Parallel tridiagonal equation solvers
Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
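For readers unfamiliar with cyclic odd-even reduction, the following is a minimal serial sketch in Python with NumPy. It is not taken from the paper: the function name `cyclic_reduction` and the storage convention (sub-diagonal `a` with `a[0]` unused, diagonal `b`, super-diagonal `c` with `c[-1]` unused) are assumptions. It also assumes the diagonal entries stay nonzero during elimination, which holds for diagonally dominant systems.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    if n == 1:
        return d / b
    a[0] = 0.0   # nothing to the left of the first unknown
    c[-1] = 0.0  # nothing to the right of the last unknown
    odd = np.arange(1, n, 2)
    has_next = odd + 1 < n
    nxt = np.where(has_next, odd + 1, odd)  # safe index when there is no right neighbour
    # Multipliers that eliminate the even-indexed neighbours of each odd row.
    alpha = a[odd] / b[odd - 1]
    gamma = np.where(has_next, c[odd] / b[nxt], 0.0)
    # Reduced tridiagonal system in the odd-indexed unknowns only.
    ra = -alpha * a[odd - 1]
    rb = b[odd] - alpha * c[odd - 1] - gamma * a[nxt]
    rc = -gamma * c[nxt]
    rd = d[odd] - alpha * d[odd - 1] - gamma * d[nxt]
    x = np.zeros(n)
    x[odd] = cyclic_reduction(ra, rb, rc, rd)
    # Back-substitution for the even-indexed unknowns, with zero padding at both ends.
    even = np.arange(0, n, 2)
    xp = np.concatenate(([0.0], x, [0.0]))  # xp[i + 1] == x[i]
    x[even] = (d[even] - a[even] * xp[even] - c[even] * xp[even + 2]) / b[even]
    return x

# Usage: a small 1D Poisson-type system with stencil (-1, 2, -1).
n = 9
a = -np.ones(n); b = 2.0 * np.ones(n); c = -np.ones(n); d = np.ones(n)
x = cyclic_reduction(a, b, c, d)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print("residual norm:", np.linalg.norm(T @ x - d))
```

All updates within one reduction level are independent, which is why the method suits array and pipeline machines: each level is a handful of vector operations, as the NumPy expressions above show, and there are about log2(n) levels.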
Simulation of Laser Propagation in a Plasma with a Frequency Wave Equation
The aim of this work is to perform numerical simulations of the propagation
of a laser in a plasma. At each time step, one has to solve a Helmholtz
equation in a domain consisting of several hundred million cells. To
solve this huge linear system, one uses an iterative Krylov method
preconditioned by a separable matrix. The corresponding linear system is
solved with a block cyclic reduction method. Some insights into the parallel
implementation are also given. Lastly, numerical results are presented
including some features concerning the scalability of the numerical method on a
parallel architecture.
Some fast elliptic solvers on parallel architectures and their complexities
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.