970 research outputs found
Parallel Factorizations in Numerical Analysis
In this paper we review the parallel solution of sparse linear systems,
usually deriving by the discretization of ODE-IVPs or ODE-BVPs. The approach is
based on the concept of parallel factorization of a (block) tridiagonal matrix.
This allows to obtain efficient parallel extensions of many known matrix
factorizations, and to derive, as a by-product, a unifying approach to the
parallel solution of ODEs.Comment: 15 pages, 5 figure
A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements
A parallel fast direct solver for rank-compressible block tridiagonal linear
systems is presented. Algorithmic synergies between Cyclic Reduction and
Hierarchical matrix arithmetic operations result in a solver with arithmetic complexity and memory footprint. We provide a
baseline for performance and applicability by comparing with well known
implementations of the -LU factorization and algebraic multigrid
with a parallel implementation that leverages the concurrency features of the
method. Numerical experiments reveal that this method is comparable with other
fast direct solvers based on Hierarchical Matrices such as -LU and
that it can tackle problems where algebraic multigrid fails to converge
Differential qd algorithm with shifts for rank-structured matrices
Although QR iterations dominate in eigenvalue computations, there are several
important cases when alternative LR-type algorithms may be preferable. In
particular, in the symmetric tridiagonal case where differential qd algorithm
with shifts (dqds) proposed by Fernando and Parlett enjoys often faster
convergence while preserving high relative accuracy (that is not guaranteed in
QR algorithm). In eigenvalue computations for rank-structured matrices QR
algorithm is also a popular choice since, in the symmetric case, the rank
structure is preserved. In the unsymmetric case, however, QR algorithm destroys
the rank structure and, hence, LR-type algorithms come to play once again. In
the current paper we discover several variants of qd algorithms for
quasiseparable matrices. Remarkably, one of them, when applied to Hessenberg
matrices becomes a direct generalization of dqds algorithm for tridiagonal
matrices. Therefore, it can be applied to such important matrices as companion
and confederate, and provides an alternative algorithm for finding roots of a
polynomial represented in the basis of orthogonal polynomials. Results of
preliminary numerical experiments are presented
A new approximate matrix factorization for implicit time integration in air pollution modeling
Implicit time stepping typically requires solution of one or several linear systems with a matrix I−τJ per time step where J is the Jacobian matrix. If solution of these systems is expensive, replacing I−τJ with its approximate matrix factorization (AMF) (I−τR)(I−τV), R+V=J, often leads to a good compromise between stability and accuracy of the time integration on the one hand and its efficiency on the other hand. For example, in air pollution modeling, AMF has been successfully used in the framework of Rosenbrock schemes. The standard AMF gives an approximation to I−τJ with the error τ2RV, which can be significant in norm. In this paper we propose a new AMF. In assumption that −V is an M-matrix, the error of the new AMF can be shown to have an upper bound τ||R||, while still being asymptotically . This new AMF, called AMF+, is equal in costs to standard AMF and, as both analysis and numerical experiments reveal, provides a better accuracy. We also report on our experience with another, cheaper AMF and with AMF-preconditioned GMRES
Improving approximate matrix factorizations for implicit time integration in air pollution modelling
For a long time operator splitting was the only computationally feasible way of implicit time integration in large scale Air Pollution Models. A recently proposed attractive alternative is Rosenbrock schemes combined with Approximate Matrix Factorization (AMF). With AMF, linear systems arising in implicit time stepping are solved approximately in such a way that the overall computational costs per time step are not higher than those of splitting methods. We propose and discuss two new variants of AMF. The first one is aimed at yet a further reduction of costs as compared with conventional AMF. The second variant of AMF provides in certain circumstances a better approximation to the inverse of the linear system matrix than standard AMF and requires the same computational work
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
Parallelization of implicit finite difference schemes in computational fluid dynamics
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed
LU factorization with panel rank revealing pivoting and its communication avoiding version
We present the LU decomposition with panel rank revealing pivoting (LU_PRRP),
an LU factorization algorithm based on strong rank revealing QR panel
factorization. LU_PRRP is more stable than Gaussian elimination with partial
pivoting (GEPP). Our extensive numerical experiments show that the new
factorization scheme is as numerically stable as GEPP in practice, but it is
more resistant to pathological cases and easily solves the Wilkinson matrix and
the Foster matrix. We also present CALU_PRRP, a communication avoiding version
of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament
pivoting, with the selection of the pivots at each step of the tournament being
performed via strong rank revealing QR factorization. CALU_PRRP is more stable
than CALU, the communication avoiding version of GEPP. CALU_PRRP is also more
stable in practice and is resistant to pathological cases on which GEPP and
CALU fail.Comment: No. RR-7867 (2012
- …