137 research outputs found
Efficient approximation of functions of some large matrices by partial fraction expansions
Some important applicative problems require the evaluation of functions
of large and sparse and/or \emph{localized} matrices. Popular and
interesting techniques for computing $f(A)\mathbf{v}$ and $f(A)$, where $A$
is a matrix and $\mathbf{v}$ is a vector, are based on partial fraction expansions. However,
some of these techniques require solving several linear systems whose matrices
differ from $A$ by a complex multiple of the identity matrix in order to compute
$f(A)\mathbf{v}$, or require inverting sequences of matrices with the same
characteristics in order to compute $f(A)$. Here we study the use and the
convergence of a recent technique for generating sequences of incomplete
factorizations of matrices in order to address both of these issues. The
solution of the sequences of linear systems and the approximate matrix inversions
above can be computed efficiently provided that the matrices involved exhibit
certain decay properties. These strategies also have good potential for parallelism.
Our claims are confirmed by numerical tests.
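The partial fraction idea sketched above can be checked on a small example. The rational function, matrix, and all names below are illustrative, not taken from the paper: each pole of the expansion costs one shifted linear solve with a matrix of the form $A + sI$.

```python
import numpy as np

# Illustrative rational function: r(x) = 1/((x+1)(x+3)) = 0.5/(x+1) - 0.5/(x+3).
# Its partial fraction expansion turns r(A)v into two shifted linear solves.
rng = np.random.default_rng(0)
n = 50
A = np.diag(2.0 + rng.random(n)) + 0.05 * rng.standard_normal((n, n))  # well-conditioned test matrix
v = rng.standard_normal(n)
I = np.eye(n)

# Partial-fraction evaluation: one solve per pole instead of forming r(A).
x_pf = 0.5 * np.linalg.solve(A + 1.0 * I, v) - 0.5 * np.linalg.solve(A + 3.0 * I, v)

# Reference: apply r(A) = ((A+I)(A+3I))^{-1} directly.
x_ref = np.linalg.solve((A + 1.0 * I) @ (A + 3.0 * I), v)

assert np.allclose(x_pf, x_ref)
```

For transcendental functions such as the matrix exponential, the same pattern applies with the poles and weights of a rational approximant; the shifted systems then have complex shifts, as noted in the abstract.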
A simple parallel prefix algorithm for compact finite-difference schemes
A compact scheme is a discretization scheme that is advantageous for obtaining highly accurate solutions. However, the systems resulting from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Exploiting their almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is highly efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study was conducted to provide a simple truncation formula. Experiments were carried out on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine; the results show that the simple parallel prefix algorithm is well suited to compact schemes on high-performance computers.
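For reference, the kind of system a compact scheme produces can be solved serially with the classical Thomas algorithm; the sketch below (sizes and coefficients are illustrative, using the familiar 1–4–1 diagonally dominant pattern) only shows the system being solved, not the SPP algorithm itself, which replaces this sequential sweep with prefix operations.

```python
import numpy as np

def thomas(a, b, c, d):
    """Serial Thomas solve: sub-diagonal a (a[0] unused), diagonal b,
    super-diagonal c (c[-1] unused), right-hand side d."""
    n = len(b)
    cp = np.empty(n); dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Diagonally dominant, almost-Toeplitz tridiagonal system typical of compact schemes.
n = 64
a = np.full(n, 1.0); a[0] = 0.0
b = np.full(n, 4.0)
c = np.full(n, 1.0); c[-1] = 0.0
x_true = np.random.default_rng(1).standard_normal(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(thomas(a, b, c, T @ x_true), x_true)
```

The diagonal dominance visible here (|4| > |1| + |1|) is exactly the property that lets SPP truncate both computation and communication without losing accuracy.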
Low-rank updates and a divide-and-conquer method for linear matrix equations
Linear matrix equations, such as the Sylvester and Lyapunov equations, play
an important role in various applications, including the stability analysis and
dimensionality reduction of linear dynamical control systems and the solution
of partial differential equations. In this work, we present and analyze a new
algorithm, based on tensorized Krylov subspaces, for quickly updating the
solution of such a matrix equation when its coefficients undergo low-rank
changes. We demonstrate how our algorithm can be utilized to accelerate the
Newton method for solving continuous-time algebraic Riccati equations. Our
algorithm also forms the basis of a new divide-and-conquer approach for linear
matrix equations with coefficients that feature hierarchical low-rank
structure, such as HODLR, HSS, and banded matrices. Numerical experiments
demonstrate the advantages of divide-and-conquer over existing approaches, in
terms of computational time and memory consumption.
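The low-rank update idea can be sketched concretely (all names, sizes, and the rank-1 perturbation below are illustrative, and `scipy`'s dense Sylvester solver stands in for the tensorized Krylov machinery): if $X_0$ solves $AX + XB = C$ and $A$ is perturbed to $A_1 = A + uv^{\top}$, the correction $D = X_1 - X_0$ solves a Sylvester equation with the same coefficients but a right-hand side of rank at most one.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
n, m = 30, 20
A = np.diag(np.linspace(1.0, 2.0, n)) + 0.01 * rng.standard_normal((n, n))
B = np.diag(np.linspace(1.0, 2.0, m))
C = rng.standard_normal((n, m))

X0 = solve_sylvester(A, B, C)                  # base solution of A X + X B = C
u = 0.1 * rng.standard_normal((n, 1))
v = 0.1 * rng.standard_normal((n, 1))
A1 = A + u @ v.T                               # low-rank change to a coefficient

# Correction equation: A1 D + D B = -(u v^T) X0, with a rank-1 right-hand side.
D = solve_sylvester(A1, B, -(u @ (v.T @ X0)))
X1 = X0 + D

assert np.allclose(A1 @ X1 + X1 @ B, C)        # X1 solves the updated equation
```

The point the paper exploits is that the correction equation's right-hand side inherits the (low) rank of the coefficient change, so its solution is well approximated in a small tensorized Krylov subspace rather than recomputed from scratch.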
MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker
MADmap is a software application used to produce maximum-likelihood images of
the sky from time-ordered data which include correlated noise, such as those
gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently
on platforms ranging from small workstations to the most massively parallel
supercomputers. Map-making is a critical step in the analysis of all CMB data
sets, and the maximum-likelihood approach is the most accurate and widely
applicable algorithm; however, it is a computationally challenging task. This
challenge will only increase with the next generation of ground-based,
balloon-borne and satellite CMB polarization experiments. The faintness of the
B-mode signal that these experiments seek to measure requires them to gather
enormous data sets. MADmap is already being run on very large numbers of time
samples, pixels and cores, with ongoing work to scale to
the next generation of data sets and supercomputers. We describe MADmap's
algorithm based around a preconditioned conjugate gradient solver, fast Fourier
transforms and sparse matrix operations. We highlight MADmap's ability to
address problems typically encountered in the analysis of realistic CMB data
sets and describe its application to simulations of the Planck and EBEX
experiments. The massively parallel and distributed implementation is detailed
and scaling complexities are given for the resources required. MADmap is
capable of analysing the largest data sets now being collected on computing
resources currently available, and we argue that, given Moore's Law, MADmap
will be capable of reducing the most massive projected data sets.
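The core iteration can be sketched in a few lines. This is a hedged toy model, not MADmap's implementation: the pointing matrix, noise model, and sizes below are invented, a diagonal Jacobi preconditioner stands in for MADmap's FFT-based machinery, and the data are noiseless so the recovered map can be checked exactly. It solves the maximum-likelihood normal equations $(P^{\top}N^{-1}P)\,m = P^{\top}N^{-1}d$ by preconditioned conjugate gradients.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_p = 400, 25                                    # toy time samples / pixels
P = np.zeros((n_t, n_p))
P[np.arange(n_t), rng.integers(0, n_p, n_t)] = 1.0    # each sample observes one pixel
Ninv = np.diag(1.0 / (0.5 + rng.random(n_t)))         # toy inverse noise covariance
m_true = rng.standard_normal(n_p)
d = P @ m_true                                        # noiseless time-ordered data

A = P.T @ Ninv @ P                                    # normal matrix
b = P.T @ Ninv @ d
M_inv = 1.0 / np.diag(A)                              # Jacobi preconditioner

# Preconditioned conjugate gradient loop.
m = np.zeros(n_p)
r = b - A @ m
z = M_inv * r
p = z
for _ in range(200):
    Ap = A @ p
    alpha = (r @ z) / (p @ Ap)
    m += alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-12:
        r = r_new
        break
    z_new = M_inv * r_new
    beta = (r_new @ z_new) / (r @ z)
    p = z_new + beta * p
    r, z = r_new, z_new

assert np.allclose(m, m_true, atol=1e-8)
```

In the real code, applying $N^{-1}$ to a time stream is done with fast Fourier transforms (the noise is stationary, so $N$ is Toeplitz), and $P$ is applied as a sparse matrix; those two kernels are what the abstract's scaling analysis covers.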
On the decay of the off-diagonal singular values in cyclic reduction
It was recently observed in [10] that the singular values of the off-diagonal blocks of the matrix sequences generated by the Cyclic Reduction algorithm decay exponentially. This property was used to solve, with higher efficiency, certain quadratic matrix equations encountered in the analysis of queuing models. In this paper, we provide a theoretical bound on the base of this exponential decay, together with a tool for its estimation based on a rational interpolation problem. Numerical experiments show that the bound is often accurate in practice. Applications to solving n × n block tridiagonal block Toeplitz systems with n × n quasiseparable blocks, and certain generalized Sylvester equations, in O(n² log n) arithmetic operations are shown.
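For readers unfamiliar with cyclic reduction, the scalar (point) version of the algorithm is sketched below on a tridiagonal Toeplitz system; the paper concerns the block version arising from queueing models, but the elimination pattern is the same. Sizes and coefficients are illustrative, and the system size is taken as 2^k − 1 so the reduction halves cleanly.

```python
import numpy as np

def cr_solve(a, b, c, d):
    """Cyclic reduction for a tridiagonal system of size 2^k - 1.
    a: sub-diagonal (a[0] = 0), b: diagonal, c: super-diagonal (c[-1] = 0)."""
    n = len(b)
    if n == 1:
        return np.array([d[0] / b[0]])
    odd = np.arange(1, n, 2)
    alpha = a[odd] / b[odd - 1]
    gamma = c[odd] / b[odd + 1]
    # One reduction step: eliminate the even-indexed unknowns from the odd equations.
    b2 = b[odd] - alpha * c[odd - 1] - gamma * a[odd + 1]
    a2 = -alpha * a[odd - 1]
    c2 = -gamma * c[odd + 1]
    d2 = d[odd] - alpha * d[odd - 1] - gamma * d[odd + 1]
    x = np.zeros(n)
    x[odd] = cr_solve(a2, b2, c2, d2)        # recurse on the half-size system
    xe = np.concatenate(([0.0], x, [0.0]))   # zero-padded for boundary handling
    even = np.arange(0, n, 2)
    x[even] = (d[even] - a[even] * xe[even] - c[even] * xe[even + 2]) / b[even]
    return x

n = 2**6 - 1
a = np.full(n, -1.0); a[0] = 0.0
b = np.full(n, 4.0)
c = np.full(n, -1.0); c[-1] = 0.0
x_true = np.random.default_rng(4).standard_normal(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(cr_solve(a, b, c, T @ x_true), x_true)
```

In the block case studied in the paper, the scalars a, b, c become matrices; the observation of [10] is that the off-diagonal blocks generated by these reduction steps have exponentially decaying singular values, so they can be compressed (e.g. in quasiseparable form) at each level.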
A survey on recursive algorithms for unbalanced banded Toeplitz systems: computational issues
Several direct recursive algorithms for the solution of banded Toeplitz systems are considered. All the methods exploit the displacement rank properties, which allow a large reduction of computational effort and storage requirements. Some algorithms make use of the Sherman–Morrison–Woodbury formula and turn out to be particularly suitable for the case of unbalanced bandwidths. The computational costs of the algorithms under consideration are compared in both a theoretical and a practical setting. Some stability issues are discussed as well.
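The Sherman–Morrison–Woodbury mechanism the survey refers to can be illustrated in isolation (the matrices below are generic and illustrative; in the banded Toeplitz setting, B would be a structured matrix that is cheap to solve and UVᵀ a low-rank correction capturing the band imbalance): a system with coefficient B + UVᵀ is solved using only solves with B.

```python
import numpy as np

# Sherman-Morrison-Woodbury:
# (B + U V^T)^{-1} f = B^{-1} f - B^{-1} U (I + V^T B^{-1} U)^{-1} V^T B^{-1} f
rng = np.random.default_rng(5)
n, k = 40, 3
B = np.eye(n) * 5.0 + 0.1 * rng.standard_normal((n, n))  # cheap-to-solve core (illustrative)
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
f = rng.standard_normal(n)

y = np.linalg.solve(B, f)                        # B^{-1} f
W = np.linalg.solve(B, U)                        # B^{-1} U: k extra solves with B
s = np.linalg.solve(np.eye(k) + V.T @ W, V.T @ y)  # small k x k system
x = y - W @ s                                    # SMW-corrected solution

assert np.allclose((B + U @ V.T) @ x, f)
```

The payoff is that only k + 1 solves with the structured matrix B and one small k × k solve are needed, which is why the formula suits unbalanced bandwidths, where k stays small.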
Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs
This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while fully utilizing the available GPU memory bandwidth. For performance-optimized solving with multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (a union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners, and deploys this thesis's generalized sparse matrix-vector implementation for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners containing multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated against ILU and ILU incomplete sparse approximate inverse (ILU-ISAI) preconditioners for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all problems presented in this thesis, an efficient parallel algorithm and its CUDA implementation for single-GPU systems are provided.
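The basic idea of a tridiagonal preconditioner can be sketched on the CPU (the thesis builds much richer block and multiplicative variants on the GPU; the matrix, sizes, and the use of scipy's banded solver and CG below are illustrative assumptions): the tridiagonal part of A is extracted and each preconditioner application is one fast tridiagonal solve.

```python
import numpy as np
from scipy.linalg import solve_banded
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(6)
n = 200
# SPD test matrix: a strong tridiagonal part plus a small dense-ish remainder.
A = (np.diag(np.full(n, 4.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
R = 0.05 * rng.standard_normal((n, n))
A = A + R @ R.T / n

# Banded storage of the tridiagonal part of A, as expected by solve_banded.
ab = np.zeros((3, n))
ab[0, 1:] = np.diag(A, 1)
ab[1, :] = np.diag(A)
ab[2, :-1] = np.diag(A, -1)

# Preconditioner: one tridiagonal solve per application.
M = LinearOperator((n, n), matvec=lambda r: solve_banded((1, 1), ab, r))

b = rng.standard_normal(n)
x, info = cg(A, b, M=M)
assert info == 0
assert np.allclose(A @ x, b, atol=1e-3)
```

On a GPU, the per-iteration tridiagonal solve is exactly the kernel the thesis's library accelerates, which is what makes this family of preconditioners attractive there.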
- …