137 research outputs found

    Efficient approximation of functions of some large matrices by partial fraction expansions

    Full text link
    Some important applicative problems require the evaluation of functions Ψ\Psi of large and sparse and/or \emph{localized} matrices AA. Popular and interesting techniques for computing Ψ(A)\Psi(A) and Ψ(A)v\Psi(A)\mathbf{v}, where v\mathbf{v} is a vector, are based on partial fraction expansions. However, some of these techniques require solving several linear systems whose matrices differ from AA by a complex multiple of the identity matrix II for computing Ψ(A)v\Psi(A)\mathbf{v} or require inverting sequences of matrices with the same characteristics for computing Ψ(A)\Psi(A). Here we study the use and the convergence of a recent technique for generating sequences of incomplete factorizations of matrices in order to face with both these issues. The solution of the sequences of linear systems and approximate matrix inversions above can be computed efficiently provided that A1A^{-1} shows certain decay properties. These strategies have good parallel potentialities. Our claims are confirmed by numerical tests

    A simple parallel prefix algorithm for compact finite-difference schemes

    Get PDF
    A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is highly efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study was conducted to provide a simple truncation formula. Experimental results were measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for the compact scheme on high-performance computers

    Low-rank updates and a divide-and-conquer method for linear matrix equations

    Get PDF
    Linear matrix equations, such as the Sylvester and Lyapunov equations, play an important role in various applications, including the stability analysis and dimensionality reduction of linear dynamical control systems and the solution of partial differential equations. In this work, we present and analyze a new algorithm, based on tensorized Krylov subspaces, for quickly updating the solution of such a matrix equation when its coefficients undergo low-rank changes. We demonstrate how our algorithm can be utilized to accelerate the Newton method for solving continuous-time algebraic Riccati equations. Our algorithm also forms the basis of a new divide-and-conquer approach for linear matrix equations with coefficients that feature hierarchical low-rank structure, such as HODLR, HSS, and banded matrices. Numerical experiments demonstrate the advantages of divide-and-conquer over existing approaches, in terms of computational time and memory consumption

    MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

    Full text link
    MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(1011)O(10^{11}) time samples, O(108)O(10^8) pixels and O(104)O(10^4) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets

    On the decay of the off-diagonal singular values in cyclic reduction

    Get PDF
    It was recently observed in [10] that the singular values of the off-diagonal blocks of the matrix sequences generated by the Cyclic Reduction algorithm decay exponentially. This property was used to solve, with a higher efficiency, certain quadratic matrix equations encountered in the analysis of queuing models. In this paper, we provide a theoretical bound to the basis of this exponential decay together with a tool for its estimation based on a rational interpolation problem. Numerical experiments show that the bound is often accurate in practice. Applications to solving n × n block tridiagonal block Toeplitz systems with n × n quasiseparable blocks and certain generalized Sylvester equations in O(n 2 log n) arithmetic operations are shown

    A survey on recursive algorithms for unbalanced banded Toeplitz systems: computational issues

    Get PDF
    Several direct recursive algorithms for the solution of band Toeplitz systems are considered. All the methods exploit the displacement rank properties, which allow a large reduction of computational efforts and storage requirements. Some algorithms make use of the Sherman-Morrison- Woodbury formula and result to be particularly suitable for the case of unbalanced bandwidths. The computational costs of the algorithms under consideration are compared both in a theoretical and practical setting. Some stability issues are discussed as well

    Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs

    Get PDF
    This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems, by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by factor five while completely utilizing the available GPU memory bandwidth. For the performance optimized solving of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse-matrix vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on double-linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all presented problems of this thesis, an efficient parallel algorithm and its CUDA implementation for single GPU systems is provided
    corecore