137 research outputs found

TR-2013011: Fast Approximation Algorithms for Cauchy Matrices, Polynomials and Rational Functions

Author: Pan Victor Y.
Publication venue: CUNY Academic Works
Publication date: 01/01/2013

City University of New York

Efficient approximation of functions of some large matrices by partial fraction expansions

Author: Bertaccini Daniele
Durastante Fabio
Popolizio Marina
Publication venue: 'Informa UK Limited'
Publication date: 12/04/2018

Some important applied problems require the evaluation of functions $\Psi$ of large and sparse and/or localized matrices $A$. Popular and interesting techniques for computing $\Psi(A)$ and $\Psi(A)\mathbf{v}$, where $\mathbf{v}$ is a vector, are based on partial fraction expansions. However, some of these techniques require solving several linear systems whose matrices differ from $A$ by a complex multiple of the identity matrix $I$ for computing $\Psi(A)\mathbf{v}$, or require inverting sequences of matrices with the same characteristics for computing $\Psi(A)$. Here we study the use and the convergence of a recent technique for generating sequences of incomplete factorizations of matrices in order to address both of these issues. The solution of the sequences of linear systems and approximate matrix inversions above can be computed efficiently provided that $A^{-1}$ shows certain decay properties. These strategies have good potential for parallelization. Our claims are confirmed by numerical tests.
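The partial fraction idea in this abstract can be illustrated with a minimal sketch. It assumes a rational function $r(z) = \sum_j w_j / (z - p_j)$ stands in for $\Psi$, so that $r(A)\mathbf{v}$ reduces to one shifted solve $(A - p_j I)\mathbf{x}_j = \mathbf{v}$ per pole. The function names, the small dense solver, and the choice $r(z) = 1/(z^2 + 1)$ are illustrative assumptions; the paper's incomplete-factorization technique is not reproduced here.

```python
def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (works for complex entries)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0j] * n
    for k in reversed(range(n)):
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def shifted(A, z):
    """Return A - z*I, the matrix of one shifted linear system."""
    n = len(A)
    return [[A[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)]


def rational_times_vector(A, v, poles, weights):
    """Evaluate r(A) v for r(z) = sum_j weights[j] / (z - poles[j])."""
    y = [0j] * len(v)
    for p, w in zip(poles, weights):
        x = solve(shifted(A, p), v)  # one shifted solve per pole
        y = [yi + w * xi for yi, xi in zip(y, x)]
    return y


# Illustrative data (assumption): small symmetric tridiagonal matrix.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
v = [1.0, 0.0, 1.0]

# r(z) = 1/(z^2 + 1) = (1/(2i)) * (1/(z - i) - 1/(z + i))
poles = [1j, -1j]
weights = [1 / 2j, -1 / 2j]
y = rational_times_vector(A, v, poles, weights)

# Check: r(A) = (A^2 + I)^{-1}, so (A^2 + I) y should reproduce v.
AAy = matvec(A, matvec(A, y))
residual = max(abs(a + yi - vi) for a, yi, vi in zip(AAy, y, v))
```

The shifted solves are independent of one another, which is one reason such expansions are attractive for parallel execution.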

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

ART

A simple parallel prefix algorithm for compact finite-difference schemes

Author: Joslin Ronald D.
Sun Xian-He

A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is highly efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study was conducted to provide a simple truncation formula. Experimental results were measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for the compact scheme on high-performance computers

NASA Technical Reports Server

Low-rank updates and a divide-and-conquer method for linear matrix equations

Author: Kressner Daniel
Massei Stefano
Robol Leonardo
Publication date: 01/01/2019

Linear matrix equations, such as the Sylvester and Lyapunov equations, play an important role in various applications, including the stability analysis and dimensionality reduction of linear dynamical control systems and the solution of partial differential equations. In this work, we present and analyze a new algorithm, based on tensorized Krylov subspaces, for quickly updating the solution of such a matrix equation when its coefficients undergo low-rank changes. We demonstrate how our algorithm can be utilized to accelerate the Newton method for solving continuous-time algebraic Riccati equations. Our algorithm also forms the basis of a new divide-and-conquer approach for linear matrix equations with coefficients that feature hierarchical low-rank structure, such as HODLR, HSS, and banded matrices. Numerical experiments demonstrate the advantages of divide-and-conquer over existing approaches, in terms of computational time and memory consumption

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Archivio della Ricerca - Università di Pisa

MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

Author: A. H. Jaffe
Armitage-Caplan
Barrett
Bock
Borrill
C. M. Cantalupo
Dodelson
Golub
Górski
Hanany
Hinshaw
J. D. Borrill
Janssen
Jewell
Johnson
Kuo
Kurki-Suonio
Oh
Patanchon
Press
R. Stompor
T. S. Kisner
The Planck Collaboration
Wright
Publication venue: 'IOP Publishing'
Publication date: 22/12/2009

MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to $O(10^{11})$ time samples, $O(10^8)$ pixels and $O(10^4)$ cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.
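As a rough illustration of the preconditioned conjugate gradient solver mentioned in this abstract, the following is a generic textbook sketch, not MADmap's implementation. The Jacobi (diagonal) preconditioner, the tridiagonal test matrix, and all function names are assumptions chosen to keep the example self-contained.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(apply_A, b, apply_M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite operator.

    apply_A(x) returns A x; apply_M_inv(r) applies the preconditioner inverse.
    Stops when the 2-norm of the residual drops below tol * ||b||.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_M_inv(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    iterations = 0
    while iterations < max_iter and dot(r, r) ** 0.5 > tol * b_norm:
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = apply_M_inv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
        iterations += 1
    return x, iterations


# Illustrative SPD system (assumption): tridiagonal, strictly diagonally dominant.
n = 50
diag = [3.0 + i for i in range(n)]


def apply_A(x):
    """Sparse matrix-vector product: diagonal `diag`, off-diagonals -1."""
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def jacobi(r):
    """Jacobi preconditioner: divide by the diagonal of A."""
    return [ri / di for ri, di in zip(r, diag)]


b = [1.0] * n
x, iterations = pcg(apply_A, b, jacobi)
residual = max(abs(yi - bi) for yi, bi in zip(apply_A(x), b))
```

The solver only needs the action of the matrix on a vector, which is why this kind of iteration pairs naturally with sparse and FFT-based operators.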

arXiv.org e-Print Archive

On the decay of the off-diagonal singular values in cyclic reduction

Author: BINI DARIO ANDREA
MASSEI STEFANO
ROBOL LEONARDO
Publication venue: 'Elsevier BV'
Publication date: 24/08/2016

It was recently observed in [10] that the singular values of the off-diagonal blocks of the matrix sequences generated by the Cyclic Reduction algorithm decay exponentially. This property was used to solve, with a higher efficiency, certain quadratic matrix equations encountered in the analysis of queuing models. In this paper, we provide a theoretical bound to the basis of this exponential decay together with a tool for its estimation based on a rational interpolation problem. Numerical experiments show that the bound is often accurate in practice. Applications to solving $n \times n$ block tridiagonal block Toeplitz systems with $n \times n$ quasiseparable blocks and certain generalized Sylvester equations in $O(n^2 \log n)$ arithmetic operations are shown.

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

A survey on recursive algorithms for unbalanced banded Toeplitz systems: computational issues

Author: Favati Paola
Lotti Grazia
Menchi Ornella

Several direct recursive algorithms for the solution of band Toeplitz systems are considered. All the methods exploit the displacement rank properties, which allow a large reduction of computational effort and storage requirements. Some algorithms make use of the Sherman-Morrison-Woodbury formula and prove particularly suitable for the case of unbalanced bandwidths. The computational costs of the algorithms under consideration are compared in both a theoretical and a practical setting. Some stability issues are discussed as well.
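To make the role of the Sherman-Morrison-Woodbury formula concrete, here is a minimal sketch of its rank-one special case (Sherman-Morrison): a system with matrix $T + \mathbf{u}\mathbf{v}^T$ is solved using only solves with the tridiagonal Toeplitz matrix $T$. The Thomas solver, the single out-of-band entry, and all names are illustrative assumptions, not the algorithms surveyed in the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[i] is entry (i, i-1) (lower[0] unused); upper[i] is entry (i, i+1)
    (upper[n-1] unused). Assumes the matrix is diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_rank_one_update(lower, diag, upper, u, v, b):
    """Solve (T + u v^T) x = b via Sherman-Morrison using two solves with T."""
    y = thomas(lower, diag, upper, b)   # y = T^{-1} b
    w = thomas(lower, diag, upper, u)   # w = T^{-1} u
    denom = 1.0 + dot(v, w)
    if abs(denom) < 1e-14:
        raise ZeroDivisionError("T + u v^T is singular or nearly singular")
    scale = dot(v, y) / denom
    return [yi - scale * wi for yi, wi in zip(y, w)]


# Illustrative data (assumption): Toeplitz tridiagonal T plus one entry at (0, n-1).
n = 6
lower = [0.0] + [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1) + [0.0]
u = [1.0] + [0.0] * (n - 1)
v = [0.0] * (n - 1) + [0.5]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

x = solve_rank_one_update(lower, diag, upper, u, v, b)

# Check against the dense matrix T + u v^T.
M = [[0.0] * n for _ in range(n)]
for i in range(n):
    M[i][i] = diag[i]
    if i > 0:
        M[i][i - 1] = lower[i]
    if i < n - 1:
        M[i][i + 1] = upper[i]
    for j in range(n):
        M[i][j] += u[i] * v[j]
residual = max(abs(dot(M[i], x) - b[i]) for i in range(n))
```

The same pattern extends to rank-$k$ corrections, where the scalar denominator becomes a small $k \times k$ system.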

PUblication MAnagement

Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs

Author: Klein Christoph Julian
Publication date: 01/01/2023

This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems, by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while completely utilizing the available GPU memory bandwidth. For the performance-optimized solution of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse matrix-vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all presented problems of this thesis, an efficient parallel algorithm and its CUDA implementation for single GPU systems are provided.

Heidelberger Dokumentenserver