
    A fast solver for linear systems with displacement structure

    We describe a fast solver for linear systems with reconstructable Cauchy-like structure, which requires $O(rn^2)$ floating point operations and $O(rn)$ memory locations, where $n$ is the size of the matrix and $r$ its displacement rank. The solver is based on the application of the generalized Schur algorithm to a suitable augmented matrix, under some assumptions on the knots of the Cauchy-like matrix. It includes various pivoting strategies, already discussed in the literature, and a new algorithm, which only requires reconstructability. We have developed a software package, written in Matlab and C-MEX, which provides a robust implementation of the above method. Our package also includes solvers for Toeplitz(+Hankel)-like and Vandermonde-like linear systems, as these structures can be reduced to Cauchy-like by fast and stable transforms. Numerical experiments demonstrate the effectiveness of the software.
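    As a concrete illustration of the displacement structure such solvers exploit (a minimal sketch, not the package's implementation): a Cauchy-like matrix C with knots s, t and displacement rank r satisfies diag(s) C - C diag(t) = G H^T, so C is fully determined by the $O(rn)$ generator data. A NumPy sketch, with all names (G, H, s, t) our own:

        # Build a Cauchy-like matrix from its generators and verify the
        # displacement equation diag(s) C - C diag(t) = G H^T.
        import numpy as np

        n, r = 8, 2
        rng = np.random.default_rng(0)
        s = rng.standard_normal(n)             # row knots
        t = rng.standard_normal(n) + 10.0      # column knots, disjoint from s
        G = rng.standard_normal((n, r))        # generator matrices
        H = rng.standard_normal((n, r))

        # Entrywise reconstruction: C[i, j] = (G H^T)[i, j] / (s[i] - t[j])
        C = (G @ H.T) / (s[:, None] - t[None, :])

        disp = np.diag(s) @ C - C @ np.diag(t)
        assert np.allclose(disp, G @ H.T)
        print("displacement rank:", np.linalg.matrix_rank(disp))   # prints 2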

    Block pivoting implementation of a symmetric Toeplitz solver

    Toeplitz matrices are characterized by a special structure that can be exploited to obtain fast linear system solvers. These solvers are difficult to parallelize due to their low computational cost and their closely coupled data operations. We propose to transform the Toeplitz system matrix into a Cauchy-like matrix, since the latter can be divided into two independent matrices of half the size of the system matrix, and each of these smaller matrices can be factorized efficiently on multicore computers. We use OpenMP and store data in memory by blocks in consecutive positions, yielding a simple and efficient algorithm. In addition, by exploiting the fact that diagonal pivoting does not destroy the special structure of Cauchy-like matrices, we introduce a local diagonal pivoting technique which improves the accuracy of the solution and the stability of the algorithm.

    This work was partially supported by the Spanish Ministerio de Ciencia e Innovacion (Projects TIN2008-06570-C04-02 and TEC2009-13741), Vicerrectorado de Investigacion de la Universidad Politecnica de Valencia through PAID-05-10 (ref. 2705), and Generalitat Valenciana through project PROMETEO/2009/2013.

    Alonso-Jordá, P.; Dolz Zaragozá, MF.; Vidal Maciá, AM. (2014). Block pivoting implementation of a symmetric Toeplitz solver. Journal of Parallel and Distributed Computing 74(5):2392-2399. https://doi.org/10.1016/j.jpdc.2014.02.003
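    The half-size splitting the authors exploit can be seen directly: conjugating a symmetric Toeplitz matrix by the discrete sine transform (orthogonal and symmetric) yields a Cauchy-like matrix whose entries vanish whenever the row and column indices have different parity, so the transformed system decouples into two independent half-size systems. A minimal, dense, sequential NumPy sketch of this reduction (not the paper's blocked OpenMP implementation, and without its pivoting):

        # Symmetric Toeplitz system solved via the DST-I reduction to a
        # Cauchy-like matrix that splits into two half-size subsystems.
        import numpy as np
        from scipy.linalg import toeplitz

        n = 8
        rng = np.random.default_rng(1)
        c = rng.standard_normal(n)
        c[0] = n                              # keep T comfortably nonsingular
        T = toeplitz(c)                       # symmetric Toeplitz
        b = rng.standard_normal(n)

        # DST-I matrix: symmetric and orthogonal, so S^{-1} = S
        k = np.arange(1, n + 1)
        S = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k, k) / (n + 1))

        C = S @ T @ S                         # Cauchy-like
        odd = (k[:, None] + k[None, :]) % 2 == 1
        assert np.allclose(C[odd], 0.0)       # checkerboard zero pattern

        # T x = b  <=>  C y = S b with x = S y; solve the two parities apart
        y, Sb = np.zeros(n), S @ b
        for idx in (np.arange(0, n, 2), np.arange(1, n, 2)):
            y[idx] = np.linalg.solve(C[np.ix_(idx, idx)], Sb[idx])
        x = S @ y
        assert np.allclose(T @ x, b)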

    Fast and accurate con-eigenvalue algorithm for optimal rational approximations

    The need to compute small con-eigenvalues and the associated con-eigenvectors of positive-definite Cauchy matrices naturally arises when constructing rational approximations with a (near) optimally small $L^{\infty}$ error. Specifically, given a rational function with $n$ poles in the unit disk, a rational approximation with $m \ll n$ poles in the unit disk may be obtained from the $m$th con-eigenvector of an $n \times n$ Cauchy matrix, where the associated con-eigenvalue $\lambda_{m} > 0$ gives the approximation error in the $L^{\infty}$ norm. Unfortunately, standard algorithms do not accurately compute small con-eigenvalues (and the associated con-eigenvectors) and, in particular, yield few or no correct digits for con-eigenvalues smaller than the machine roundoff. We develop a fast and accurate algorithm for computing con-eigenvalues and con-eigenvectors of positive-definite Cauchy matrices, yielding even the tiniest con-eigenvalues with high relative accuracy. The algorithm computes the $m$th con-eigenvalue in $\mathcal{O}(m^{2}n)$ operations and, since the con-eigenvalues of positive-definite Cauchy matrices decay exponentially fast, we obtain (near) optimal rational approximations in $\mathcal{O}(n(\log\delta^{-1})^{2})$ operations, where $\delta$ is the approximation error in the $L^{\infty}$ norm. We derive error bounds demonstrating high relative accuracy of the computed con-eigenvalues and the high accuracy of the unit con-eigenvectors. We also provide examples of using the algorithm to compute (near) optimal rational approximations of functions with singularities and sharp transitions, where approximation errors close to machine precision are obtained. Finally, we present numerical tests on random (complex-valued) Cauchy matrices to show that the algorithm computes all the con-eigenvalues and con-eigenvectors with nearly full precision.
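    To see why high relative accuracy is the crux here, note that for a real symmetric positive-definite Cauchy matrix the con-eigenvalues coincide with the ordinary eigenvalues, and these decay exponentially fast. The sketch below uses a standard double-precision eigensolver (deliberately not the paper's algorithm, and with nodes of our choosing) to show the decay; eigenvalues below roughly machine epsilon times the matrix norm come back with no correct digits, which is exactly the failure mode the paper addresses:

        # Eigenvalues of the positive-definite Cauchy matrix C[i,j] = 1/(x_i + x_j)
        # decay exponentially; a conventional solver loses all relative accuracy
        # once lambda_m drops below eps * ||C||.
        import numpy as np

        n = 30
        x = 0.5 + np.arange(n)                    # nodes chosen for illustration
        C = 1.0 / (x[:, None] + x[None, :])       # symmetric positive definite
        lam = np.linalg.eigvalsh(C)[::-1]         # eigenvalues, descending
        for m in (0, 5, 10, 15, 20):
            # printed values below ~1e-17 are pure roundoff in double precision
            print(f"lambda_{m} = {lam[m]:.3e}")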

    Minimizing Communication in Linear Algebra

    In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices are too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as $\Omega$(#arithmetic operations / $\sqrt{M}$), where $M$ is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, $LDL^T$ factorization, QR factorization, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or whether we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
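    The matching upper bound for matrix multiplication comes from blocking: with block size b chosen so three b-by-b tiles fit in fast memory (b close to sqrt(M/3)), the tiled algorithm moves $O(n^3/\sqrt{M})$ words. A sketch with an explicit traffic counter (the counter and the parameters are ours, for illustration; assumes b divides n):

        # Tiled matrix multiplication, counting words moved between slow and
        # fast memory: 2*n^3/b + n^2 words, i.e. O(n^3 / sqrt(M)) for b ~ sqrt(M/3).
        import numpy as np

        def tiled_matmul(A, B, b):
            n = A.shape[0]
            C = np.zeros((n, n))
            words_moved = 0
            for i in range(0, n, b):
                for j in range(0, n, b):
                    Cij = np.zeros((b, b))            # accumulator stays in fast memory
                    for k in range(0, n, b):
                        words_moved += 2 * b * b      # load a tile of A and a tile of B
                        Cij += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
                    C[i:i+b, j:j+b] = Cij
                    words_moved += b * b              # write the C tile back
            return C, words_moved

        n, M = 256, 3 * 32 * 32                       # fast memory: three 32x32 tiles
        b = int(np.sqrt(M / 3))
        rng = np.random.default_rng(2)
        A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
        C, w = tiled_matmul(A, B, b)
        assert np.allclose(C, A @ B)
        print(f"words moved: {w} (= 2n^3/b + n^2 = {2*n**3//b + n**2})")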

    Fast linear algebra is stable

    In an earlier paper, we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of $n$-by-$n$ matrices can be done by any algorithm in $O(n^{\omega + \eta})$ operations for any $\eta > 0$, then it can be done stably in $O(n^{\omega + \eta})$ operations for any $\eta > 0$. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in $O(n^{\omega + \eta})$ operations.
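    The prototypical member of this class is Strassen's algorithm, which replaces the eight half-size products of the classical recursion with seven and so runs in $O(n^{\log_2 7}) \approx O(n^{2.81})$ operations while satisfying a normwise error bound. A minimal sketch for sizes that are a power of two (the cutoff and seeds are arbitrary choices of ours):

        # Strassen's recursive matrix multiplication: 7 half-size products
        # per level instead of 8, giving O(n^{log2 7}) arithmetic.
        import numpy as np

        def strassen(A, B, cutoff=64):
            n = A.shape[0]                     # assumes square, n a power of two
            if n <= cutoff:
                return A @ B                   # fall back to classical multiply
            h = n // 2
            A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
            B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
            M1 = strassen(A11 + A22, B11 + B22, cutoff)
            M2 = strassen(A21 + A22, B11, cutoff)
            M3 = strassen(A11, B12 - B22, cutoff)
            M4 = strassen(A22, B21 - B11, cutoff)
            M5 = strassen(A11 + A12, B22, cutoff)
            M6 = strassen(A21 - A11, B11 + B12, cutoff)
            M7 = strassen(A12 - A22, B21 + B22, cutoff)
            C = np.empty((n, n))
            C[:h, :h] = M1 + M4 - M5 + M7
            C[:h, h:] = M3 + M5
            C[h:, :h] = M2 + M4
            C[h:, h:] = M1 - M2 + M3 + M6
            return C

        rng = np.random.default_rng(3)
        A, B = rng.standard_normal((128, 128)), rng.standard_normal((128, 128))
        print("normwise error:", np.linalg.norm(strassen(A, B) - A @ B))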

    Row Compression and Nested Product Decomposition of a Hierarchical Representation of a Quasiseparable Matrix

    This research introduces a row compression and nested product decomposition of an $n \times n$ hierarchical representation of a rank-structured matrix A, which extends the compression and nested product decomposition of a quasiseparable matrix. The hierarchical parameter extraction algorithm of a quasiseparable matrix is efficient, requiring only $O(n\log n)$ operations, and is proven backward stable. The row compression comprises a sequence of small Householder transformations that are formed from the low-rank, lower triangular, off-diagonal blocks of the hierarchical representation. The row compression forms a factorization of matrix A, where A = QC, Q is the product of the Householder transformations, and C preserves the low-rank structure in both the lower and upper triangular parts of matrix A. The nested product decomposition is accomplished by applying a sequence of orthogonal transformations to the low-rank, upper triangular, off-diagonal blocks of the compressed matrix C. Both the compression and decomposition algorithms are stable and require $O(n\log n)$ operations. At this point, the matrix-vector product and solver algorithms are the only ones fully proven to be backward stable for quasiseparable matrices. By combining the fast matrix-vector product and system solver, linear systems involving the hierarchical representation, via its nested product decomposition, are solved directly with linear complexity and unconditional stability. Applications in image deblurring and compression, which capitalize on the concepts from the row compression and nested product decomposition algorithms, will be shown.
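    The flavor of the linear-complexity kernels involved can be conveyed on the simplest case: an order-one quasiseparable matrix, given by its generators, admits an $O(n)$ matrix-vector product via one forward and one backward recurrence. A sketch under these assumptions (scalar generators, with names of our choosing; this is the classical fast matvec, not the row-compression algorithm itself):

        # O(n) matrix-vector product for an order-1 quasiseparable matrix with
        # generators d, p, q, a (lower part) and g, h, w (upper part):
        #   A[i, i] = d[i]
        #   A[i, j] = p[i] * a[i-1] * ... * a[j+1] * q[j]   for i > j
        #   A[i, j] = g[i] * w[i+1] * ... * w[j-1] * h[j]   for i < j
        import numpy as np

        def qs_matvec(d, p, q, a, g, h, w, x):
            n = len(x)
            y = d * x                      # diagonal part
            f = 0.0                        # forward sweep accumulates the lower part
            for i in range(1, n):
                f = a[i - 1] * f + q[i - 1] * x[i - 1]
                y[i] += p[i] * f
            r = 0.0                        # backward sweep accumulates the upper part
            for i in range(n - 2, -1, -1):
                r = w[i + 1] * r + h[i + 1] * x[i + 1]
                y[i] += g[i] * r
            return y

        # Check against the dense matrix assembled from the same generators
        n = 6
        rng = np.random.default_rng(4)
        d, p, q, a, g, h, w = (rng.standard_normal(n) for _ in range(7))
        A = np.diag(d)
        for i in range(n):
            for j in range(n):
                if i > j:
                    A[i, j] = p[i] * np.prod(a[j + 1:i]) * q[j]
                elif i < j:
                    A[i, j] = g[i] * np.prod(w[i + 1:j]) * h[j]
        x = rng.standard_normal(n)
        assert np.allclose(qs_matvec(d, p, q, a, g, h, w, x), A @ x)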