
    Fast Algorithms for Displacement and Low-Rank Structured Matrices

    This tutorial provides an introduction to the development of fast matrix algorithms based on the notions of displacement and various low-rank structures.
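    A minimal sketch of the displacement idea the tutorial builds on, assuming the classical Stein-type displacement with the lower shift matrix Z: for a Toeplitz matrix A, the displacement A - Z A Z^T is nonzero only in its first row and column, so its rank is at most 2.

```python
import numpy as np
from scipy.linalg import toeplitz

# Illustrative check (not code from the tutorial): the Stein-type displacement
# A - Z A Z^T of a Toeplitz matrix has rank <= 2, where Z is the lower shift.
n = 8
rng = np.random.default_rng(0)
A = toeplitz(rng.standard_normal(n), rng.standard_normal(n))  # random Toeplitz matrix
Z = np.diag(np.ones(n - 1), -1)                               # lower shift matrix

D = A - Z @ A @ Z.T                                           # displacement of A
print(np.linalg.matrix_rank(D))                               # prints 2 (generically)
```

    Fast algorithms exploit this: a structured n × n matrix is represented by its low-rank displacement rather than by its n^2 individual entries.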

    A fast semi-direct least squares algorithm for hierarchically block separable matrices

    We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators, including those derived from asymptotically smooth radial kernels that are not too oscillatory. The algorithm is based on a recursive skeletonization procedure that exposes this sparsity and solves the dense least squares problem as a larger, equality-constrained, sparse one. It relies on a sparse QR factorization coupled with iterative weighted least squares methods. In essence, our scheme consists of a direct component, comprised of matrix compression and factorization, followed by an iterative component to enforce certain equality constraints. At most two iterations are typically required for problems that are not too ill-conditioned. For an M × N HBS matrix with M ≥ N having bounded off-diagonal block rank, the algorithm has optimal O(M + N) complexity. If the rank increases with the spatial dimension, as is common for operators that are singular at the origin, then this becomes O(M + N) in 1D, O(M + N^{3/2}) in 2D, and O(M + N^2) in 3D. We illustrate the performance of the method on both over- and underdetermined systems in a variety of settings, with an emphasis on radial basis function approximation and efficient updating and downdating.
    Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. Appl.
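    As a numerical illustration of the data-sparsity the skeletonization exploits (an assumed toy setup, not the authors' code), the block of a log-kernel matrix coupling two well-separated 1D point clusters has small numerical rank even when the block itself is large:

```python
import numpy as np

# Toy check of off-diagonal low-rank structure for an asymptotically smooth
# radial kernel: K(x, y) = log|x - y| between two well-separated clusters.
src = np.linspace(0.0, 1.0, 500)                    # source points in [0, 1]
trg = np.linspace(2.0, 3.0, 500)                    # well-separated targets in [2, 3]
K = np.log(np.abs(trg[:, None] - src[None, :]))     # 500 x 500 off-diagonal block

s = np.linalg.svd(K, compute_uv=False)
print(np.sum(s > 1e-10 * s[0]))                     # numerical rank: small despite size 500
```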

    Computing the Nearest Doubly Stochastic Matrix with A Prescribed Entry

    In this paper, a nearest doubly stochastic matrix problem is studied: find the doubly stochastic matrix closest to a given matrix subject to a prescribed (1,1) entry. According to well-established duality theory in optimization, the dual of the underlying problem is an unconstrained convex optimization problem that is differentiable but not twice differentiable. A Newton-type method is used to solve the associated dual problem, from which the desired nearest doubly stochastic matrix is then obtained. Under some mild assumptions, quadratic convergence of the proposed Newton method is proved. The numerical performance of the method is demonstrated by numerical examples.
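    To make the problem statement concrete, the sketch below poses the primal directly to a generic constrained solver (SciPy's SLSQP); this is only an illustration of the feasible set and objective, not the dual Newton-type method of the paper, and the helper name nearest_ds is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def nearest_ds(G, a11):
    """Hypothetical illustration: nearest doubly stochastic matrix to G with
    prescribed (1,1) entry a11, solved in the primal by generic SLSQP
    (the paper instead applies a Newton-type method to the dual problem)."""
    n = G.shape[0]
    cons = [
        {"type": "eq", "fun": lambda x: x.reshape(n, n).sum(axis=1) - 1.0},  # row sums = 1
        {"type": "eq", "fun": lambda x: x.reshape(n, n).sum(axis=0) - 1.0},  # column sums = 1
        {"type": "eq", "fun": lambda x: x[0] - a11},                         # prescribed (1,1) entry
    ]
    x0 = np.full(n * n, 1.0 / n)                                             # uniform starting point
    res = minimize(lambda x: 0.5 * np.sum((x - G.ravel()) ** 2), x0,
                   method="SLSQP", bounds=[(0.0, 1.0)] * (n * n), constraints=cons)
    return res.x.reshape(n, n)

X = nearest_ds(np.random.default_rng(1).standard_normal((4, 4)), 0.3)
print(X.sum(axis=0), X.sum(axis=1), X[0, 0])   # row/column sums near 1, X[0,0] near 0.3
```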

    Row Compression and Nested Product Decomposition of a Hierarchical Representation of a Quasiseparable Matrix

    This research introduces a row compression and nested product decomposition of an n × n hierarchical representation of a rank-structured matrix A, which extends the compression and nested product decomposition of a quasiseparable matrix. The hierarchical parameter extraction algorithm of a quasiseparable matrix is efficient, requiring only O(n log n) operations, and is proven backward stable. The row compression consists of a sequence of small Householder transformations that are formed from the low-rank, lower triangular, off-diagonal blocks of the hierarchical representation. The row compression forms a factorization A = QC, where Q is the product of the Householder transformations and C preserves the low-rank structure in both the lower and upper triangular parts of matrix A. The nested product decomposition is accomplished by applying a sequence of orthogonal transformations to the low-rank, upper triangular, off-diagonal blocks of the compressed matrix C. Both the compression and decomposition algorithms are stable and require O(n log n) operations. At present, the matrix-vector product and solver algorithms are the only ones fully proven to be backward stable for quasiseparable matrices. By combining the fast matrix-vector product and system solver, linear systems involving the hierarchical representation and its nested product decomposition are solved directly with linear complexity and unconditional stability. Applications in image deblurring and compression that capitalize on the concepts from the row compression and nested product decomposition algorithms are also shown.
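    The local building block of the row compression can be sketched as follows (assumed shapes and names, not the dissertation's full algorithm): a Householder QR factorization of the tall factor of a low-rank off-diagonal block yields an orthogonal Q that annihilates all but the leading rows of that block, which is the kind of step applied block by block to form A = QC.

```python
import numpy as np
from scipy.linalg import qr

# Sketch of one row-compression step: B = U @ V.T is a rank-r off-diagonal
# block; the Householder QR of U gives Q with Q.T @ B zero below row r.
rng = np.random.default_rng(2)
m, k, r = 12, 10, 3
U, V = rng.standard_normal((m, r)), rng.standard_normal((k, r))
B = U @ V.T                                  # rank-r off-diagonal block

Q, R = qr(U)                                 # full QR: Q (m x m) is a product of Householders
C = Q.T @ B                                  # compressed block
print(np.abs(C[r:]).max())                   # rows r..m-1 vanish up to roundoff
```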

    Error Control and Efficient Memory Management for Sparse Integral Equation Solvers Based on Local-Global Solution Modes

    This dissertation presents and analyzes two new algorithms for sparse direct solution methods based on the use of local-global solution (LOGOS) modes. The first is a rigorous error control strategy for LOGOS-based matrix factorizations that use overlapped, localizing modes (OL-LOGOS) on a shifted grid. The use of OL-LOGOS modes is critical to obtaining asymptotically efficient factorizations from LOGOS-based methods. Unfortunately, the approach also introduces a non-orthogonal basis function structure. This can cause errors to accumulate across the levels of a multilevel implementation, which has previously posed a barrier to rigorous error control for the OL-LOGOS factorization method. This limitation is overcome here, and it is shown that the fundamentally non-orthogonal factorization subspaces can be efficiently decoupled in a manner that prevents multilevel error propagation. This renders the OL-LOGOS factorization error controllable in a relative RMS sense. The impact of the new, error-controlled OL-LOGOS factorization algorithm on computational resource utilization is discussed, and several numerical examples illustrate the performance of the improved algorithm relative to previously reported results. The second algorithmic development is the construction of efficient out-of-core (OOC) versions of the OL-LOGOS factorization algorithm that allow the associated software tools to take advantage of additional resources for memory management. The proposed OOC algorithm incorporates a memory page definition tailored to the flow of the OL-LOGOS factorization procedure. The efficiency of this component is evaluated empirically, because the measured performance of the mass storage devices tested does not follow analytical models. The latency and memory usage of the resulting OOC tools are compared with in-core performance results. Both the new error control algorithm and the OOC method have been incorporated into previously existing software tools, and the dissertation presents results for real-world simulation problems.
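    The generic out-of-core pattern underlying such tools can be illustrated with a disk-backed array (purely illustrative; the dissertation's page definition is tailored to the OL-LOGOS factorization flow, and the file name below is hypothetical): pages are written to disk as the factorization produces them and read back one at a time, so in-core memory is bounded by a single page.

```python
import numpy as np
import os, tempfile

# Illustrative out-of-core paging with numpy.memmap: factor blocks are written
# to a disk-backed array as they are produced and read back on demand.
page_rows, n_pages, n_cols = 256, 8, 256
path = os.path.join(tempfile.mkdtemp(), "logos_pages.dat")   # hypothetical file name

store = np.memmap(path, dtype=np.float64, mode="w+",
                  shape=(n_pages * page_rows, n_cols))
rng = np.random.default_rng(3)
for p in range(n_pages):                                     # produce one page at a time
    block = rng.standard_normal((page_rows, n_cols))
    store[p * page_rows:(p + 1) * page_rows] = block
store.flush()                                                # push pages out of core

page = np.array(store[2 * page_rows:3 * page_rows])          # read one page back in-core
print(page.shape)
```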