Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
A fundamental problem when adding column pivoting to the Householder QR factorization is that only
about half of the computation can be cast in terms of high performing matrix-matrix multiplications,
which greatly limits the benefits that can be derived from so-called blocking of algorithms. This
paper describes a technique for selecting groups of pivot vectors by means of randomized projections.
It is demonstrated that the asymptotic flop count for the proposed method is 2mn^2 - (2/3)n^3 for an
m × n matrix, identical to that of the best classical unblocked Householder QR factorization
algorithm (with or without pivoting). Experiments demonstrate acceleration in speed of close to an
order of magnitude relative to the geqp3 function in LAPACK, when executed on a modern CPU with
multiple cores. Further, experiments demonstrate that the quality of the randomized pivot selection
strategy is roughly the same as that of classical column pivoting. The described algorithm is made
available under an open source license and can be used with LAPACK or libflame.
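To make the idea of choosing a group of pivots from a randomized projection concrete, here is a minimal NumPy sketch. It is not the HQRRP implementation: the function name `select_pivots`, the oversampling parameter `p`, the Gaussian sketching matrix, and the greedy deflation loop are all assumptions made for illustration. It compresses the m × n matrix to a (b + p) × n sketch and picks b pivot columns by working on the small sketch only.

```python
import numpy as np

def select_pivots(A, b, p=5, seed=0):
    """Pick b pivot columns of A using only a small random sketch Y = G @ A.

    Hypothetical helper for illustration; p is an assumed oversampling amount.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((b + p, m))      # Gaussian sketching matrix
    Y = G @ A                                # (b + p) x n sketch of A
    chosen = np.zeros(n, dtype=bool)
    pivots = []
    for _ in range(b):
        norms = np.linalg.norm(Y, axis=0)
        norms[chosen] = -1.0                 # never pick the same column twice
        j = int(np.argmax(norms))
        if norms[j] <= 0.0:                  # sketch is exhausted (rank deficient)
            break
        pivots.append(j)
        chosen[j] = True
        qvec = Y[:, j] / norms[j]
        Y = Y - np.outer(qvec, qvec @ Y)     # remove the chosen direction from all columns
    return pivots

# Usage: three columns dominate; the sketch should single them out.
A = 1e-3 * np.random.default_rng(1).standard_normal((50, 20))
for col, (row, val) in {4: (0, 10.0), 11: (1, 8.0), 17: (2, 6.0)}.items():
    A[:, col] = 0.0
    A[row, col] = val
print(sorted(select_pivots(A, b=3)))
```

The selection loop touches only the small sketch, so its cost does not grow with m; the large matrix is involved only in the single product `G @ A`, which is a matrix-matrix multiplication.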
On recursive least-squares filtering algorithms and implementations
In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and make appropriate adjustments to achieve optimum performance. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve a high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in detail, including the Householder reflector, the Gram-Schmidt procedure, and the Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems.
In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new methods depends crucially on the specific application.
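As an illustration of the Givens-rotation updating mentioned in the abstract above, the following NumPy sketch folds one new data row into an existing triangular factor. It is a textbook-style QRD-RLS update, not code from the thesis: the function name `qrd_rls_update` and the forgetting factor `lam` are assumptions introduced here.

```python
import numpy as np

def qrd_rls_update(R, z, x, d, lam=1.0):
    """Fold one new observation (x, d) into the triangular system R w = z.

    R: n x n upper triangular factor, z: rotated right-hand side (length n).
    lam: assumed exponential forgetting factor (1.0 means no forgetting).
    Returns the updated (R, z); the inputs are left unmodified.
    """
    R = np.sqrt(lam) * np.asarray(R, dtype=float)
    z = np.sqrt(lam) * np.asarray(z, dtype=float)
    x = np.array(x, dtype=float)
    d = float(d)
    n = R.shape[0]
    for k in range(n):
        r = np.hypot(R[k, k], x[k])
        if r == 0.0:
            continue                          # nothing to eliminate in this column
        c, s = R[k, k] / r, x[k] / r          # Givens rotation that zeroes x[k]
        Rk, xk = R[k, k:].copy(), x[k:].copy()
        R[k, k:] = c * Rk + s * xk
        x[k:] = -s * Rk + c * xk
        z[k], d = c * z[k] + s * d, -s * z[k] + c * d
    return R, z

# Usage: stream rows one at a time, then solve the small triangular system.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((12, 3)), rng.standard_normal(12)
R, z = np.zeros((3, 3)), np.zeros(3)
for xi, di in zip(X, y):
    R, z = qrd_rls_update(R, z, xi, di)
w = np.linalg.solve(R, z)                     # least-squares weights after 12 rows
```

Each update costs O(n^2) operations and never forms the normal equations, which is one reason rotation-based schemes of this kind are commonly favored for numerical stability.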
A fast semi-direct least squares algorithm for hierarchically block separable matrices
We present a fast algorithm for linear least squares problems governed by
hierarchically block separable (HBS) matrices. Such matrices are generally
dense but data-sparse and can describe many important operators including those
derived from asymptotically smooth radial kernels that are not too oscillatory.
The algorithm is based on a recursive skeletonization procedure that exposes
this sparsity and solves the dense least squares problem as a larger,
equality-constrained, sparse one. It relies on a sparse QR factorization
coupled with iterative weighted least squares methods. In essence, our scheme
consists of a direct component, comprised of matrix compression and
factorization, followed by an iterative component to enforce certain equality
constraints. At most two iterations are typically required for problems that
are not too ill-conditioned. For an HBS matrix having bounded off-diagonal block rank, the
algorithm has optimal complexity. If the rank increases with the spatial dimension, as is
common for operators that are singular at the origin, then the complexity depends on
whether the problem is posed in 1D, 2D, or 3D. We illustrate the performance of the method on
both over- and underdetermined systems in a variety of settings, with an
emphasis on radial basis function approximation and efficient updating and
downdating. Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. Appl.
UTV Tools: Matlab Templates for Rank-Revealing UTV Decompositions
published in Numerical Algorithms and the paper's text is reprinted here by kind permission
randUTV: A blocked randomized algorithm for computing a rank-revealing UTV factorization
This manuscript describes the randomized algorithm randUTV for computing a
so-called UTV factorization efficiently. Given a matrix A, the algorithm
computes a factorization A = UTV*, where U and V have orthonormal
columns, and T is triangular (either upper or lower, whichever is preferred).
The algorithm randUTV is developed primarily to be a fast and easily
parallelized alternative to algorithms for computing the Singular Value
Decomposition (SVD). randUTV provides accuracy very close to that of the SVD
for problems such as low-rank approximation, solving ill-conditioned linear
systems, determining bases for various subspaces associated with the matrix,
etc. Moreover, randUTV produces highly accurate approximations to the singular
values of A. Unlike the SVD, the randomized algorithm proposed builds a UTV
factorization in an incremental, single-stage, and non-iterative way, making it
possible to halt the factorization process once a specified tolerance has been
met. Numerical experiments comparing the accuracy and speed of randUTV to the
SVD are presented. These experiments demonstrate that in comparison to column
pivoted QR, which is another factorization that is often used as a relatively
economic alternative to the SVD, randUTV compares favorably in terms of speed
while providing far higher accuracy
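The incremental, blocked construction described in the abstract above can be sketched in NumPy as follows. This is an illustrative reading of the abstract, not the authors' code: the function name `rand_utv`, the block size `b`, the number of power iterations `q`, the Gaussian test matrix, and the restriction to real matrices (so V* is `V.T`) are all assumptions.

```python
import numpy as np

def rand_utv(A, b=4, q=1, seed=0):
    """Blocked randomized UTV sketch: returns U, T, V with A = U @ T @ V.T.

    U and V are orthogonal and T is upper triangular. b (block size) and
    q (power iterations) are assumed tuning parameters.
    """
    rng = np.random.default_rng(seed)
    T = np.array(A, dtype=float)
    m, n = T.shape
    U, V = np.eye(m), np.eye(n)
    for s in range(0, min(m, n), b):
        e = min(s + b, m, n)                       # current block is s:e
        if e < m and e < n:
            B = T[s:, s:]                          # trailing block (a view of T)
            Y = B.T @ rng.standard_normal((m - s, e - s))
            for _ in range(q):                     # power iteration sharpens the basis
                Y = B.T @ (B @ Y)
            Vh, _ = np.linalg.qr(Y, mode="complete")
            T[:, s:] = T[:, s:] @ Vh               # rotate columns toward dominant row space
            V[:, s:] = V[:, s:] @ Vh
            Uh, _ = np.linalg.qr(T[s:, s:e], mode="complete")
            T[s:, s:] = Uh.T @ T[s:, s:]           # zero out below the diagonal block
            T[e:, s:e] = 0.0
            U[:, s:] = U[:, s:] @ Uh
            Us, d, Vst = np.linalg.svd(T[s:e, s:e])
            T[s:e, s:e] = np.diag(d)               # diagonalize the small block
            T[s:e, e:] = Us.T @ T[s:e, e:]
            T[:s, s:e] = T[:s, s:e] @ Vst.T
            U[:, s:e] = U[:, s:e] @ Us
            V[:, s:e] = V[:, s:e] @ Vst.T
        else:                                      # last block: finish with a plain SVD
            Us, d, Vst = np.linalg.svd(T[s:, s:], full_matrices=True)
            D = np.zeros((m - s, n - s))
            D[np.arange(d.size), np.arange(d.size)] = d
            T[:s, s:] = T[:s, s:] @ Vst.T
            T[s:, s:] = D
            U[:, s:] = U[:, s:] @ Us
            V[:, s:] = V[:, s:] @ Vst.T
    return U, T, V

# Usage on a small random matrix.
A = np.random.default_rng(1).standard_normal((30, 20))
U, T, V = rand_utv(A, b=4, q=2)
print(np.linalg.norm(U @ T @ V.T - A))
```

Because each pass through the loop finalizes one block of columns of T, a variant of this sketch could stop early once the trailing block `T[e:, e:]` is small enough, which is the halting property the abstract refers to.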
UTV Tools: Matlab Templates for Rank-Revealing UTV Decompositions
We describe a Matlab 5.2 package for computing and modifying certain rank-revealing decompositions that have found widespread use in signal processing and other applications. The package focuses on algorithms for URV and ULV decompositions, collectively known as UTV decompositions. We include algorithms for the ULLV decomposition, which generalizes the ULV decomposition to a pair of matrices. For completeness a few algorithms for computation of the RRQR decomposition are also included. The software in this package can be used as is, or can be considered as templates for specialized implementations on signal processors and similar dedicated hardware platforms
Building Rank-Revealing Factorizations with Randomization
This thesis describes a set of randomized algorithms for computing rank-revealing factorizations of matrices. These algorithms are designed specifically to minimize the amount of data movement required, which is essential to high practical performance on modern computing hardware. The work presented builds on existing randomized algorithms for computing low-rank approximations to matrices, but essentially extends the range of applicability of these methods by allowing for the efficient decomposition of matrices of any numerical rank, including full rank matrices. In contrast, existing methods worked well only when the numerical rank was substantially smaller than the dimensions of the matrix. The thesis describes algorithms for computing two of the most popular rank-revealing matrix decompositions: the column pivoted QR (CPQR) decomposition, and the so-called UTV decomposition that factors a given matrix A as A = UTV*, where U and V have orthonormal columns and T is triangular. For each decomposition, the thesis presents algorithms that are tailored for different computing environments, including multicore shared memory processors, GPUs, distributed memory machines, and matrices that are stored on hard drives ("out of core"). The first chapter of the thesis consists of an introduction that provides context, reviews previous work in the field, and summarizes the key contributions. Besides the introduction, the thesis contains six additional chapters: Chapter 2 introduces a fully blocked algorithm HQRRP for computing a QR factorization with column pivoting. The key to the full blocking of the algorithm lies in using randomized projections to create a low dimensional sketch of the data, where multiple good pivot columns may be cheaply computed.
Numerical experiments show that HQRRP is several times faster than the classical algorithm for computing a column pivoted QR on a multicore machine, and the acceleration factor increases with the number of cores. Chapter 3 introduces randUTV, a randomized algorithm for computing a rank-revealing factorization of the form A = UTV*, where U and V are orthogonal and T is upper triangular. RandUTV uses randomized methods to efficiently build U and V as approximations of the column and row spaces of A. The result is an algorithm that reveals rank nearly as well as the SVD and costs at most as much as a column pivoted QR. Chapter 4 provides optimized implementations for shared and distributed memory architectures. For shared memory, we show that formulating randUTV as an algorithm-by-blocks increases its efficiency in parallel. The fifth chapter implements randUTV on the GPU and augments the algorithm with an oversampling technique to further increase the low rank approximation properties of the resulting factorization. Chapter 6 implements both randUTV and HQRRP for use with matrices stored out of core. It is shown that reorganizing HQRRP as a left-looking algorithm to reduce the number of writes to the drive is in the tested cases necessary for the scalability of the algorithm when using spinning disk storage. Finally, chapter 7 discusses an alternative use for randUTV as a nuclear norm estimator and measures the acceleration gained from trimming down the algorithm when only singular value estimates are required.
randUTV: A Blocked Randomized Algorithm for Computing a Rank-Revealing UTV Factorization
A randomized algorithm for computing a so-called UTV factorization efficiently is presented. Given a matrix A, the algorithm "randUTV" computes a factorization A = UTV*, where U and V have orthonormal columns, and T is triangular (either upper or lower, whichever is preferred). The algorithm randUTV is developed primarily to be a fast and easily parallelized alternative to algorithms for computing the Singular Value Decomposition (SVD). randUTV provides accuracy very close to that of the SVD for problems such as low-rank approximation, solving ill-conditioned linear systems, and determining bases for various subspaces associated with the matrix. Moreover, randUTV produces highly accurate approximations to the singular values of A. Unlike the SVD, the randomized algorithm proposed builds a UTV factorization in an incremental, single-stage, and noniterative way, making it possible to halt the factorization process once a specified tolerance has been met. Numerical experiments comparing the accuracy and speed of randUTV to the SVD are presented. Other experiments also demonstrate that in comparison to column-pivoted QR, which is another factorization that is often used as a relatively economic alternative to the SVD, randUTV compares favorably in terms of speed while providing far higher accuracy.