22 research outputs found
A weakly stable algorithm for general Toeplitz systems
We show that a fast algorithm for the QR factorization of a Toeplitz or
Hankel matrix A is weakly stable in the sense that R^T.R is close to A^T.A.
Thus, when the algorithm is used to solve the semi-normal equations R^T.Rx =
A^Tb, we obtain a weakly stable method for the solution of a nonsingular
Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the
solution of the full-rank Toeplitz or Hankel least squares problem.Comment: 17 pages. An old Technical Report with postscript added. For further
details, see http://wwwmaths.anu.edu.au/~brent/pub/pub143.htm
On recursive least-squares filtering algorithms and implementations
In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends crucially on specific application
On the stability of the Bareiss and related Toeplitz factorization algorithms
This paper contains a numerical stability analysis of factorization algorithms for computing the Cholesky decomposition of symmetric positive definite matrices of displacement rank 2. The algorithms in the class can be expressed as sequences of elementary downdating steps. The stability of the factorization algorithms follows directly from the numerical properties of algorithms for realizing elementary downdating operations. It is shown that the Bareiss algorithm for factorizing a symmetric positive definite Toeplitz matrix is in the class and hence the Bareiss algorithm is stable. Some numerical experiments that compare behavior of the Bareiss algorithm and the Levinson algorithm are presented. These experiments indicate that in general (when the reection coefficients are not all positive) the Levinson algorithm can give much larger residuals than the Bareiss algorithm
A recursive three-stage least squares method for large-scale systems of simultaneous equations
A new numerical method is proposed that uses the QR decomposition (and its variants) to derive recursively the three-stage least squares (3SLS) estimator of large-scale simultaneous equations models (SEM). The 3SLS estimator is obtained sequentially, once the underlying model is modified, by adding or deleting rows of data. A new theoretical pseudo SEM is developed which has a non positive definite dispersion matrix and is proved to yield the 3SLS estimator that would be derived if the modified SEM was estimated afresh. In addition, the computation of the iterative 3SLS estimator of the updated observations SEM is considered. The new recursive method utilizes efficiently previous computations, exploits sparsity in the pseudo SEM and uses as main computational tool orthogonal and hyperbolic matrix factorizations. This allows the estimation of large-scale SEMs which previously could have been considered computationally infeasible to tackle. Numerical trials have confirmed the effectiveness of the new estimation procedures. The new method is illustrated through a macroeconomic application
Randomized methods for matrix computations
The purpose of this text is to provide an accessible introduction to a set of
recently developed algorithms for factorizing matrices. These new algorithms
attain high practical speed by reducing the dimensionality of intermediate
computations using randomized projections. The algorithms are particularly
powerful for computing low-rank approximations to very large matrices, but they
can also be used to accelerate algorithms for computing full factorizations of
matrices. A key competitive advantage of the algorithms described is that they
require less communication than traditional deterministic methods
Recommended from our members
Building Rank-Revealing Factorizations with Randomization
This thesis describes a set of randomized algorithms for computing rank revealing factorizations of matrices. These algorithms are designed specifically to minimize the amount of data movement required, which is essential to high practical performance on modern computing hardware. The work presented builds on existing randomized algorithms for computing low-rank approximations to matrices, but essentially ex- tends the range of applicability of these methods by allowing for the efficient decomposition of matrices of any numerical rank, including full rank matrices. In contrast, existing methods worked well only when the numerical rank was substantially smaller than the dimensions of the matrix.The thesis describes algorithms for computing two of the most popular rank-revealing matrix decom- positions: the column pivoted QR (CPQR) decomposition, and the so called UTV decomposition that factors a given matrix A as A = UTVâ, where U and V have orthonormal columns and T is triangular. For each algorithm, the thesis presents algorithms that are tailored for different computing environments, including multicore shared memory processors, GPUs, distributed memory machines, and matrices that are stored on hard drives (âout of coreâ).The first chapter of the thesis consists of an introduction that provides context, reviews previous work in the field, and summarizes the key contributions. Beside the introduction, the thesis contains six additional chapters:Chapter 2 introduces a fully blocked algorithm HQRRP for computing a QR factorization with col- umn pivoting. The key to the full blocking of the algorithm lies in using randomized projections to create a low dimensional sketch of the data, where multiple good pivot columns may be cheaply computed. Nu- merical experiments show that HQRRP is several times faster than the classical algorithm for computing a column pivoted QR on a multicore machine, and the acceleration factor increases with the number of cores.Chapter 3 introduces randUTV, a randomized algorithm for computing a rank-revealing factorizationof the form A = UTVâ, where U and V are orthogonal and T is upper triangular. RandUTV uses random- ized methods to efficiently build U and V as approximations of the column and row spaces of A. The result is an algorithm that reveals rank nearly as well as the SVD and costs at most as much as a column pivoted QR.Chapter 4 provides optimized implementations for shared and distributed memory architectures. For shared memory, we show that formulating randUTV as an algorithm-by-blocks increases its efficiency in parallel. The fifth chapter implements randUTV on the GPU and augments the algorithm with an over- sampling technique to further increase the low rank approximation properties of the resulting factorization. Chapter 6 implements both randUTV and HQRRP for use with matrices stored out of core. It is shown that reorganizing HQRRP as a left-looking algorithm to reduce the number of writes to the drive is in the tested cases necessary for the scalability of the algorithm when using spinning disk storage. Finally, chapter 7 discusses an alternative use for randUTV as a nuclear norm estimator and measures the acceleration gained from trimming down the algorithm when only singular value estimates are required
Efficient implementation of a structured total least squares based speech compression method
AbstractWe present a fast implementation of a recently proposed speech compression scheme, based on an all-pole model of the vocal tract. Each frame of the speech signal is analyzed by storing the parameters of the complex damped exponentials deduced from the all-pole model and its initial conditions. In mathematical terms, the analysis stage corresponds to solving a structured total least squares (STLS) problem. It is shown that by exploiting the displacement rank structure of the involved matrices the STLS problem can be solved in a very fast way. Synthesis is computationally very cheap since it consists of adding the complex damped exponentials based on the transmitted parameters.The compression scheme is applied on a speech signal. The speed improvement of the fast vocoder analysis scheme is demonstrated. Furthermore, the quality of the compression scheme is compared with that of a standard coding algorithm, by using the segmental signal-to-noise ratio