1,646 research outputs found

    A weakly stable algorithm for general Toeplitz systems

    We show that a fast algorithm for the QR factorization of a Toeplitz or Hankel matrix A is weakly stable in the sense that R^T.R is close to A^T.A. Thus, when the algorithm is used to solve the semi-normal equations R^T.Rx = A^Tb, we obtain a weakly stable method for the solution of a nonsingular Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the solution of the full-rank Toeplitz or Hankel least squares problem. Comment: 17 pages. An old Technical Report with postscript added. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub143.htm
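
The semi-normal equations step in the abstract above can be illustrated with a minimal sketch. It assumes a small dense Toeplitz matrix and obtains R from a plain Cholesky factorization of A^T.A as a stand-in; the fast Toeplitz QR algorithm of the paper is not reproduced here. The helper names (`cholesky_upper`, `solve_seminormal`) are hypothetical and chosen only for illustration.

```python
import math


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G (G symmetric positive definite).

    Stand-in for the fast Toeplitz QR factorization, which is NOT implemented here.
    """
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R


def solve_seminormal(R, A, b):
    """Solve R^T R x = A^T b with one forward and one back substitution."""
    n = len(R)
    rhs = matvec(transpose(A), b)
    # Forward substitution: R^T y = A^T b (R^T is lower triangular).
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    # Back substitution: R x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


if __name__ == "__main__":
    # Small nonsingular Toeplitz system (illustrative data only).
    A = [[4.0, 1.0, 0.0], [2.0, 4.0, 1.0], [0.0, 2.0, 4.0]]
    b = matvec(A, [1.0, 2.0, 3.0])
    R = cholesky_upper(matmul(transpose(A), A))
    print(solve_seminormal(R, A, b))
```

Only R and A are needed to solve the system; the orthogonal factor Q never appears, which is the point of the semi-normal formulation.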

    On recursive least-squares filtering algorithms and implementations

    In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and make appropriate adjustments to achieve optimum performance. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve a high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in detail, including those based on the Householder reflector, the Gram-Schmidt procedure, and the Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new methods depends crucially on the specific application
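
As a rough illustration of the Givens-rotation updating mentioned in the abstract above, the sketch below folds one new data row into an existing upper-triangular factor R. It is a serial toy version under stated assumptions: no forgetting factor, no systolic array, no downdating, and the function name `givens_update` is hypothetical rather than taken from the thesis.

```python
import math


def givens_update(R, row):
    """Return the updated upper-triangular factor after appending one data row.

    The result R_new satisfies R_new^T R_new = R^T R + row^T row.
    """
    n = len(R)
    R = [list(r) for r in R]  # work on a copy
    w = list(row)
    for i in range(n):
        r = math.hypot(R[i][i], w[i])
        if r == 0.0:
            continue  # nothing to rotate in this column
        c, s = R[i][i] / r, w[i] / r
        # Rotate row i of R against the incoming row so that w[i] becomes zero.
        for j in range(i, n):
            rij, wj = R[i][j], w[j]
            R[i][j] = c * rij + s * wj
            w[j] = -s * rij + c * wj
    return R


if __name__ == "__main__":
    # Illustrative data only.
    print(givens_update([[2.0, 1.0], [0.0, 3.0]], [1.0, 2.0]))
```

Each rotation touches only one row of R and the incoming row, which is the locality that makes this kind of update a natural fit for pipelined hardware.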

    Inverse modelling of image-based patient-specific blood vessels : zero-pressure geometry and in vivo stress incorporation

    In vivo visualization of cardiovascular structures is possible using medical images. However, one has to realize that the resulting 3D geometries correspond to in vivo conditions. This entails an internal stress state being present in the in vivo measured geometry of, e.g., a blood vessel due to the blood pressure. In order to correct for this in vivo stress, this paper presents an inverse method to restore the original zero-pressure geometry of a structure, and to recover the in vivo stress field of the final, loaded structure. The proposed backward displacement method is able to solve the inverse problem iteratively using fixed point iterations, but can be significantly accelerated by a quasi-Newton technique in which a least-squares model is used to approximate the inverse of the Jacobian. The proposed backward displacement method allows for a straightforward implementation of the algorithm in combination with existing structural solvers, even if the structural solver is a black box, as only an update of the coordinates of the mesh needs to be performed
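
The fixed-point form of the backward displacement idea described above can be sketched as follows. This is a toy illustration under stated assumptions: the "structural solver" is a made-up one-dimensional function (`forward_displacement`) standing in for a black-box finite element solver, the update is taken to be "measured coordinates minus computed displacement", and the quasi-Newton acceleration is not included.

```python
def forward_displacement(X, pressure=1.0):
    """Hypothetical black-box solver: displacement of each node under load.

    A made-up smooth nonlinear response, NOT a real structural model.
    """
    return [pressure * (0.1 * Xi + 0.02 * Xi * Xi) for Xi in X]


def backward_displacement(x_measured, forward, tol=1e-12, max_iter=200):
    """Find zero-pressure coordinates X such that X + forward(X) = x_measured."""
    X = list(x_measured)  # initial guess: the measured (loaded) geometry
    for iteration in range(1, max_iter + 1):
        u = forward(X)
        residual = max(abs(Xi + ui - xi) for Xi, ui, xi in zip(X, u, x_measured))
        if residual < tol:
            return X, iteration
        # Only the mesh coordinates are updated; the solver is never modified.
        X = [xi - ui for xi, ui in zip(x_measured, u)]
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Illustrative node coordinates of the loaded geometry.
    X0, iterations = backward_displacement([1.0, 1.5, 2.0], forward_displacement)
    print(X0, iterations)
```

The loop only calls the solver and rewrites coordinates, which mirrors the abstract's remark that the structural solver can remain a black box.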

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time

    Training a Linear Neural Network with a Stable LSP Solution for Jamming Cancellation

    Two jamming cancellation algorithms are developed based on a stable solution of the least squares problem (LSP) provided by regularization. They are based on filtered singular value decomposition (SVD) and modifications of the Greville formula. Both algorithms allow an efficient hardware implementation. Testing results on artificial data modeling difficult real-world situations are also provided

    Material parameter identification for modelling the left ventricle in the healthy state

    Includes bibliographical references. An idealized truncated ellipsoidal model was used to simulate a healthy canine left ventricle. Passive behaviour of the myocardium was modelled using the constitutive model of Usyk. In addition, active behaviour of the myocardium was modelled by the active stress law of Guccione. Furthermore, the load faced by the left ventricle in ejecting blood into the arterial system was modelled with the three-element Windkessel model of Westerhof. The model was calibrated to pressure-volume data, which was adapted from the work of Kerckhoffs. The projected Levenberg-Marquardt algorithm was used to identify material parameters. Identification of the anisotropic constants in the model of Usyk proved to be difficult, with the calibration algorithm often converging to parameter values that produced numerical instability

    Highly parallel sparse Cholesky factorization

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms

    Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing

    Least-squares (LS) estimation and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms on parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered

    Rapid Frequency Estimation

    Frequency estimation plays an important role in many digital signal processing applications. Many areas have benefited from the discovery of the Fast Fourier Transform (FFT) decades ago and from the relatively recent advances in modern spectral estimation techniques within the last few decades. As processor and programmable logic technologies advance, unconventional methods for rapid frequency estimation in white Gaussian noise should be considered for real time applications. In this thesis, a practical hardware implementation that combines two known frequency estimation techniques is presented, implemented, and characterized. The combined implementation, using the well known FFT and a less well known modern spectral analysis method known as the Direct State Space (DSS) algorithm, is used to demonstrate and promote application of modern spectral methods in various real time applications, including Electronic Counter Measure (ECM) techniques
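
The FFT half of the combined estimator described above can be sketched as a simple peak pick over the spectrum. This is a minimal illustration under stated assumptions: a noise-free real tone, a power-of-two record length, a textbook recursive radix-2 FFT, and a hypothetical function name `estimate_frequency`. The Direct State Space (DSS) stage is not reproduced, and the resolution of this coarse estimate is limited to the bin spacing fs/N.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_frequency(samples, fs):
    """Coarse frequency estimate: location of the largest positive-frequency bin."""
    n = len(samples)
    spectrum = fft(samples)
    # Skip the DC bin and search only below the Nyquist frequency.
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    return k * fs / n


if __name__ == "__main__":
    fs, n, f0 = 400.0, 64, 50.0  # illustrative values only
    tone = [math.cos(2 * math.pi * f0 * t / fs) for t in range(n)]
    print(estimate_frequency(tone, fs))
```

A coarse bin estimate of this kind is typically refined by a second method when the tone falls between bins, which is the role a higher-resolution spectral technique would play in a combined scheme.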