55 research outputs found

    Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing

    Get PDF
    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in details. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on triangular array and another based on rectangular array, are presented for the multiphase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-pase operations. Performance issues are also considered

    On recursive least-squares filtering algorithms and implementations

    Get PDF
    In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends crucially on specific application

    A new family of approximate QR-LS algorithms for adaptive filtering

    Get PDF
    The 13th IEEE / S P Workshop on Statistical Signal Processing, Bordeaux, France, 17-20 July 2005This paper proposes a new family of approximate QR-based least squares (LS) adaptive filtering algorithms called p-TA-QR-LS algorithms. It extends the TA-QR-LS algorithm [6] by retaining different number of diagonal plus off-diagonals (denoted by an integer p) of the triangular factor of the augmented data matrix. For p=1 and N, it reduces respectively to the TA-QR-LS and the QR-RLS algorithms. It not only provides a link between the QR-LMS-type and the QR-RLS algorithms through a well-structured family of algorithms, but also offers flexible complexity-performance tradeoffs in practical implementation. These results are verified by computer simulation and the mean convergence of the algorithms is also analyzed. © 2005 IEEE.published_or_final_versio

    FPGA-based Low Latency Inverse QRD Architecture for Adaptive Beamforming in Phased Array Radars

    Get PDF
    The main objective of this paper is to facilitate the adaptive beamforming which is one of the most challenging task in phased array radars receivers. Recursive least square (RLS) is considered as the most well suited adaptive algorithm for the applications where beamforming is mandatory, because of its good numerical properties and convergence rate. In this paper, some RLS variants are discussed and the most numerically suitable algorithm Inverse QRD is selected for efficient adaptive beamforming. A novel architecture for IQRD RLS is also presented, which offers low latency and low area occupation for Field Programmable Gate Array (FPGA) implementation. This approach reduces the computations by utilizing the standard pipelining methodology. Hence, efficient adder and multipliers and LUT based solution for square root and division, has highly enhanced the performance of the algorithm. The proposed IQRD RLS architecture has been coded in Verilog and analyze its performance in terms of throughput, hardware resources and efficiency

    Low-Latency RLS Architecture for FPGA Implementation With High Throughput Adaptive Applications

    Get PDF
    A novel architecture for QR-decomposition-based (QRD) recursive least squares (RLS) is presented. It offers low idleness for applications where the channel balance and versatile separating is obligatory. This approach lessens the calculations by reworking the conditions in a way that lets extraordinary equipment asset sharing by reusing comparable qualities in various calculations. Additionally, accuracy run change (PRC) takes into consideration joining complex activities, for example, root square and division with least impact on the general quantization blunder. Hence, an efficient Look Up Table based solution has highly enhanced the performance of the design by 2.7 times with respect to the previous work

    Application-specific instruction set processor for SoC implementation of modern signal processing algorithms

    Full text link

    Implementation of a conjugate matched filter adaptive receiver for DS-CDMA

    Get PDF

    Completely Recursive Least Squares and Its Applications

    Get PDF
    The matrix-inversion-lemma based recursive least squares (RLS) approach is of a recursive form and free of matrix inversion, and has excellent performance regarding computation and memory in solving the classic least-squares (LS) problem. It is important to generalize RLS for generalized LS (GLS) problem. It is also of value to develop an efficient initialization for any RLS algorithm. In Chapter 2, we develop a unified RLS procedure to solve the unconstrained/linear-equality (LE) constrained GLS. We also show that the LE constraint is in essence a set of special error-free observations and further consider the GLS with implicit LE constraint in observations (ILE-constrained GLS). Chapter 3 treats the RLS initialization-related issues, including rank check, a convenient method to compute the involved matrix inverse/pseudoinverse, and resolution of underdetermined systems. Based on auxiliary-observations, the RLS recursion can start from the first real observation and possible LE constraints are also imposed recursively. The rank of the system is checked implicitly. If the rank is deficient, a set of refined non-redundant observations is determined alternatively. In Chapter 4, base on [Li07], we show that the linear minimum mean square error (LMMSE) estimator, as well as the optimal Kalman filter (KF) considering various correlations, can be calculated from solving an equivalent GLS using the unified RLS. In Chapters 5 & 6, an approach of joint state-and-parameter estimation (JSPE) in power system monitored by synchrophasors is adopted, where the original nonlinear parameter problem is reformulated as two loosely-coupled linear subproblems: state tracking and parameter tracking. Chapter 5 deals with the state tracking which determines the voltages in JSPE, where dynamic behavior of voltages under possible abrupt changes is studied. Chapter 6 focuses on the subproblem of parameter tracking in JSPE, where a new prediction model for parameters with moving means is introduced. Adaptive filters are developed for the above two subproblems, respectively, and both filters are based on the optimal KF accounting for various correlations. Simulations indicate that the proposed approach yields accurate parameter estimates and improves the accuracy of the state estimation, compared with existing methods

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    Get PDF
    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

    Adaptive Beamforming Using the Recursive Least Squares Algorithm on an FPGA

    Get PDF
    This thesis describes the design and implementation of a five-channel beamformer using a Space-Time Adaptive Processing (STAP) filter with Recursive Least Squares (RLS) as the adaptive algorithm. The objective of the algorithm is to compute of a set of filter weights for a STAP filter, such that the channels are filtered and combined into a signal with minimized power. Two test signal sets containing a high-powered jammer signal and a noise floor are used for performance evaluation. Three goals are set for this thesis; comparison of RLS to Sample Matrix Inversion (SMI) algorithm when used in a beamformer, comparison of various architectures which implement RLS, and the implementation and test of one of the architectures for a Xilinx Virtex 6 XC6VLX240T-1 Field-Programmable Gate Array (FPGA) Simulations comparing RLS to SMI show that a beamformer using RLS performs the same as a beamformer using SMI for 3-5 antennas (channels) and 1-4 temporal taps in the STAP filter. Litterature review shows that conventional RLS is unsuitable for FPGA implementation due to numerical instability. Comparison of IQRD-RLS, FQRD-RLS and MCFQRD-RLS architectures which are claimed to be stable RLS variants, shows that IQRD-RLS is the least computationally expensive of the algorithms. IQRD-RLS is implemented using Givens rotations in a systolic array architecture. Floating point, fixed point and CORDIC-based Givens rotation algorithms are compared with regard to speed and area, and floating point is chosen. Hardware simulations reveal that the filter weights returned by IQRD-RLS exhibit a drift, and is not stable in finite-precision arithmetic. The main cause is accumulated quantization error from the forgetting factor and its inverse (λ^(+-1/2)). The IQRD-RLS systolic array is reduced to a (stable) QRD-RLS systolic array, approximately halving the number of systolic array nodes. Filter weights are not computed directly by QRD-RLS, and are instead recovered from the QRD-RLS least squares filtering error output by the method of weight flushing. Results show that the QRD-RLS systolic array using 14 mantissa bits is sufficient as it performs equivalently to conventional RLS using double precision (53 mantissa bits). If only 11 mantissa bits are used, the output power increases by 3.3 dB. The final design can operate at sample rates from 19.4 MHz to 24.6 MHz, for a mantissa precision range of 14 to 11 bits. At this rate, the QRD-RLS systolic array can converge and output filter weights in 5.3 µs, significantly faster than the target of 100 µs. It is found that the current design has fully utilized its speed potential/limit due to the recursive nature of the algorithm. Processing of signals at the desired rate of 125 MHz would require changes to the algorithm itself. The implementation size is such that a 5-channel QRD-RLS array with one tap can fit on the FPGA. Channel-interleaving is proposed as a method to reduce system size, at the expense of slower operation. All hardware is designed, simulated and tested using Simulink together with Xilinx System Generator and its co-simulation and hardware-in-the-loop features
    corecore