227 research outputs found

    Efficient floating-point Givens rotation unit

    This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing. High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, floating-point computation is becoming almost compulsory. However, few works address hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial design, proposed for conventional number formats, is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-offs among different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements over previous similar designs in terms of area, latency, and throughput. This work was supported in part by the following Spanish projects: TIN2016-80920-R and JA2012 P12-TIC-169.
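As a rough illustration of the operation this abstract refers to, the sketch below applies Givens rotations in ordinary double-precision software arithmetic to triangularize a small matrix. It is not the hardware unit or the Half-Unit Biased format from the paper; the function names `givens` and `qr_givens` are hypothetical and chosen only for this example.

```python
import math


def givens(a, b):
    """Return (c, s, r) so that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        # Nothing to rotate: use the identity rotation.
        return 1.0, 0.0, 0.0
    return a / r, b / r, r


def qr_givens(A):
    """Return the upper-triangular factor R of A (a list of rows) via Givens rotations."""
    R = [[float(v) for v in row] for row in A]
    m, n = len(R), len(R[0])
    for j in range(n):
        # Zero column j from the bottom row up to the row just below the diagonal.
        for i in range(m - 1, j, -1):
            c, s, _ = givens(R[i - 1][j], R[i][j])
            for k in range(n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
    return R
```

Each rotation only touches two rows, which is one reason the operation maps well onto pipelined or systolic hardware.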

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Improved CORDIC-based real and complex QR decomposition (QRD) designs for channel pre-processing in Multiple-Input Multiple-Output (MIMO) detectors are presented in this paper. The proposed design uses pipelining and parallel processing to reduce the latency and the hardware complexity of the module, respectively. A computational complexity analysis shows a 16% advantage for our module over designs in the literature. The implementation results reveal that the proposed QRD has lower latency than designs in the literature. The power consumption for a 2x2 real channel matrix and a 2x2 complex channel matrix was found to be 12 mW and 44 mW, respectively, on the state-of-the-art Xilinx Virtex 5 FPGA.

    An approach to the application of shift-and-add algorithms on engineering and industrial processes

    Different kinds of algorithms can be used to compute elementary functions. Among them, shift-and-add algorithms are worth mentioning because they have been specifically designed to be very simple and to save computer resources. In fact, almost the only operations these methods involve are additions and shifts, which a digital processor can perform easily and efficiently. Shift-and-add algorithms achieve fairly good precision with low-cost iterations. The best-known algorithm of this type is CORDIC, which can approximate a wide variety of functions with only slight changes to its iterations. In this paper, we analyze the requirements of some engineering and industrial problems in terms of the type of operands and the functions to approximate. Then, we propose the application of CORDIC-based shift-and-add algorithms to these problems. We compare the different methods applied in terms of the precision of the results and the number of iterations required. This research was supported by the Conselleria de Educacion of the Valencia Region Government under grant number GV/2011/043.
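The shift-and-add idea described in this abstract can be sketched with the textbook circular CORDIC iteration in rotation mode. This is a minimal floating-point model, not the method from the paper: the function name `cordic_sin_cos` and the default of 24 iterations are assumptions for illustration, and the multiplications by `2.0 ** -i` stand in for the right shifts a hardware datapath would use.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Approximate (sin(theta), cos(theta)) for |theta| <= pi/2 with circular CORDIC."""
    # Elementary rotation angles atan(2^-i); hardware would read these from a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-dividing by the total gain compensates.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        # Rotate towards z = 0; d is the direction of this micro-rotation.
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

For example, `cordic_sin_cos(0.5)` returns a pair close to `(math.sin(0.5), math.cos(0.5))`; fewer iterations give a coarser result, which is the precision versus iteration-count trade-off the abstract mentions.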

    Efficient arithmetic for high speed DSP implementation on FPGAs

    The author was sponsored by EnTegra Ltd, a company that develops hardware and software products and services for the real-time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP because the parallel computing power of such devices is ideal for today's truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used, but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilise the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed-point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. In addition, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique that uses mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation.

    Error Analysis of CORDIC Processor with FPGA Implementation

    The coordinate rotation digital computer (CORDIC) is a fast shift-and-add computing algorithm used in many digital signal processing (DSP) applications. In this paper, a detailed error analysis based on the mean square error criterion and its implementation on an FPGA are presented. The two error sources considered are the angle approximation error and the quantization error due to the finite word length of the fixed-point number system. The error bound and variance are discussed in theory. The CORDIC algorithm is implemented on an FPGA using the Xilinx Zynq-7000 development board called ZedBoard. The results of the theoretical error analysis are investigated in practice by implementing the algorithm on the actual FPGA board. In addition, Matlab, set up in double-precision floating point, provides the theoretical values as a baseline model for comparison with the errors measured on the FPGA implementation. Comment: 5 pages, 7 figures
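The two error sources named in this abstract can be explored with a small software model. The sketch below is an assumption-laden stand-in for the paper's setup: it emulates fixed-point CORDIC with Python integers, uses `math.sin` and `math.cos` as the double-precision baseline in place of Matlab, and the names `cordic_fixed` and `mean_square_error` as well as the parameter choices are hypothetical.

```python
import math


def cordic_fixed(theta, iterations, frac_bits):
    """Fixed-point circular CORDIC (rotation mode); returns (cos, sin) as floats."""
    scale = 1 << frac_bits
    # Quantized angle table: one source of finite word length error.
    angles = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    x, y, z = round(scale / gain), 0, round(theta * scale)
    for i in range(iterations):
        # Arithmetic right shifts truncate, which adds quantization error each step.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - angles[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + angles[i]
    return x / scale, y / scale


def mean_square_error(iterations, frac_bits, samples=200):
    """Average squared error of (cos, sin) over angles spread across (-pi/2, pi/2)."""
    total = 0.0
    for k in range(samples):
        theta = -math.pi / 2 + math.pi * (k + 0.5) / samples
        c, s = cordic_fixed(theta, iterations, frac_bits)
        total += (c - math.cos(theta)) ** 2 + (s - math.sin(theta)) ** 2
    return total / samples
```

Comparing `mean_square_error(8, 16)` with `mean_square_error(16, 16)` isolates mostly the angle approximation error, while varying `frac_bits` at a fixed iteration count exposes the word length effect.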

    Adaptive Beamforming Using the Recursive Least Squares Algorithm on an FPGA

    This thesis describes the design and implementation of a five-channel beamformer using a Space-Time Adaptive Processing (STAP) filter with Recursive Least Squares (RLS) as the adaptive algorithm. The objective of the algorithm is to compute a set of filter weights for a STAP filter, such that the channels are filtered and combined into a signal with minimized power. Two test signal sets containing a high-powered jammer signal and a noise floor are used for performance evaluation. Three goals are set for this thesis: comparison of RLS to the Sample Matrix Inversion (SMI) algorithm when used in a beamformer, comparison of various architectures that implement RLS, and the implementation and test of one of the architectures on a Xilinx Virtex 6 XC6VLX240T-1 Field-Programmable Gate Array (FPGA). Simulations comparing RLS to SMI show that a beamformer using RLS performs the same as a beamformer using SMI for 3-5 antennas (channels) and 1-4 temporal taps in the STAP filter. The literature review shows that conventional RLS is unsuitable for FPGA implementation due to numerical instability. A comparison of the IQRD-RLS, FQRD-RLS and MCFQRD-RLS architectures, which are claimed to be stable RLS variants, shows that IQRD-RLS is the least computationally expensive of the algorithms. IQRD-RLS is implemented using Givens rotations in a systolic array architecture. Floating-point, fixed-point and CORDIC-based Givens rotation algorithms are compared with regard to speed and area, and floating point is chosen. Hardware simulations reveal that the filter weights returned by IQRD-RLS exhibit drift and are not stable in finite-precision arithmetic. The main cause is accumulated quantization error from the forgetting factor and its inverse (λ^(±1/2)). The IQRD-RLS systolic array is reduced to a (stable) QRD-RLS systolic array, approximately halving the number of systolic array nodes.
    Filter weights are not computed directly by QRD-RLS, and are instead recovered from the QRD-RLS least squares filtering error output by the method of weight flushing. Results show that the QRD-RLS systolic array using 14 mantissa bits is sufficient, as it performs equivalently to conventional RLS using double precision (53 mantissa bits). If only 11 mantissa bits are used, the output power increases by 3.3 dB. The final design can operate at sample rates from 19.4 MHz to 24.6 MHz, for a mantissa precision range of 14 to 11 bits. At this rate, the QRD-RLS systolic array can converge and output filter weights in 5.3 µs, significantly faster than the target of 100 µs. It is found that the current design has reached its speed limit due to the recursive nature of the algorithm. Processing of signals at the desired rate of 125 MHz would require changes to the algorithm itself. The implementation size is such that a 5-channel QRD-RLS array with one tap can fit on the FPGA. Channel-interleaving is proposed as a method to reduce system size, at the expense of slower operation. All hardware is designed, simulated and tested using Simulink together with Xilinx System Generator and its co-simulation and hardware-in-the-loop features.
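To make the QRD-RLS recursion in this abstract more concrete, here is a minimal real-valued software sketch of one update step built from Givens rotations. It is not the thesis design: it runs sequentially in double precision instead of on a systolic array, and it recovers the weights by back-substitution, which is swapped in here for the weight flushing method the thesis uses. The names `qrd_rls_update` and `solve_weights` are hypothetical.

```python
import math


def qrd_rls_update(R, u, x, d, lam=1.0):
    """Fold one sample (input vector x, desired value d) into the state (R, u) in place.

    R is an n-by-n upper-triangular list of rows, u has length n, lam is the forgetting factor.
    """
    n = len(x)
    root = math.sqrt(lam)
    x = list(x)
    # Exponential forgetting: scale the stored triangular state by sqrt(lam).
    for i in range(n):
        for k in range(i, n):
            R[i][k] *= root
        u[i] *= root
    # Annihilate the new data row against R, one Givens rotation per diagonal element.
    for i in range(n):
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue
        c, s = R[i][i] / r, x[i] / r
        for k in range(i, n):
            R[i][k], x[k] = c * R[i][k] + s * x[k], -s * R[i][k] + c * x[k]
        u[i], d = c * u[i] + s * d, -s * u[i] + c * d


def solve_weights(R, u):
    """Back-substitution for R w = u (a stand-in for weight flushing)."""
    n = len(u)
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = u[i] - sum(R[i][k] * w[k] for k in range(i + 1, n))
        w[i] = acc / R[i][i]
    return w
```

Starting from an all-zero `R` and `u`, calling `qrd_rls_update` once per sample and then `solve_weights` yields the least squares weights once enough independent samples have been processed.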

    A Digital Integrated Inertial Navigation System For Aerial Vehicles
