59 research outputs found

    High-Throughput FPGA Implementation of QR Decomposition

    Get PDF
    Munoz, S.D.; Hormigo, J. "High-Throughput FPGA Implementation of QR Decomposition" IEEE Transactions on in Circuits and Systems II: Express Briefs,vol.62, no.9, pp.861-865, Sept. 2015 doi: 10.1109/TCSII.2015.2435753This brief presents a hardware design to achieve high-throughput QR decomposition, using Givens Rotation Method. It utilizes a new two-dimensional systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point FPGA architecture for 4 x 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared to other previous proposals for FPGA, our design achieves at least 50% more throughput, and much less resource utilization.Ministry of Education and Science of Spain and Junta of Andalucia under contracts TIN2013-42253-P and P07-TIC-02630, respectively

    Improving Fixed-Point Implementation of QR Decomposition by Rounding-to-Nearest

    Get PDF
    QR decomposition is a key operation in many current communication systems. This paper shows how to reduce the area of a fixed-point QR decomposition implementation based on Givens rotations by using a new number representation system. This new representation allows performing round-tonearest at the same cost of truncation. Consequently, the rounding errors of the results are halved, which allows it to reduce the word-length by one bit. This reduction positively impacts on the area, delay and power consumption of the design.Ministry of Education and Science of Spain and Junta of Andalucía under contracts TIN2013-42253-P and TIC-1692, respectively, and Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    FPGA-Based Co-processor for Singular Value Array Reconciliation Tomography

    Get PDF
    This thesis describes a co-processor system that has been designed to accelerate computations associated with Singular Value Array Reconciliation Tomography (SART), a method for locating a wide-band RF source which may be positioned within an indoor environment, where RF propagation characteristics make source localization very challenging. The co-processor system is based on field programmable gate array (FPGA) technology, which offers a low-cost alternative to customized integrated circuits, while still providing the high performance, low power, and small size associated with a custom integrated solution. The system has been developed in VHDL, and implemented on a Virtex-4 SX55 FPGA development platform. The system is easy to use, and may be accessed through a C program or MATLAB script. Compared to a Pentium 4 CPU running at 3 GHz, use of the co-processor system provides a speed-up of about 6 times for the current signal matrix size of 128-by-16. Greater speed-ups may be obtained by using multiple devices in parallel. The system is capable of computing the SART metric to an accuracy of about -145 dB with respect to its true value. This level of accuracy, which is shown to be better than that obtained using single precision floating point arithmetic, allows even relatively weak signals to make a meaningful contribution to the final SART solution

    Efficient floating-point givens rotation unit

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing.High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput.This work was supported in part by following Spanish projects: TIN2016-80920-R, and JA2012 P12-TIC-169

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Get PDF
    CORDIC based improved real and complex QR Decomposition (QRD) for channel pre-processing operations in (Multiple-Input Multiple-Output) MIMO detectors are presented in this paper. The proposed design utilizes pipelining and parallel processing techniques and reduces the latency and hardware complexity of the module respectively. Computational complexity analysis report shows the superiority of our module by 16% compared to literature. The implementation results reveal that the proposed QRD takes shorter latency compared to literature. The power consumption of 2x2 real channel matrix and 2x2 complex channel matrix was found to be 12mW and 44mW respectively on the state-of-the-art Xilinx Virtex 5 FPGA

    Architectures for adaptive weight calculation on ASIC and FPGA

    Get PDF

    Implementation of a Real-Time Beamforming System on Field Programmable Gate Array

    Get PDF
    Beamforming is an important technique in array signal processing and wireless communication systems. In this project, we investigate the Minimum Variance Distortionless Response (MVDR) beamforming technique and its implementation. The QR-RLS algorithm is chosen because of its advantages of numerical stability and systolic array architecture. The team successfully implemented the real-time beamforming of a linear array with 3 receiving antennas on a Xilinx Virtex-5 FPGA platform. Both the simulation and hardware implementation results are presented in this report

    Efficient arithmetic for high speed DSP implementation on FPGAs

    Get PDF
    The author was sponsored by EnTegra Ltd, a company who develop hardware and software products and services for the real time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP. This is due to the fact that the parallel computing power of such devices is ideal for today’s truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilises the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. Further to this, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation

    Hardware implementation of multiple-input multiple-output transceiver for wireless communication

    Get PDF
    This dissertation proposes an efficient hardware implementation scheme for iterative multi-input multi-output orthogonal frequency-division multiplexing (MIMO-OFDM) transceiver. The transmitter incorporates linear precoder designed with instantaneous channel state information (CSI). The receiver implements MMSE-IC (minimum mean square error interference cancelation) detector, channel estimator, low-density parity-check (LDPC) decoder and other supporting modules. The proposed implementation uses QR decomposition (QRD) of complex-valued matrices with four co-ordinate rotation digital computer (CORDIC) cores and back substitution to achieve the best tradeoff between resource and throughput. The MIMO system is used in field test and the results indicate that the instantaneous CSI varies very fast in practices and the performance of linear precoder designed with instantaneous CSI is limited. Instead, statistic CSI had to be used. This dissertation also proposes a higher-rank principle Kronecker model (PKM). That exploits the statistic CSI to simulate the fading channels. The PKM is constructed by decomposing the channel correlation matrices with the higher-order singular value decomposition (HOSVD) method. The proposed PKM-HOSVD model is validated by extensive field experiments conducted for 4-by-4 MIMO systems in both indoor and outdoor environments. The results confirm that the statistic CSI varies slowly and the PKM-HOSVD will be helpful in the design of linear precoders. --Abstract, page iv
    corecore