164 research outputs found

    Complexity Analysis of MMSE Detector Architectures for MIMO OFDM Systems

    Get PDF
    In this paper, a field programmable gate array (FPGA) implementation of a linear minimum mean square error (LMMSE) detector is considered for MIMO-OFDM systems. Two square root free algorithms based on QR decomposition (QRD) are introduced for the implementation of LMMSE detector. Both algorithms are based on QRD via Givens rotations, namely coordinate rotation digital computer (CORDIC) and squared Givens rotation (SGR) algorithms. Linear and triangular shaped array architectures are considered to exploit the parallelism in the computations. An FPGA hardware implementation is presented and computational complexity of each implementation is evaluated and compared.ElekrobitNokiaTexas InstrumentsNational Technology Agency of FinlandTeke

    High-Throughput FPGA Implementation of QR Decomposition

    Get PDF
    Munoz, S.D.; Hormigo, J. "High-Throughput FPGA Implementation of QR Decomposition" IEEE Transactions on in Circuits and Systems II: Express Briefs,vol.62, no.9, pp.861-865, Sept. 2015 doi: 10.1109/TCSII.2015.2435753This brief presents a hardware design to achieve high-throughput QR decomposition, using Givens Rotation Method. It utilizes a new two-dimensional systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point FPGA architecture for 4 x 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared to other previous proposals for FPGA, our design achieves at least 50% more throughput, and much less resource utilization.Ministry of Education and Science of Spain and Junta of Andalucia under contracts TIN2013-42253-P and P07-TIC-02630, respectively

    Improving Fixed-Point Implementation of QR Decomposition by Rounding-to-Nearest

    Get PDF
    QR decomposition is a key operation in many current communication systems. This paper shows how to reduce the area of a fixed-point QR decomposition implementation based on Givens rotations by using a new number representation system. This new representation allows performing round-tonearest at the same cost of truncation. Consequently, the rounding errors of the results are halved, which allows it to reduce the word-length by one bit. This reduction positively impacts on the area, delay and power consumption of the design.Ministry of Education and Science of Spain and Junta of Andalucía under contracts TIN2013-42253-P and TIC-1692, respectively, and Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Efficient floating-point givens rotation unit

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing.High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput.This work was supported in part by following Spanish projects: TIN2016-80920-R, and JA2012 P12-TIC-169

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Get PDF
    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Get PDF
    CORDIC based improved real and complex QR Decomposition (QRD) for channel pre-processing operations in (Multiple-Input Multiple-Output) MIMO detectors are presented in this paper. The proposed design utilizes pipelining and parallel processing techniques and reduces the latency and hardware complexity of the module respectively. Computational complexity analysis report shows the superiority of our module by 16% compared to literature. The implementation results reveal that the proposed QRD takes shorter latency compared to literature. The power consumption of 2x2 real channel matrix and 2x2 complex channel matrix was found to be 12mW and 44mW respectively on the state-of-the-art Xilinx Virtex 5 FPGA

    FPGA-Based Co-processor for Singular Value Array Reconciliation Tomography

    Get PDF
    This thesis describes a co-processor system that has been designed to accelerate computations associated with Singular Value Array Reconciliation Tomography (SART), a method for locating a wide-band RF source which may be positioned within an indoor environment, where RF propagation characteristics make source localization very challenging. The co-processor system is based on field programmable gate array (FPGA) technology, which offers a low-cost alternative to customized integrated circuits, while still providing the high performance, low power, and small size associated with a custom integrated solution. The system has been developed in VHDL, and implemented on a Virtex-4 SX55 FPGA development platform. The system is easy to use, and may be accessed through a C program or MATLAB script. Compared to a Pentium 4 CPU running at 3 GHz, use of the co-processor system provides a speed-up of about 6 times for the current signal matrix size of 128-by-16. Greater speed-ups may be obtained by using multiple devices in parallel. The system is capable of computing the SART metric to an accuracy of about -145 dB with respect to its true value. This level of accuracy, which is shown to be better than that obtained using single precision floating point arithmetic, allows even relatively weak signals to make a meaningful contribution to the final SART solution

    Implementation of a Real-Time Beamforming System on Field Programmable Gate Array

    Get PDF
    Beamforming is an important technique in array signal processing and wireless communication systems. In this project, we investigate the Minimum Variance Distortionless Response (MVDR) beamforming technique and its implementation. The QR-RLS algorithm is chosen because of its advantages of numerical stability and systolic array architecture. The team successfully implemented the real-time beamforming of a linear array with 3 receiving antennas on a Xilinx Virtex-5 FPGA platform. Both the simulation and hardware implementation results are presented in this report
    • …
    corecore