8,293 research outputs found

    Low power pipelined FFT processor architecture on FPGA

    Get PDF
    Fast Fourier Transform (FFT) processor is the hardware implementation for FFT algorithms for Discrete Fourier Transform (DFT) which compute any signal in time domain to frequency domain. This processor plays an important role in many applications such as digital video broadcasting, wireless sensor network and many more digital signal processing applications, which requires a small area and low power processor. Pipelined FFT processor design on FPGA will speed up the design process and flexibility. This paper provides a survey of three types of pipelined FFT architecture, radix-8, radix-4 single path feedback (R4SDF) and radix-4 single-pasth delay commutator implemented on FPGA. The simulation part is done via Modelsim and verification through Matlab. While the implementation is done via Quartus on the Altera Cyclone IV FPGA board. The performance of these FFT processor is studied. The result shows that radix-8 pipelined FFT have higher power dissipation compared to R4SDF and R4SDC, however R4SDC design has low area design compared to the rest. Overall, all pipelined FFT processor designs are functioning accordingly

    Implementing FFT-based digital channelized receivers on FPGA platforms

    Get PDF
    This paper presents an in-depth study of the implementation and characterization of fast Fourier transform (FFT) pipelined architectures suitable for broadband digital channelized receivers. When implementing the FFT algorithm on field-programmable gate array (FPGA) platforms, the primary goal is to maximize throughput and minimize area. Feedback and feedforward architectures have been analyzed regarding key design parameters: radix, bitwidth, number of points and stage scaling. Moreover, a simplification of the FFT algorithm, the monobit FFT, has been implemented in order to achieve faster real time performance in broadband digital receivers. The influence of the hardware implementation on the performance of digital channelized receivers has been analyzed in depth, revealing interesting implementation trade-offs which should be taken into account when designing this kind of signal processing systems on FPGA platforms

    Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

    Get PDF
    This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources compared to Radix-2 architecture, while maintaining simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design presented allows enhancing system throughput without requiring additional parallel data paths common in other current approaches, the presented design can process two and four independent data streams in parallel and is suitable for scaling to any power of two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high-throughput in comparison to relevant current approaches within literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post place and route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices respectively

    FPGA implementation of a 10 GS/s variable-length FFT for OFDM-based optical communication systems

    Full text link
    [EN] The transmission rate in current passive optical networks can be increased by employing Orthogonal Frequency Division Multiplexing (OFDM) modulation. The computational kernel of this modulation is the fast Fourier transform (FFT) operator, which has to achieve a very high throughput in order to be used in optical networks. This paper presents the implementation in an FPGA device of a variable-length FFT that can be configured in run-time to compute different FFT lengths between 16 and 1024 points. The FFT reaches a throughput of 10 GS/s in a Virtex-7 485T-3 FPGA device and was used to implement a 20 Gb/s optical OFDM receiver. (C) 2018 Elsevier B.V. All rights reserved.This work was supported by the Spanish Ministerio de Economia y Competitividad under project TEC2015-70858-C2-2-R with FEDER funds.Bruno, JS.; Almenar Terre, V.; Valls Coquillat, J. (2019). FPGA implementation of a 10 GS/s variable-length FFT for OFDM-based optical communication systems. Microprocessors and Microsystems. 64:195-204. https://doi.org/10.1016/j.micpro.2018.12.002S1952046

    Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

    Full text link
    Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts the general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces computational complexity per layer from O(n2n^2) to O(nlognn\log n) and storage complexity from O(n2n^2) to O(nn), both for training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with IBM TrueNorth processor under the same test accuracy. It achieves at least 31X energy efficiency gain compared with the reference FPGA-based work.Comment: 6 figures, AAAI Conference on Artificial Intelligence, 201
    corecore