36 research outputs found

    Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture

    Get PDF
    This paper describes a low-power processor tailored for fast Fourier transform computations where transport triggering template is exploited. The processor is software-programmable while retaining an energy-efficiency comparable to existing fixed-function implementations. The power savings are achieved by compressing the computation kernel into one instruction word. The word is stored in an instruction loop buffer, which is more power-efficient than regular instruction memory storage. The processor supports all power-of-two FFT sizes from 64 to 16384 and given 1 mJ of energy, it can compute 20916 transforms of size 1024.Comment: 5 pages, 4 figures, 1 table, ICASSP 2019 conferenc

    Implementing FFT-based digital channelized receivers on FPGA platforms

    Get PDF
    This paper presents an in-depth study of the implementation and characterization of fast Fourier transform (FFT) pipelined architectures suitable for broadband digital channelized receivers. When implementing the FFT algorithm on field-programmable gate array (FPGA) platforms, the primary goal is to maximize throughput and minimize area. Feedback and feedforward architectures have been analyzed regarding key design parameters: radix, bitwidth, number of points and stage scaling. Moreover, a simplification of the FFT algorithm, the monobit FFT, has been implemented in order to achieve faster real time performance in broadband digital receivers. The influence of the hardware implementation on the performance of digital channelized receivers has been analyzed in depth, revealing interesting implementation trade-offs which should be taken into account when designing this kind of signal processing systems on FPGA platforms

    Advanced Quantization Schemes to Increase Accuracy, Reduce Area, and Lower Power Consumption in FFT Architectures.

    Get PDF
    This paper explores new advanced quantization schemes for fast Fourier transform (FFT) architectures. In previous works, FFT quantization has been treated theoretically or with the sole aim of improving accuracy. In this work, we go one step beyond by considering also the implications that quantization schemes have on the area and power consumption of the architecture. To achieve this, we have analyzed the mathematical operations carried out in FFT architectures and explored the changes that benefit all the figures of merit. By combining or alternating truncation and rounding, and using the half-unit biased (HUB) representation in the different computations of the architecture, we have achieved quantization schemes that increase accuracy, reduce area, and lower power consumption simultaneously. This win-win result improves multiple figures of merit without worsening any other, making it a valuable strategy to optimize FFT architectures.MCIN/AEI/10.13039/501100011033 and “ERDF A Way of Making Europe” under Project PID2021-126991NA-I00, European Union NextGeneration EU/PRTR under Project TED2021-131527B-I00, Fondo Europeo de Desarrollo Regional under Grant UMA20-FEDERJA-059, and MCIN/AEI/10.13039/501100011033 and “ESF Investing in Your Future” under Grant RYC2018-025384-

    Studio e realizzazione di un'architettura VLSI di un processore per l'implementazione dell'algoritmo FFT

    Get PDF
    Poiché lo standard di connessione 5G è utilizzato da un numero sempre crescente di dispositivi e si sta evolvendo per soddisfare nuove esigenze e requisiti, è diventato fondamentale studiare e progettare nuovi trasmettitori e ricevitori più veloci ed efficienti. Un ruolo fondamentale nella connessione 5G è svolto dal multiplexing a divisione di frequenza ortogonale (OFDM), una metodologia di modulazione. Poiché la demodulazione è basata sulla trasformata di Fourier, lo scopo di questa tesi è realizzare un processore in grado di implementare algoritmi FFT e DFT su sequenze di lunghezza variabile che rispetti i criteri dello standard 5G. Per fare ciò, è stata prima condotta un'analisi del rapporto dell'Unione internazionale delle telecomunicazioni ITU-R M.2410-0 per definire i requisiti minimi per il processore. Successivamente, uno studio dello stato dell'arte per dispositivi simili ha portato allo sviluppo di un'architettura VLSI adatta all'applicazione. Una versione RTL dell'architettura è stata implementata in VHDL e testata.Since the 5G connection standard is utilized by a rising number of devices and is evolving to meet new needs and requirements, it has become crucial to study and design new, faster, and more efficient transmitters and receivers. A fundamental role in the 5G connection is played by Orthogonal frequency-division multiplexing (OFDM), an encoding methodology. Since the demodulation is based on the Fourier Transform, the purpose of this thesis is to realize a processor capable of implementing FFT and DFT algorithms on variable length sequences that complies with the 5G standard criteria. In order to do so, first an analysis of the International Telecommunication Union report ITU-R M.2410-0 has been conducted to define the minimum requirements for the processor. Then, a study of the state of the art for similar devices led to the development of a VLSI architecture suitable for the application. An RTL version of the architecture has been implemented in VHDL and tested

    On-board demux/demod

    Get PDF
    To make satellite channels cost competitive with optical cables, the use of small, inexpensive earth stations with reduced antenna size and high powered amplifier (HPA) power will be needed. This will necessitate the use of high e.i.r.p. and gain-to-noise temperature ratio (G/T) multibeam satellites. For a multibeam satellite, onboard switching is required in order to maintain the needed connectivity between beams. This switching function can be realized by either an receive frequency (RF) or a baseband unit. The baseband switching approach has the additional advantage of decoupling the up-link and down-link, thus enabling rate and format conversion as well as improving the link performance. A baseband switching satellite requires the demultiplexing and demodulation of the up-link carriers before they can be switched to their assigned down-link beams. Principles of operation, design and implementation issues of such an onboard demultiplexer/demodulator (bulk demodulator) that was recently built at COMSAT Labs. are discussed

    Design of High Speed Memory-Based FFT Processor Using 90nm Technology

    Get PDF
    In order to enhance performance, the Fast Fourier Transformation is a important operation in Digital Signal Processing (DSP) systems had been extensively studied. State-of-the-art transmission technology uses Orthogonal frequency division multiplexing (OFDM), which primary operation is the Fast fourier transform (FFT). This analysis presents the design of a high-speed memory-based FFT processor using 90nm technology. The novel hybrid multiplier and hybrid adder is used in this analysis. The main objective of this method is to develop an efficient, memory-efficient FFT processor that requires less area.  Using 90nm CMOS (Complementary Metal Oxide Semiconductor) technology, the proposed FFT processor was created and implemented in process. With reduced processing time, this means that the proposed FFT processor performs better than the prior memory-based FFT processors in terms of performance and the number of LUTs required which reduces area and memory utilization

    Architectural implementation of cordic unit and its applications

    Get PDF
    The ubiquity of DSP has made increasing demand to develop area efficient and accurate architectures in carrying out many nonlinear arithmetic operations. One such architecture is CORDIC unit which has many applications in the field of DSP including implementing transforms based on Fourier basis. This report presents architecture of CORDIC, embedded with a scaling unit that has only minimal number of adders and shifters. It can be implemented in rotation mode as well as vectoring mode. The purpose of the design is to get a scaling free CORDIC unit preserving the design of original algorithm. The proposed design has a considerable reduction in hardware when compared with other scaling free architectures. The analysis of error for different word lengths and different input ranges for fixed word length gives a better choice to choose the parameters. The error in rotation mode for 16 bit data path, obtained for Y equivalent input is 0.073% and for X equivalent input is 0.067%. We also report architecture of a DFT core that is implemented using low latency CORDIC. A scaling unit has been included to get scaled outputs. The reported DFT core architecture has 22 adders in total, in addition to 2 CORDIC units. DDS or NCO are nowadays prominently used in the applications of RF signal processing, satellite communications, etc. This report also brings out the FPGA implementation of one such DDS which has quadrature outputs. The proposed DDS design, which is based on pipelined CORDIC, has considerable improvement in terms of SFDR compared to other existing designs at reduced hardware. This report also proposes multiplier-less architecture for the implementation of radix-2^2 folded pipelined complex FFT core based on CORDIC technique. The number of points considered in the work is sixteen and the folding is done by a factor of four

    Fast Fourier transforms on energy-efficient application-specific processors

    Get PDF
    Many of the current applications used in battery powered devices are from digital signal processing, telecommunication, and multimedia domains. Traditionally application-specific fixed-function circuits have been used in these designs in form of application-specific integrated circuits (ASIC) to reach the required performance and energy-efficiency. The complexity of these applications has increased over the years, thus the design complexity has increased even faster, which implies increased design time. At the same time, there are more and more standards to be supported, thus using optimised fixed-function implementations for all the functions in all the standards is impractical. The non-recurring engineering costs for integrated circuits have also increased significantly, so manufacturers can only afford fewer chip iterations. Although tailoring the circuit for a specific application provides the best performance and/or energy-efficiency, such approach lacks flexibility. E.g., if an error is found after the manufacturing, an expensive chip iteration is required. In addition, new functionalities cannot be added afterwards to support evolution of standards. Flexibility can be obtained with software based implementation technologies. Unfortunately, general-purpose processors do not provide the energy-efficiency of the fixed-function circuit designs. A useful trade-off between flexibility and performance is implementation based on application-specific processors (ASP) where programmability provides the flexibility and computational resources customised for the given application provide the performance. In this Thesis, application-specific processors are considered by using fast Fourier transform as the representative algorithm. The architectural template used here is transport triggered architecture (TTA) which resembles very long instruction word machines but the operand execution resembles data flow machines rather than traditional operand triggering. The developed TTA processors exploit inherent parallelism of the application. In addition, several characteristics of the application have been identified and those are exploited by developing customised functional units for speeding up the execution. Several customisations are proposed for the data path of the processor but it is also important to match the memory bandwidth to the computation speed. This calls for a memory organisation supporting parallel memory accesses. The proposed optimisations have been used to improve the energy-efficiency of the processor and experiments show that a programmable solution can have energy-efficiency comparable to fixed-function ASIC designs

    Serial-data computation in VLSI

    Get PDF
    corecore