710 research outputs found

    Generating and Searching Families of FFT Algorithms

    Full text link
    A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or "twiddle factors" that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity.Comment: Preprint submitted on March 28, 2011, to the Journal on Satisfiability, Boolean Modeling and Computatio

    FPGA Implementation of Fast Fourier Transform Core Using NEDA

    Get PDF
    Transforms like DFT are a major block in communication systems such as OFDM, etc. This thesis reports architecture of a DFT core using NEDA. The advantage of the proposed architecture is that the entire transform can be implemented using adder/subtractors and shifters only, thus minimising the hardware requirement compared to other architectures. The proposed design is implemented for 16-bit data path (12–bit for comparison) considering both integer representation as well as fixed point representation, thus increasing the scope of usage. The proposed design is mapped on to Xilinx XC2VP30 FPGA, which is fabricated using 130 nm process technology. The maximum on board frequency of operation of the proposed design is 122 MHz. NEDA is one of the techniques to implement many signal processing systems that require multiply and accumulate units. FFT is one of the most employed blocks in many communication and signal processing systems. The FPGA implementation of a 16 point radix-4 complex FFT is proposed. The proposed design has improvement in terms of hardware utilization compared to traditional methods. The design has been implemented on a range of FPGAs to compare the performance. The maximum frequency achieved is 114.27 MHz on XC5VLX330 FPGA and the maximum throughput, 1828.32 Mbit/s and minimum slice delay product, 9.18. The design is also implemented using synopsys DC synthesis in both 65 nm and 180 nm technology libraries. The advantages of multiplier-less architectures are reduced hardware and improved latency. The multiplier-less architectures for the implementation of radix-2^2 folded pipelined complex FFT core are based on NEDA. The number of points considered in the work is sixteen and the folding is done by a factor of four. The proposed designs are implemented on Xilinx XC5VSX240T FPGA. Proposed designs based on NEDA have reduced area over 83%. The observed slice-delay product for NEDA based designs are 2.196 and 5.735

    High performance DIF-FFT using dissimilar partitioned LUT based Distributed Arithmetic

    Get PDF
    Real-time data processing systems utilize Digital Signal Processing (DSP) functions as the base modules. Most of the DSP functions involve the implementation of Fast Fourier Transform (FFT) to convert the signals from one domain to another domain. The major bottleneck of Decimation in frequency-Fast Fourier Transform (DIF-FFT) implementation lies in using a number of Multipliers. Distributed arithmetic (DA) is considered as one of the efficient techniques to implement DIF-FFT. In this approach, the multipliers are not used. The proposed technique exploits the very advantage of the look-up table by storing the Twiddle factors, thereby avoiding the multipliers required in the butterfly structure. DIF-FFT using Distributed Arithmetic (DIF-FFT DA) models, with different adders such as Ripple carry adder (RCA), Carry-lookahead adder (CLA), and Sklansky prefix graph adder, are proposed in this paper. The three proposed models are synthesized using Cadence 6.1 EDA tools with a 45nm CMOS technology. Compared to the traditional method, it is observed that the area is improved by 53.11%, 53.35%, and 50.15%, power is improved by 42.31%, 42.52%, and 40.39%, and delay is improved by 45.26%, 45.42%, 41.80%, respectively

    Computing the fast Fourier transform on SIMD microprocessors

    Get PDF
    This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors faster than or very close to the speed of state of the art libraries such as FFTW (“Fastest Fourier Transform in the West”), SPIRAL and Intel Integrated Performance Primitives (IPP). The conjugate-pair algorithm has advantages in terms of memory bandwidth, and three implementations of this algorithm, which incorporate latency and spatial locality optimizations, are automatically vectorized at the algorithm level of abstraction. Performance results on 2- way, 4-way and 8-way SIMD machines show that the performance scales much better than FFTW or SPIRAL. The implementations presented in this thesis are compiled into a high-performance FFT library called SFFT (“Streaming Fast Fourier Trans- form”), and benchmarked against FFTW, SPIRAL, Intel IPP and Apple Accelerate on sixteen x86 machines and two ARM NEON machines, and shown to be, in many cases, faster than these state of the art libraries, but without having to perform extensive machine specific calibration, thus demonstrating that there are good heuristics for predicting the performance of the FFT on SIMD microprocessors (i.e., the need for empirical optimization may be overstated)

    Design of High Speed Memory-Based FFT Processor Using 90nm Technology

    Get PDF
    In order to enhance performance, the Fast Fourier Transformation is a important operation in Digital Signal Processing (DSP) systems had been extensively studied. State-of-the-art transmission technology uses Orthogonal frequency division multiplexing (OFDM), which primary operation is the Fast fourier transform (FFT). This analysis presents the design of a high-speed memory-based FFT processor using 90nm technology. The novel hybrid multiplier and hybrid adder is used in this analysis. The main objective of this method is to develop an efficient, memory-efficient FFT processor that requires less area.  Using 90nm CMOS (Complementary Metal Oxide Semiconductor) technology, the proposed FFT processor was created and implemented in process. With reduced processing time, this means that the proposed FFT processor performs better than the prior memory-based FFT processors in terms of performance and the number of LUTs required which reduces area and memory utilization

    DFT algorithms for bit-serial GaAs array processor architectures

    Get PDF
    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology

    Error analysis and complexity optimization for the multiplier-less FFT-like transformation (ML-FFT)

    Get PDF
    This paper studies the effect of the signal round-off errors on the accuracies of the multiplier-less Fast Fourier Transform-like transformation (ML-FFT). The idea of the ML-FFT is to parameterize the twiddle factors in the conventional FFT algorithm as certain rotation-like matrices and approximate the associated parameters inside these matrices by the sum-of-power-of-two (SOPOT) or canonical signed digits (CSD) representations. The error due to the SOPOT approximation is called the coefficient round-off error. Apart from this error, signal round-off error also occurs because of insufficient wordlengths. Using a recursive noise model of these errors, the minimum hardware to realize the ML-FFT subject to the prescribed output bit accuracy can be obtained using a random search algorithm. A design example is given to demonstrate the effectiveness of the proposed approach.published_or_final_versio

    A Study of Linear Approximation Techniques for SAR Azimuth Processing

    Get PDF
    The application of the step transform subarray processing techniques to synthetic aperture radar (SAR) was studied. The subarray technique permits the application of efficient digital transform computational techniques such as the fast Fourier transform to be applied while offering an effective tool for range migration compensation. Range migration compensation is applied at the subarray level, and with the subarray size based on worst case range migration conditions, a minimum control system is achieved. A baseline processor was designed for a four-look SAR system covering approximately 4096 by 4096 SAR sample field every 2.5 seconds. Implementation of the baseline system was projected using advanced low power technologies. A 20 swath is implemented with approximately 1000 circuits having a power dissipation of from 70 to 195 watts. The baseline batch step transform processor is compared to a continuous strip processor, and variations of the baseline are developed for a wide range of SAR parameters
    corecore