
    The tangent FFT

    The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n − 6n + 8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk’s software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the "tangent FFT," a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk’s algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk’s improvement but also for minimizing the memory-access costs of the FFT.

    The tangent FFT

    The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n − 6n + 8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk's software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the "tangent FFT," a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk's algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk's improvement but also for minimizing the memory-access costs of the FFT.
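    To make the "improvement from 4 to 34/9" concrete, the following minimal sketch evaluates the exact split-radix count 4n lg n − 6n + 8 quoted above and compares only the leading coefficients of the two algorithms. The function names are illustrative, not taken from any of the cited papers, and the lower-order terms of the tangent FFT count are not given in the abstract, so they are deliberately left out.

```python
from fractions import Fraction


def split_radix_ops(n: int) -> int:
    """Exact split-radix real-operation count from the abstract: 4n lg n - 6n + 8."""
    lg = n.bit_length() - 1
    if n < 2 or n != 1 << lg:
        raise ValueError("n must be a power of 2 with n >= 2")
    return 4 * n * lg - 6 * n + 8


def ops_per_n_lg_n(n: int) -> float:
    """Split-radix count divided by n lg n; this tends to 4 as n grows."""
    lg = n.bit_length() - 1
    return split_radix_ops(n) / (n * lg)


# Ratio of the leading coefficients only (34/9 versus 4).
# Assumption: lower-order terms are ignored, so this is an asymptotic figure.
ASYMPTOTIC_RATIO = Fraction(34, 9) / 4

if __name__ == "__main__":
    for exponent in (10, 20, 30):
        n = 1 << exponent
        print(n, split_radix_ops(n), ops_per_n_lg_n(n))
    print("asymptotic ratio:", ASYMPTOTIC_RATIO)
```

    The ratio of leading coefficients is 17/18, that is, an asymptotic saving of about 5.6 percent. Because the −6n term decays slowly relative to n lg n, the normalized split-radix count approaches 4 only for very large n.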

    Type-II/III DCT/DST algorithms with reduced number of arithmetic operations

    We present algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST), of types II and III, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N. Furthermore, we show that a further N multiplications may be saved by a certain rescaling of the inputs or outputs, generalizing a well-known technique for N=8 by Arai et al. These results are derived by considering the DCT to be a special case of a DFT of length 4N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DCT-III, DST-II, and DST-III follow immediately from the improved count for the DCT-II. Comment: 9 pages
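    The statement that a DCT is "a special case of a DFT of length 4N, with certain symmetries" can be checked numerically. The sketch below assumes the unnormalized DCT-II convention C_k = Σ x_n cos(π(n + 1/2)k/N) and one standard embedding (inputs placed at odd indices and mirrored, zeros at even indices). The abstract does not spell out its exact embedding or scaling, so both are assumptions here, and the O(N²) DFT is used only for verification, not as the pruned fast algorithm.

```python
import cmath
import math


def naive_dft(y):
    """Direct O(m^2) DFT, used only to verify the identity."""
    m = len(y)
    return [
        sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
        for k in range(m)
    ]


def dct2_direct(x):
    """Unnormalized DCT-II: C_k = sum_n x_n cos(pi (n + 1/2) k / N)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct2_via_dft(x):
    """DCT-II read off a length-4N DFT of a real, even-symmetric sequence."""
    n = len(x)
    y = [0.0] * (4 * n)
    for i, value in enumerate(x):
        y[2 * i + 1] = value            # odd indices carry the data
        y[4 * n - (2 * i + 1)] = value  # mirrored copy makes y even-symmetric
    spectrum = naive_dft(y)
    # Each mirrored pair contributes 2 x_n cos(...), hence the division by 2.
    return [spectrum[k].real / 2 for k in range(n)]


if __name__ == "__main__":
    sample = [1.0, 2.0, -0.5, 3.0]
    print(dct2_direct(sample))
    print(dct2_via_dft(sample))
```

    Most entries of the length-4N input are zero or duplicated, which is the kind of redundancy that can be pruned from a fast DFT algorithm.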

    Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations

    We present algorithms for the type-IV discrete cosine transform (DCT-IV) and discrete sine transform (DST-IV), as well as for the modified discrete cosine transform (MDCT) and its inverse, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log N to ~ (17/9) N log N for a power-of-two transform size N, and the exact count is strictly lowered for all N > 4. These results are derived by considering the DCT to be a special case of a DFT of length 8N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DST-IV and MDCT follow immediately from the improved count for the DCT-IV. Comment: 11 pages

    Generating and Searching Families of FFT Algorithms

    A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or "twiddle factors" that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity. Comment: Preprint submitted on March 28, 2011, to the Journal on Satisfiability, Boolean Modeling and Computation

    Low Power Implementation of Non Power-of-Two FFTs on Coarse-Grain Reconfigurable Architectures

    The DRM standard for digital radio broadcast in the AM band requires integrated devices for radio receivers at very low power. A System on Chip (SoC) called DiMITRI was developed based on a dual ARM9 RISC core architecture. Analyses showed that most computation power is used in the Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation to compute Fast Fourier Transforms (FFT) and inverse transforms (IFFT) on complex samples. These FFTs have to be computed on non-power-of-two numbers of samples, which is very uncommon in the signal processing world. The results obtained with this chip led to the objective of decreasing the power dissipated by the COFDM demodulation part by using a coarse-grain reconfigurable structure as a coprocessor. This paper introduces two different coarse-grain architectures, PACT XPP technology and the Montium, developed by the University of Twente, and presents the implementation of a Fast Fourier Transform on 1920 complex samples. The implementation result on the Montium shows a saving of a factor of 35 in processing time and 14 in power consumption compared to the RISC implementation, and a smaller area. As a conclusion, the paper presents the next steps of the development and some development issues.
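    A transform length such as 1920 is not a power of two, but it factors into small primes, which is what makes a mixed-radix decomposition possible at all. The helper below is a hypothetical illustration that only factors the length; the abstract does not say which decomposition the Montium or XPP implementations actually use.

```python
def prime_factors(n: int) -> dict:
    """Return the prime factorization of n as {prime: exponent} by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after trial division is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    print(prime_factors(1920))
```

    Since 1920 = 2^7 · 3 · 5, a length-1920 DFT can in principle be assembled from radix-2 (or radix-4), radix-3, and radix-5 butterflies.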

    CONFIGURABLE 2k/4k/8k FFT-IFFT CORE FOR DVB-T AND DVB-H

    The modulation stage uses an IFFT module to convert signal data from the frequency domain to the time domain. In the demodulation stage, an FFT module converts the signal output by the IFFT back from the time domain into the frequency domain. The FFT-IFFT modules are made to support 2k/4k/8k FFT and IFFT algorithms. The FFT-IFFT 2k/4k/8k core is built using radix-2, radix-4, and radix-8 algorithms. The core is designed to receive data continuously, without a buffer (temporary data container). The design of the FFT-IFFT 2k/4k/8k module started with a functional description in the form of a model. The hardware architecture was then designed from that functional model, the architecture design was used to build a bit-precision model, and the bit-precision model served as the foundation for the RTL design. The resulting FFT-IFFT modules meet the standard specified by the DVB consortium: the maximum test frequency of the FFT-IFFT 2k/4k/8k core is 69.36 MHz on the FPGA STRATIX II EP2S60-F1020C3, which surpasses the 40 MHz required by the DVB-T/DVB-H standard. In addition, the module has a high throughput with the average of 39.82 M sym /

    Radix-2 x 2 x 2 algorithm for the 3-D discrete Hartley transform

    The discrete Hartley transform (DHT) has proved to be a valuable tool in digital signal/image processing and communications and has also attracted research interest in many multidimensional applications. Although many fast algorithms have been developed for the calculation of the one- and two-dimensional (1-D and 2-D) DHT, the development of multidimensional algorithms in three and more dimensions has not been given similar attention and remains largely unexplored; hence, the multidimensional Hartley transform is usually calculated through the row-column approach. However, proper multidimensional algorithms can be more efficient than the row-column method and need to be developed. Therefore, it is the aim of this paper to introduce the concept and derivation of the three-dimensional (3-D) radix-2 x 2 x 2 algorithm for fast calculation of the 3-D discrete Hartley transform. The proposed algorithm is based on the principles of the divide-and-conquer approach applied directly in 3-D. It has a simple butterfly structure and has been found to offer significant savings in arithmetic operations compared with the row-column approach based on similar algorithms.