The tangent FFT

Author: Bernstein D.J.
Publication venue: Springer
Publication date: 01/01/2007

The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n − 6n + 8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk’s software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the "tangent FFT," a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk’s algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk’s improvement but also for minimizing the memory-access costs of the FFT.

CiteSeerX

Repository TU/e

Pure OAI Repository

The tangent FFT

Author: Daniel J. Bernstein
Publication date: 01/01/2007

Abstract. The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n − 6n + 8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk's software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the "tangent FFT," a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk's algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk's improvement but also for minimizing the memory-access costs of the FFT.
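The split-radix count quoted in this abstract can be checked numerically. The sketch below is not taken from the paper: it assumes the usual butterfly accounting for split radix (24 real operations for each of the n/4 butterflies, minus 16 for trivial twiddle factors when n ≥ 8), and the function names are hypothetical. It confirms that this recurrence reproduces 4n lg n − 6n + 8, and prints the leading term (34/9) n lg n for comparison.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def split_radix_flops(n):
    """Real additions plus multiplications for a size-n split-radix FFT.

    Assumed accounting (not from the listing): T(1) = 0, T(2) = 4,
    T(4) = 16, and T(n) = T(n/2) + 2*T(n/4) + 6n - 16 for n >= 8.
    """
    base = {1: 0, 2: 4, 4: 16}
    if n in base:
        return base[n]
    return split_radix_flops(n // 2) + 2 * split_radix_flops(n // 4) + 6 * n - 16


def closed_form(n):
    """The count quoted in the abstract: 4 n lg n - 6 n + 8 (n a power of 2, n >= 2)."""
    lg = n.bit_length() - 1
    return 4 * n * lg - 6 * n + 8


if __name__ == "__main__":
    for e in (3, 6, 10, 16, 20):
        n = 1 << e
        leading = 34 / 9 * n * e  # leading term only; the o(1) part is ignored
        print(n, split_radix_flops(n), closed_form(n), round(leading))
```

Only the leading coefficients are compared here: 34/9 against 4 is an asymptotic reduction of about 5.6%. The exact lower-order terms of the improved count are not given in this listing and are not modeled.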

CiteSeerX

Type-II/III DCT/DST algorithms with reduced number of arithmetic operations

Author: Xuancheng Shao
Steven G. Johnson
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We present algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST), of types II and III, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N. Furthermore, we show that a further N multiplications may be saved by a certain rescaling of the inputs or outputs, generalizing a well-known technique for N=8 by Arai et al. These results are derived by considering the DCT to be a special case of a DFT of length 4N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DCT-III, DST-II, and DST-III follow immediately from the improved count for the DCT-II. Comment: 9 pages
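The statement that a DCT-II is a special case of a length-4N DFT with symmetries can be illustrated with a small numerical check. This is a minimal sketch under stated assumptions, not the paper's pruned algorithm: it uses the unnormalized convention C_k = Σ x_j cos(π(j + 1/2)k/N), a naive O(n²) DFT, and hypothetical function names. The inputs are placed at the odd indices of a real, even-symmetric sequence of length 4N, with zeros elsewhere.

```python
import cmath
import math


def dft(y):
    """Naive O(n^2) DFT with kernel exp(-2*pi*i*j*k/n)."""
    n = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dct2_direct(x):
    """Unnormalized DCT-II straight from the (assumed) definition."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]


def dct2_via_dft(x):
    """Embed x in a real-even sequence of length 4N and read off the DCT-II."""
    n = len(x)
    y = [0.0] * (4 * n)
    for j, v in enumerate(x):
        y[2 * j + 1] = v            # odd index 2j+1
        y[4 * n - (2 * j + 1)] = v  # mirror image, making y even-symmetric
    big = dft(y)
    # Each mirrored pair contributes 2*cos(...), hence the division by 2.
    return [big[k].real / 2 for k in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, -0.5, 3.0, 0.25, -1.0, 4.0, 0.5]
    print(max(abs(a - b) for a, b in zip(dct2_direct(x), dct2_via_dft(x))))
```

Three quarters of the length-4N input is zero or redundant, which is the kind of structure that leaves redundant operations to prune.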

arXiv.org e-Print Archive

CiteSeerX

Crossref

Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations

Author: Xuancheng Shao
Steven G. Johnson
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We present algorithms for the type-IV discrete cosine transform (DCT-IV) and discrete sine transform (DST-IV), as well as for the modified discrete cosine transform (MDCT) and its inverse, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N, and the exact count is strictly lowered for all N > 4. These results are derived by considering the DCT to be a special case of a DFT of length 8N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DST-IV and MDCT follow immediately from the improved count for the DCT-IV. Comment: 11 pages
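For the type-IV transform, the half-sample shift in both indices is what pushes the enclosing DFT length to 8N. The sketch below shows one possible embedding and is not necessarily the one used in the paper. It assumes the unnormalized convention C_k = Σ x_j cos(π(j + 1/2)(k + 1/2)/N) and uses hypothetical function names; the DCT-IV outputs appear at the odd-indexed bins of the length-8N DFT.

```python
import cmath
import math


def dft(y):
    """Naive O(n^2) DFT with kernel exp(-2*pi*i*j*k/n)."""
    n = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dct4_direct(x):
    """Unnormalized DCT-IV straight from the (assumed) definition."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * (k + 0.5) / n)
                for j in range(n))
            for k in range(n)]


def dct4_via_dft(x):
    """Embed x in a real-even sequence of length 8N; read odd output bins."""
    n = len(x)
    y = [0.0] * (8 * n)
    for j, v in enumerate(x):
        y[2 * j + 1] = v            # odd input index 2j+1
        y[8 * n - (2 * j + 1)] = v  # mirror image, making y even-symmetric
    big = dft(y)
    # Bin 2k+1 gives 2 * sum_j x_j cos(pi (2j+1)(2k+1) / (4N)) = 2 * C_k.
    return [big[2 * k + 1].real / 2 for k in range(n)]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.5]
    print(max(abs(a - b) for a, b in zip(dct4_direct(x), dct4_via_dft(x))))
```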

arXiv.org e-Print Archive

CiteSeerX

Crossref

Generating and Searching Families of FFT Algorithms

Author: Haynal Heidi
Haynal Steve
Publication date: 01/01/2011

A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or "twiddle factors" that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity. Comment: Preprint submitted on March 28, 2011, to the Journal on Satisfiability, Boolean Modeling and Computation

arXiv.org e-Print Archive

CiteSeerX

Low Power Implementation of Non Power-of-Two FFTs on Coarse-Grain Reconfigurable Architectures

Author: Quevremont Jérôme
Rivaton Arnaud
Smit Gerard
Wolkotte Pascal
Zhang Qiwei
Publication venue: Technology Foundation, STW
Publication date: 01/01/2005

The DRM standard for digital radio broadcast in the AM band requires integrated devices for radio receivers at very low power. A System on Chip (SoC) called DiMITRI was developed based on a dual ARM9 RISC core architecture. Analyses showed that most computation power is used in the Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation to compute Fast Fourier Transforms (FFT) and inverse transforms (IFFT) on complex samples. These FFTs have to be computed on non-power-of-two numbers of samples, which is very uncommon in the signal processing world. The results obtained with this chip led to the objective of decreasing the power dissipated by the COFDM demodulation part by using a coarse-grain reconfigurable structure as a coprocessor. This paper introduces two different coarse-grain architectures, PACT XPP technology and the Montium, developed by the University of Twente, and presents the implementation of a Fast Fourier Transform on 1920 complex samples. The implementation result on the Montium shows a saving of a factor of 35 in processing time and 14 in power consumption compared to the RISC implementation, and a smaller area. In conclusion, the paper presents the next steps of the development and some development issues.
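A 1920-point transform is still amenable to a fast algorithm because 1920 = 2^7 · 3 · 5. The following is a minimal software sketch of a recursive mixed-radix Cooley-Tukey decomposition that peels off the smallest prime factor at each level. It is an illustration only, with hypothetical function names, and says nothing about how the paper maps the transform onto the Montium or PACT XPP hardware.

```python
import cmath


def smallest_factor(n):
    """Smallest prime factor of n (returns n itself when n is prime)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def factorize(n):
    """Prime factors of n in non-decreasing order."""
    factors = []
    while n > 1:
        p = smallest_factor(n)
        factors.append(p)
        n //= p
    return factors


def mixed_radix_fft(x):
    """Decimation-in-time FFT for any length, splitting by the smallest prime."""
    n = len(x)
    if n == 1:
        return list(x)
    p = smallest_factor(n)
    m = n // p
    # p interleaved subsequences of length m, each transformed recursively.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    out = [0j] * n
    for k in range(n):
        # Combine with twiddle factors exp(-2*pi*i*r*k/n).
        out[k] = sum(subs[r][k % m] * cmath.exp(-2j * cmath.pi * r * k / n)
                     for r in range(p))
    return out


if __name__ == "__main__":
    print(factorize(1920))
    spectrum = mixed_radix_fft([complex(j % 11, j % 7) for j in range(1920)])
    print(len(spectrum))
```

Splitting by the smallest prime first keeps the code short; a real implementation would choose the radix order to suit the memory layout and data path.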

University of Twente Research Information

CONFIGURABLE 2k/4k/8k FFT-IFFT CORE FOR DVB-T AND DVB-H

Author: Andis Andys
Publication date: 01/01/2010

The modulation stage uses an IFFT module to convert signal data from the frequency domain to the time domain, while the demodulation stage uses an FFT module to convert the IFFT output back from the time domain into the frequency domain. The FFT-IFFT modules are built to support 2k/4k/8k FFT and IFFT algorithms. The FFT-IFFT 2k/4k/8k core is built using radix 2, radix 4 and radix 8. The core is designed to receive data continuously, without a buffer (temporary data container). The design of the FFT-IFFT 2k/4k/8k module started with a functional description in a model. The hardware architecture was then designed based on that functional model, and the architecture design was used to build a bit-precision model. The bit-precision model in turn served as the foundation for the RTL design. The resulting FFT-IFFT modules meet the standard specified by the DVB consortium: the maximum test frequency of the FFT-IFFT 2k/4k/8k core is 69.36 MHz on a Stratix II EP2S60-F1020C3 FPGA, which surpasses the DVB-T/DVB-H requirement (40 MHz). In addition, the module has a high throughput, with an average of 39.82 M sym /
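The IFFT-then-FFT round trip described above can be sketched in software. This is a floating-point illustration with hypothetical function names, using a plain recursive radix-2 FFT rather than the streaming radix-2/4/8 hardware core the abstract describes. The inverse transform reuses the forward one through complex conjugation, which is one common way for a single core to serve both directions.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    half = n // 2
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def ifft_radix2(spectrum):
    """Inverse FFT via the forward FFT: conj(FFT(conj(X))) / n."""
    n = len(spectrum)
    y = fft_radix2([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


if __name__ == "__main__":
    for size in (2048, 4096, 8192):  # the 2k/4k/8k modes
        symbols = [complex(k % 7 - 3, k % 5 - 2) for k in range(size)]
        time_domain = ifft_radix2(symbols)    # modulator side
        recovered = fft_radix2(time_domain)   # demodulator side
        print(size, max(abs(a - b) for a, b in zip(symbols, recovered)))
```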

EEPIS Repository

Radix-2 x 2 x 2 algorithm for the 3-D discrete Hartley transform

Author: Alshibami O.H.
Aziz M.Y.
Boussakta S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2001

The discrete Hartley transform (DHT) has proved to be a valuable tool in digital signal/image processing and communications and has also attracted research interests in many multidimensional applications. Although many fast algorithms have been developed for the calculation of one- and two-dimensional (1-D and 2-D) DHT, the development of multidimensional algorithms in three and more dimensions is still unexplored and has not been given similar attention; hence, the multidimensional Hartley transform is usually calculated through the row-column approach. However, proper multidimensional algorithms can be more efficient than the row-column method and need to be developed. Therefore, it is the aim of this paper to introduce the concept and derivation of the three-dimensional (3-D) radix-2 x 2 x 2 algorithm for fast calculation of the 3-D discrete Hartley transform. The proposed algorithm is based on the principles of the divide-and-conquer approach applied directly in 3-D. It has a simple butterfly structure and has been found to offer significant savings in arithmetic operations compared with the row-column approach based on similar algorithms.

White Rose Research Online