Type-II/III DCT/DST algorithms with reduced number of arithmetic operations
We present algorithms for the discrete cosine transform (DCT) and discrete
sine transform (DST), of types II and III, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N.
Furthermore, we show that a further N multiplications may be saved by a certain
rescaling of the inputs or outputs, generalizing a well-known technique for N=8
by Arai et al. These results are derived by considering the DCT to be a special
case of a DFT of length 4N, with certain symmetries, and then pruning redundant
operations from a recent improved fast Fourier transform algorithm (based on a
recursive rescaling of the conjugate-pair split radix algorithm). The improved
algorithms for DCT-III, DST-II, and DST-III follow immediately from the
improved count for the DCT-II.
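The embedding described above is easy to illustrate numerically: a length-N DCT-II can be read off from a length-4N DFT of a zero-interleaved, mirrored (real-even) sequence. The NumPy sketch below shows only this symmetry argument, not the pruned algorithm that achieves the reduced operation count.

```python
import numpy as np

def dct2_via_fft(x):
    """Length-N DCT-II obtained from a length-4N DFT with even symmetry.

    The samples are placed at odd positions 1, 3, ..., 2N-1 and mirrored
    into positions 2N+1, ..., 4N-1, making the length-4N sequence real and
    symmetric; the unnormalized DCT-II coefficients
    2 * sum_n x_n cos(pi (2n+1) k / (2N)) then appear as the real parts
    of the first N DFT outputs.
    """
    N = len(x)
    y = np.zeros(4 * N)
    y[1:2 * N:2] = x           # x_n at index 2n+1
    y[2 * N + 1::2] = x[::-1]  # x_n mirrored to index 4N-(2n+1)
    return np.fft.fft(y)[:N].real
```

A fast DCT-II algorithm of the kind described here prunes the redundant arithmetic out of this length-4N FFT rather than computing it in full.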
Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations
We present algorithms for the type-IV discrete cosine transform (DCT-IV) and
discrete sine transform (DST-IV), as well as for the modified discrete cosine
transform (MDCT) and its inverse, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N, and the exact
count is strictly lowered for all N > 4. These results are derived by
considering the DCT to be a special case of a DFT of length 8N, with certain
symmetries, and then pruning redundant operations from a recent improved fast
Fourier transform algorithm (based on a recursive rescaling of the
conjugate-pair split radix algorithm). The improved algorithms for DST-IV and
MDCT follow immediately from the improved count for the DCT-IV.
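The length-8N embedding works the same way as the length-4N one for the DCT-II: placing the N samples at odd positions and mirroring them makes the long sequence real and even, and the DCT-IV coefficients appear at the odd-indexed DFT outputs. A sketch of this symmetry argument (not the pruned fast algorithm):

```python
import numpy as np

def dct4_via_fft(x):
    """Length-N DCT-IV obtained from a length-8N DFT with even symmetry.

    Placing x_n at index 2n+1 and mirroring it to index 8N-(2n+1) makes
    the length-8N sequence real and symmetric; the DCT-IV coefficients
    sum_n x_n cos(pi (2n+1)(2k+1) / (4N)) appear, times 2, at the odd
    DFT outputs 1, 3, ..., 2N-1.
    """
    N = len(x)
    y = np.zeros(8 * N)
    y[1:2 * N:2] = x           # x_n at index 2n+1
    y[6 * N + 1::2] = x[::-1]  # x_n mirrored to index 8N-(2n+1)
    return np.fft.fft(y)[1:2 * N:2].real / 2
```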
Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs
This paper presents a systematic methodology based on the algebraic theory of
signal processing to classify and derive fast algorithms for linear transforms.
Instead of manipulating the entries of transform matrices, our approach derives
the algorithms by stepwise decomposition of the associated signal models, or
polynomial algebras. This decomposition is based on two generic methods or
algebraic principles that generalize the well-known Cooley-Tukey FFT and make
the algorithms' derivations concise and transparent. Application to the 16
discrete cosine and sine transforms yields a large class of fast algorithms,
many of which have not been found before.
On algebras related to the discrete cosine transform
An algebraic theory for the discrete cosine transform (DCT) is developed that is analogous to the well-known theory of the discrete Fourier transform (DFT). Whereas the latter diagonalizes a convolution algebra, which is a polynomial algebra modulo a product of various cyclotomic polynomials, the former diagonalizes a polynomial algebra modulo a product of various Chebyshev-like polynomials. When the dimension of the algebra is a power of 2, the DCT diagonalizes a polynomial algebra modulo a product of Chebyshev polynomials of the first kind. In both the DFT and DCT cases, the Chinese remainder theorem plays a key role in the design of fast algorithms.
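The Chebyshev connection can be checked directly: the cosines that form the DCT basis are Chebyshev polynomials of the first kind evaluated at Chebyshev nodes, since T_n(cos t) = cos(nt). This is the identity underlying the polynomial-algebra view; the snippet below is a verification sketch, not part of the paper.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# DCT-II basis entries cos(pi*(2k+1)*n/(2N)) equal the Chebyshev
# polynomial T_n of the first kind evaluated at the Chebyshev nodes
# x_k = cos(pi*(2k+1)/(2N)), because T_n(cos t) = cos(n t).
N = 8
nodes = np.cos(np.pi * (2 * np.arange(N) + 1) / (2 * N))
for n in range(N):
    basis_col = np.cos(np.pi * (2 * np.arange(N) + 1) * n / (2 * N))
    assert np.allclose(Chebyshev.basis(n)(nodes), basis_col)
```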
A class of AM-QFT algorithms for power-of-two FFT
This paper proposes a class of power-of-two FFT (Fast Fourier Transform)
algorithms, called AM-QFT algorithms, that contains the recently published
improved QFT (Quick Fourier Transform) as a special case. The main idea is
to apply Amplitude Modulation Double Sideband - Suppressed Carrier
(AM DSB-SC) to convert odd-indexed signals into even-indexed signals, and
to insert this step into the improved QFT algorithm in place of the
multiplication by the secant function. The eight variants of this class are
obtained by reworking the AM DSB-SC idea and by means of duality; as a
result, all eight have the same computational cost and the same memory
requirements as the improved QFT. Compared with the split-radix 3add/3mul
algorithm (one of the best-performing FFT approaches in the literature),
the eight variants require the same numbers of additions and
multiplications but only half as many trigonometric constants, which makes
the proposed FFT algorithms interesting and useful for fixed-point
implementations. Some of these variants also show advantages over the
improved QFT: one slightly improves its numerical accuracy, while four
others use trigonometric constants that are faster to compute in
on-the-fly implementations.
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Fast linear transforms are ubiquitous in machine learning, including the
discrete Fourier transform, discrete cosine transform, and other structured
transformations such as convolutions. All of these transforms can be
represented by dense matrix-vector multiplication, yet each has a specialized
and highly efficient (subquadratic) algorithm. We ask to what extent
hand-crafting these algorithms and implementations is necessary, what
structural priors they encode, and how much knowledge is required to
automatically learn a fast algorithm for a provided structured transform.
Motivated by a characterization of fast matrix-vector multiplication as
products of sparse matrices, we introduce a parameterization of
divide-and-conquer methods that is capable of representing a large class of
transforms. This generic formulation can automatically learn an efficient
algorithm for many important transforms; for example, it recovers the
Cooley-Tukey FFT algorithm to machine precision for dimensions up to … .
Furthermore, our method can be incorporated as a lightweight
replacement of generic matrices in machine learning pipelines to learn
efficient and compressible transformations. On a standard task of compressing a
single hidden-layer network, our method exceeds the classification accuracy of
unconstrained matrices on CIFAR-10 by 3.9 points, the first time a
structured approach has done so, with 4X faster inference speed and 40X
fewer parameters.
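The "products of sparse matrices" characterization that motivates the butterfly parameterization can be made concrete on the smallest nontrivial case. As an illustration of our own (not the paper's learned factors), the 4-point DFT matrix factors into a permutation and two sparse stages, exactly the radix-2 Cooley-Tukey decomposition:

```python
import numpy as np

# Radix-2 factorization of the 4-point DFT matrix:
#   F4 = B4 @ (I2 kron F2) @ P
# where P sorts even-indexed before odd-indexed inputs, F2 is the 2-point
# DFT, and the butterfly B4 recombines the halves with twiddle factors.
F2 = np.array([[1, 1], [1, -1]], dtype=complex)
P = np.eye(4)[[0, 2, 1, 3]]                      # even-odd permutation
D = np.diag([1, -1j])                            # twiddles e^{-2*pi*i*k/4}
B4 = np.block([[np.eye(2), D], [np.eye(2), -D]])
F4 = B4 @ np.kron(np.eye(2), F2) @ P

assert np.allclose(F4, np.fft.fft(np.eye(4)))
```

Each factor has O(N) nonzero entries, and log N such stages give the O(N log N) cost; the learning method searches over products of exactly this butterfly shape.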
Discrete cosine transform-only and discrete sine transform-only windowed update algorithms for shifting data with hardware implementation
The Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) are widely used in image and data compression applications. To compute the DCT or DST of a signal, a portion of length N is extracted by windowing; by shifting the window point by point, the entire signal can be processed. Algorithms are developed that update the DCT and DST independently to reflect the modified window contents, i.e., no DST coefficients are used to calculate the DCT of the shifted sequence and, similarly, no DCT coefficients are used to calculate the DST of the shifted sequence. These algorithms constitute an improvement over previous DCT/DST update algorithms because they establish independence between the DCT and the DST. The update algorithms used to calculate the transform of the shifted sequence also require less computation than directly evaluating the modified transform via standard fast transform algorithms. First, the r-point (1 ≤ r ≤ N−1) update algorithms are derived for the rectangular window. Then, one-point independent windowed updates are developed for the split-triangular, Hanning, Hamming, and Blackman windows. The algorithms were implemented in the C language to test their correctness. Hardware circuits are then developed that compute the independent update of the DCT-II for a rectangular window of size N=8 with step sizes of 1 and 4. The windowed update algorithms are derived for DCT and DST types I through IV; the hardware implementation is given for type II, as it is the most frequently used transform.
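The transform-update idea is easiest to see in the DFT domain, where the classical sliding DFT refreshes all N coefficients in O(N) work per one-sample shift instead of the O(N log N) of a fresh FFT; the paper's contribution is the analogous, mutually independent updates for the DCT and DST. Below is a sketch of the standard sliding DFT for comparison, not the paper's algorithm.

```python
import numpy as np

def sliding_dft_update(X, x_old, x_new):
    """Update the length-N DFT of a window after a one-sample shift.

    Classical sliding-DFT recurrence: remove the outgoing sample, add the
    incoming one, then rotate each bin by e^{2*pi*i*k/N}. O(N) per shift.
    """
    N = len(X)
    k = np.arange(N)
    return (X - x_old + x_new) * np.exp(2j * np.pi * k / N)

# Usage: slide a length-8 window one step along a longer signal.
rng = np.random.default_rng(2)
s = rng.standard_normal(16)
X = np.fft.fft(s[0:8])
X_shifted = sliding_dft_update(X, s[0], s[8])
assert np.allclose(X_shifted, np.fft.fft(s[1:9]))
```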