    Type-II/III DCT/DST algorithms with reduced number of arithmetic operations

    We present algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST), of types II and III, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N. Furthermore, we show that a further N multiplications may be saved by a certain rescaling of the inputs or outputs, generalizing a well-known technique for N=8 by Arai et al. These results are derived by considering the DCT to be a special case of a DFT of length 4N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DCT-III, DST-II, and DST-III follow immediately from the improved count for the DCT-II. Comment: 9 pages
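
    The embedding of the DCT-II in a DFT of length 4N can be checked numerically. The sketch below is not the pruned fast algorithm from the abstract: it uses a naive O(M^2) DFT and only verifies the symmetry argument. The function names and the unnormalized DCT-II convention C_k = sum_n x_n cos(pi (n + 1/2) k / N) are assumptions made for illustration.

```python
import cmath
import math


def dft(y):
    """Naive O(M^2) DFT, used only to check the identity (not a fast algorithm)."""
    m = len(y)
    return [
        sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
        for k in range(m)
    ]


def dct2_direct(x):
    """Unnormalized DCT-II: C_k = sum_n x_n cos(pi (n + 1/2) k / N)."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]


def dct2_via_dft4n(x):
    """DCT-II read off a length-4N DFT of a real, even-symmetric input.

    The input x_n is placed at the odd index 2n+1 and mirrored at 4N-(2n+1);
    all even-indexed entries are zero. Each pair of mirrored samples then
    contributes 2 x_n cos(pi (n + 1/2) k / N) to output bin k.
    """
    n = len(x)
    y = [0.0] * (4 * n)
    for j, v in enumerate(x):
        y[2 * j + 1] = v
        y[4 * n - (2 * j + 1)] = v
    big = dft(y)
    # The first N bins are real (up to rounding) and equal twice the DCT-II.
    return [big[k].real / 2 for k in range(n)]


x = [1.0, 2.0, -0.5, 3.0, 0.25, -1.0, 4.0, 0.0]
direct = dct2_direct(x)
embedded = dct2_via_dft4n(x)
print(max(abs(a - b) for a, b in zip(direct, embedded)))
```

    A fast algorithm would exploit the zeros and the mirror symmetry of the length-4N input instead of computing the full DFT, which is the pruning step the abstract describes.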

    Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations

    We present algorithms for the type-IV discrete cosine transform (DCT-IV) and discrete sine transform (DST-IV), as well as for the modified discrete cosine transform (MDCT) and its inverse, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N, and the exact count is strictly lowered for all N > 4. These results are derived by considering the DCT to be a special case of a DFT of length 8N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DST-IV and MDCT follow immediately from the improved count for the DCT-IV. Comment: 11 pages
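
    The length-8N embedding of the DCT-IV can be verified the same way. This is a minimal sketch, not the pruned algorithm: it evaluates single DFT bins naively, and the function names and the unnormalized convention C_k = sum_n x_n cos(pi (n + 1/2)(k + 1/2) / N) are assumptions for illustration.

```python
import cmath
import math


def dft_bin(y, m):
    """Single bin m of a naive DFT of y (O(M) per bin, for checking only)."""
    size = len(y)
    return sum(y[j] * cmath.exp(-2j * cmath.pi * j * m / size) for j in range(size))


def dct4_direct(x):
    """Unnormalized DCT-IV: C_k = sum_n x_n cos(pi (n + 1/2)(k + 1/2) / N)."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (j + 0.5) * (k + 0.5) / n) for j in range(n))
        for k in range(n)
    ]


def dct4_via_dft8n(x):
    """DCT-IV read off the odd bins of a length-8N DFT.

    x_n sits at index 2n+1 and is mirrored at 8N-(2n+1); everything else is
    zero. Bin 2k+1 then equals 2 * sum_n x_n cos(2 pi (2n+1)(2k+1) / (8N)),
    which is twice the DCT-IV coefficient k.
    """
    n = len(x)
    y = [0.0] * (8 * n)
    for j, v in enumerate(x):
        y[2 * j + 1] = v
        y[8 * n - (2 * j + 1)] = v
    return [dft_bin(y, 2 * k + 1).real / 2 for k in range(n)]


x = [0.5, -1.0, 2.0, 3.5]
print(max(abs(a - b) for a, b in zip(dct4_direct(x), dct4_via_dft8n(x))))
```

    Both the input samples and the useful output bins sit at odd indices here, which is why the embedding needs length 8N rather than the 4N used for the DCT-II.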

    Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs

    This paper presents a systematic methodology based on the algebraic theory of signal processing to classify and derive fast algorithms for linear transforms. Instead of manipulating the entries of transform matrices, our approach derives the algorithms by stepwise decomposition of the associated signal models, or polynomial algebras. This decomposition is based on two generic methods or algebraic principles that generalize the well-known Cooley-Tukey FFT and make the algorithms' derivations concise and transparent. Application to the 16 discrete cosine and sine transforms yields a large class of fast algorithms, many of which have not been found before. Comment: 31 pages, more information at http://www.ece.cmu.edu/~smar

    On algebras related to the discrete cosine transform

    An algebraic theory for the discrete cosine transform (DCT) is developed, which is analogous to the well-known theory of the discrete Fourier transform (DFT). Whereas the latter diagonalizes a convolution algebra, which is a polynomial algebra modulo a product of various cyclotomic polynomials, the former diagonalizes a polynomial algebra modulo a product of various polynomials related to the Chebyshev types. When the dimension of the algebra is a power of 2, the DCT diagonalizes a polynomial algebra modulo a product of Chebyshev polynomials of the first type. In both DFT and DCT cases, the Chinese remainder theorem plays a key role in the design of fast algorithms.

    A class of AM-QFT algorithms for power-of-two FFT

    This paper proposes a class of power-of-two FFT (Fast Fourier Transform) algorithms, called AM-QFT algorithms, that contains the recently published improved QFT (Quick Fourier Transform) as a special case. The main idea is to apply Amplitude Modulation Double Sideband - Suppressed Carrier (AM DSB-SC) to convert odd-indexed signals into even-indexed signals, and to insert this step into the improved QFT algorithm in place of the multiplication by the secant function. The 8 variants of this class are obtained by reworking the AM DSB-SC idea and by means of duality. As a result, the 8 variants have the same computational cost and the same memory requirements as the improved QFT. Compared with the split-radix 3add/3mul algorithm (one of the best-performing FFT approaches in the literature), the 8 AM-QFT variants use the same number of additions and multiplications but only half of the trigonometric constants. This makes the proposed FFT algorithms interesting and useful for fixed-point implementations. Some of these variants show advantages over the improved QFT: one variant slightly enhances the numerical accuracy of the improved QFT, while four others use trigonometric constants that are faster to compute in "on the fly" implementations.

    Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

    Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured approach has done so), with 4x faster inference speed and 40x fewer parameters.
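
    The "products of sparse matrices" view can be made concrete for the FFT. The sketch below does not learn anything and is not the paper's parameterization; it only builds the fixed radix-2 Cooley-Tukey factors by hand (log_2 N butterfly matrices with 2N nonzeros each, followed by a bit-reversal permutation) and checks that their product is the dense DFT matrix. All function names are hypothetical.

```python
import cmath


def matmul(a, b):
    """Dense matrix product on lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)] for r in range(n)]


def butterfly_factor(n, block):
    """Block-diagonal matrix with n/block copies of [[I, D], [I, -D]].

    D holds the twiddle factors exp(-2 pi i j / block); each row has exactly
    two nonzeros, so the factor has 2n nonzeros in total.
    """
    m = [[0j] * n for _ in range(n)]
    half = block // 2
    for start in range(0, n, block):
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / block)
            top, bot = start + j, start + j + half
            m[top][top] = 1 + 0j
            m[top][bot] = w
            m[bot][top] = 1 + 0j
            m[bot][bot] = -w
    return m


def bit_reversal_matrix(n):
    """Permutation that reorders inputs by bit-reversed index (n a power of two, n >= 2)."""
    bits = n.bit_length() - 1
    p = [[0j] * n for _ in range(n)]
    for i in range(n):
        rev = int(format(i, "0{}b".format(bits))[::-1], 2)
        p[i][rev] = 1 + 0j
    return p


def fft_factors(n):
    """Sparse factors whose left-to-right product is the n-point DFT matrix."""
    factors = []
    block = n
    while block >= 2:
        factors.append(butterfly_factor(n, block))
        block //= 2
    factors.append(bit_reversal_matrix(n))
    return factors


def product(factors):
    out = factors[0]
    for f in factors[1:]:
        out = matmul(out, f)
    return out


def count_nonzeros(m):
    return sum(1 for row in m for v in row if v != 0)


factors = fft_factors(8)
print([count_nonzeros(f) for f in factors])
```

    Applying the factors one at a time to a vector costs O(N) per factor, which gives the O(N log N) total that the abstract says the learned parameterization recovers.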

    Discrete cosine transform-only and discrete sine transform-only windowed update algorithms for shifting data with hardware implementation

    The Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) are widely used in image and data compression applications. To compute the DCT or DST of a signal, a portion of length N is extracted by windowing. By shifting the window point by point, the entire signal can be processed. Algorithms are developed that update the DCT and DST independently to reflect the modified window contents, i.e. no DST coefficients are used to calculate the DCT of the shifted sequence and, similarly, no DCT coefficients are used to calculate the DST of the shifted sequence. These algorithms improve on previous DCT/DST update algorithms because they establish independence between the DCT and the DST. The update algorithms used to calculate the transform of the shifted sequence use less computation than directly evaluating the modified transform via standard fast transform algorithms. First, the r-point update algorithms, 1 ≤ r ≤ N-1, are derived for the rectangular window. Thereafter, one-point independent windowed update algorithms are developed for the split-triangular, Hanning, Hamming, and Blackman windows. The algorithms were implemented in C to test their correctness. Hardware circuits are then developed that compute the independent update of the DCT-II for the rectangular window of size N=8 with step sizes of 1 and 4. The windowed update algorithms are derived for DCT and DST types I through IV; however, only the hardware implementation of type II is given, as it is the most frequently used transform.
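
    For contrast, the kind of coupled update that the abstract improves on can be sketched for the rectangular window and a one-point shift. This is not the paper's independent algorithm: the update below needs the sine coefficients to refresh the cosine coefficients and vice versa. The function names are hypothetical, and the sine companion is indexed at the same frequencies k = 0..N-1 as the DCT-II, which is an assumption made to keep the recurrence simple.

```python
import math


def dct2(w):
    """Unnormalized DCT-II of a window: C_k = sum_n w_n cos(pi (n + 1/2) k / N)."""
    n = len(w)
    return [
        sum(w[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]


def sine_companion(w):
    """S_k = sum_n w_n sin(pi (n + 1/2) k / N) for k = 0..N-1 (assumed convention)."""
    n = len(w)
    return [
        sum(w[j] * math.sin(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]


def slide_one(c, s, oldest, newest):
    """Coupled O(N) update after the window drops `oldest` and gains `newest`.

    With theta = pi k / N and d = (-1)^k * newest - oldest:
        C'_k = cos(theta) C_k + sin(theta) S_k + cos(theta / 2) d
        S'_k = cos(theta) S_k - sin(theta) C_k - sin(theta / 2) d
    """
    n = len(c)
    c_new, s_new = [], []
    for k in range(n):
        theta = math.pi * k / n
        d = (-1) ** k * newest - oldest
        c_new.append(math.cos(theta) * c[k] + math.sin(theta) * s[k] + math.cos(theta / 2) * d)
        s_new.append(math.cos(theta) * s[k] - math.sin(theta) * c[k] - math.sin(theta / 2) * d)
    return c_new, s_new


signal = [0.3, -1.2, 2.5, 0.0, 1.1, -0.7, 4.0, 2.2, -3.1, 0.9, 1.6, -0.4]
N = 8
c, s = dct2(signal[:N]), sine_companion(signal[:N])
c, s = slide_one(c, s, signal[0], signal[N])
print(max(abs(a - b) for a, b in zip(c, dct2(signal[1:N + 1]))))
```

    Each shift costs O(N) operations instead of a full transform, but both coefficient sets must be carried along; removing that dependence is the contribution the abstract claims.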