1,064 research outputs found
Fast algorithm for the 3-D DCT-II
Recently, many applications for three-dimensional
(3-D) image and video compression have been proposed using 3-D discrete cosine transforms (3-D DCTs). Among different types of DCTs, the type-II DCT (DCT-II) is the most used. In order to use the 3-D DCTs in practical applications, fast 3-D algorithms are essential. Therefore, in this paper, the 3-D vector-radix decimation-in-frequency (3-D VR DIF) algorithm that calculates the 3-D DCT-II directly is introduced. The mathematical analysis and the implementation of the developed algorithm are presented,
showing that this algorithm possesses a regular structure, can be implemented in-place for efficient use of memory, and is faster than the conventional row-column-frame (RCF) approach. Furthermore, an application of 3-D video compression-based 3-D DCT-II is implemented using the 3-D new algorithm. This has led to a substantial speed improvement for 3-D DCT-II-based compression systems and proved the validity of the developed algorithm
Type-II/III DCT/DST algorithms with reduced number of arithmetic operations
We present algorithms for the discrete cosine transform (DCT) and discrete
sine transform (DST), of types II and III, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N.
Furthermore, we show that a further N multiplications may be saved by a certain
rescaling of the inputs or outputs, generalizing a well-known technique for N=8
by Arai et al. These results are derived by considering the DCT to be a special
case of a DFT of length 4N, with certain symmetries, and then pruning redundant
operations from a recent improved fast Fourier transform algorithm (based on a
recursive rescaling of the conjugate-pair split radix algorithm). The improved
algorithms for DCT-III, DST-II, and DST-III follow immediately from the
improved count for the DCT-II.Comment: 9 page
Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations
We present algorithms for the type-IV discrete cosine transform (DCT-IV) and
discrete sine transform (DST-IV), as well as for the modified discrete cosine
transform (MDCT) and its inverse, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~2NlogN to ~(17/9)NlogN for a power-of-two transform size N, and the exact
count is strictly lowered for all N > 4. These results are derived by
considering the DCT to be a special case of a DFT of length 8N, with certain
symmetries, and then pruning redundant operations from a recent improved fast
Fourier transform algorithm (based on a recursive rescaling of the
conjugate-pair split radix algorithm). The improved algorithms for DST-IV and
MDCT follow immediately from the improved count for the DCT-IV.Comment: 11 page
Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Polynomial Transforms Based on Induction
A polynomial transform is the multiplication of an input vector x\in\C^n by
a matrix \PT_{b,\alpha}\in\C^{n\times n}, whose -th element is
defined as for polynomials p_\ell(x)\in\C[x] from a list
and sample points \alpha_k\in\C from a list
. Such transforms find applications in
the areas of signal processing, data compression, and function interpolation.
Important examples include the discrete Fourier and cosine transforms. In this
paper we introduce a novel technique to derive fast algorithms for polynomial
transforms. The technique uses the relationship between polynomial transforms
and the representation theory of polynomial algebras. Specifically, we derive
algorithms by decomposing the regular modules of these algebras as a stepwise
induction. As an application, we derive novel general-radix
algorithms for the discrete Fourier transform and the discrete cosine transform
of type 4.Comment: 19 pages. Submitted to SIAM Journal on Matrix Analysis and
Application
Overview of Parallel Platforms for Common High Performance Computing
The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well
Signal Flow Graph Approach to Efficient DST I-IV Algorithms
In this paper, fast and efficient discrete sine transformation (DST)
algorithms are presented based on the factorization of sparse, scaled
orthogonal, rotation, rotation-reflection, and butterfly matrices. These
algorithms are completely recursive and solely based on DST I-IV. The presented
algorithms have low arithmetic cost compared to the known fast DST algorithms.
Furthermore, the language of signal flow graph representation of digital
structures is used to describe these efficient and recursive DST algorithms
having points signal flow graph for DST-I and points signal flow
graphs for DST II-IV
Recommended from our members
Two-dimensional DCT/IDCT architecture
A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition is presented. It uses the same one dimensional (1-D) DCT unit for the row and column computations and (N2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2â arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N2 multipliers (in terms of area occupied) and only 2N adders. It can compute a N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay
Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs
This paper presents a systematic methodology based on the algebraic theory of
signal processing to classify and derive fast algorithms for linear transforms.
Instead of manipulating the entries of transform matrices, our approach derives
the algorithms by stepwise decomposition of the associated signal models, or
polynomial algebras. This decomposition is based on two generic methods or
algebraic principles that generalize the well-known Cooley-Tukey FFT and make
the algorithms' derivations concise and transparent. Application to the 16
discrete cosine and sine transforms yields a large class of fast algorithms,
many of which have not been found before.Comment: 31 pages, more information at http://www.ece.cmu.edu/~smar
- âŠ