Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to a 175%
increase in processing throughput over optimized (full-precision)
convolution, with virtually no effect on the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution.
Comment: IEEE Trans. on Multimedia, 201
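The core idea behind packing quantized values in floating point can be sketched as follows. Two small quantized samples are placed in disjoint bit ranges of one double-precision mantissa, so a single multiplication by a filter coefficient produces both products concurrently; the constants, helper names, and 8-bit quantization below are illustrative assumptions, not the paper's exact packing scheme (which must also handle signed values and accumulation headroom).

```python
SHIFT = 2 ** 20  # spacing wide enough that the two products cannot overlap

def pack2(a, b):
    """Pack two small non-negative integers into one float64 mantissa."""
    return float(a * SHIFT + b)

def unpack2(p):
    """Recover the two products after a shared multiplication."""
    q = int(p)
    return q // SHIFT, q % SHIFT

# 8-bit quantized samples a, b and a filter coefficient c (toy values)
a, b, c = 37, 91, 113
prod = pack2(a, b) * c       # one multiply computes a*c and b*c concurrently
hi, lo = unpack2(prod)
assert hi == a * c and lo == b * c
```

Both 16-bit products fit comfortably inside the 53-bit significand of a double, which is what makes this kind of lossy throughput scaling possible on standard floating-point hardware.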
Quantization, Perception and Speech Coding
This thesis is about compression of speech signals, a research area known as speech coding. The aim of a speech coder is to provide efficient digital representations of speech signals, required in digital transmission and storage systems. Contemporary speech coding algorithms typically separate the speech signal into sets of parameters. Subsequently, efficient representations are achieved by quantization, the compression tool employed in all speech coding algorithms. Efficient quantization can be achieved by employing vector quantization, where blocks of parameters are quantized simultaneously.

In this thesis, vector quantization of various speech parameters is considered, also taking perceptual aspects into account. Tools such as Gaussian mixture models, companded quantization, lattice quantization, memory quantization, perception, and high-rate theory are utilized in order to find tractable coding procedures. In particular, speech spectrum coding, a vital part of most low- to medium-rate coders, is studied. A spectrum distortion measure that can penalize a perceptually bad time evolution of the spectrum, and new vector quantization procedures for exploiting interframe memory, are proposed. The proposed techniques are shown to yield high objective and subjective quality.

Furthermore, a pdf-optimized coding system based on lattice quantization is proposed. The system is easily adapted in rate, a property desired in emerging networks. A Gaussian mixture model of the source pdf constitutes a means for capturing the statistics of the source, facilitating the design of efficient vector quantizers based on high-rate theory. Actual quantization is based on lattice quantization, allowing a low-complexity codebook search. The system requires no direct training of codebooks, and is therefore easily adapted in rate. The adaptivity also facilitates the use of recursively or perceptually adapted codebooks.
The scheme has been applied to the LPC residual and MDCT coefficients, showing competitive performance
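The contrast the abstract draws between trained-codebook VQ and lattice quantization can be illustrated with a minimal sketch. The codebook, dimensions, and step size below are toy assumptions: full-search VQ must compare the input against every stored code vector, whereas quantizing to a scaled integer lattice reduces the "search" to rounding, which is what enables the low-complexity, training-free system described above.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 2))   # toy "trained" codebook: 16 code vectors in R^2

def vq_full_search(x):
    """Full-search VQ: index of the nearest code vector in squared Euclidean distance."""
    d = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(d))

def lattice_quantize(x, step=0.25):
    """Lattice quantization on a scaled integer lattice: the codebook search
    collapses to per-component rounding, with no stored codebook at all."""
    return np.round(x / step) * step

x = np.array([0.5, -0.2])
idx = vq_full_search(x)   # cost grows with codebook size
xq = lattice_quantize(x)  # cost grows only with vector dimension
```

In a pdf-optimized system of the kind proposed, a Gaussian mixture model would additionally shape or compand the source before lattice quantization; that stage is omitted here for brevity.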