49 research outputs found
Efficiency in audio processing : filter banks and transcoding
Audio transcoding is the conversion of digital audio from one compressed form A to another compressed form B, where A and B have different compression properties, such as a different bit-rate, sampling frequency or compression method. This is typically achieved by decoding A to an intermediate uncompressed form, and then encoding it to B. A significant portion of the involved computational effort pertains to operating the synthesis filter bank, which is an important processing block in the decoding stage, and the analysis filter bank, which is an important processing block in the encoding stage. This thesis presents methods for efficient implementations of filter banks and audio transcoders, and is separated into two main parts. In the first part, a new class of Frequency Response Masking (FRM) filter banks is introduced. These filter banks are usually characterized by comprising a tree-structured cascade of subfilters, which have small individual filter lengths. Methods of complexity reduction are proposed for the scenarios when the filter banks are operated in single-rate mode, and when they are operated in multirate mode; and for the scenarios when the input signal is real-valued, and when it is complex-valued. An efficient variable bandwidth FRM filter bank is designed by using signed-powers-of-two reduction of its subfilter coefficients. Our design has a complexity an order lower than that of an octave filter bank with the same specifications. In the second part, the audio transcoding process is analyzed. Audio transcoding is modeled as a cascaded quantization process, and the cascaded quantization of an input signal is analyzed under different conditions, for the MPEG 1 Layer 2 and MP3 compression methods. One condition is the input-to-output delay of the transcoder, which is known to have an impact on the audio quality of the transcoded material. Methods to reduce the error in a cascaded quantization process are also proposed. An ultra-fast MP3 transcoder that requires only integer operations is proposed and implemented in software. Our implementation shows an improvement by a factor of 5 to 16 over other best known transcoders in terms of execution speed
Multi-image classification and compression using vector quantization
Vector Quantization (VQ) is an image processing technique based on statistical clustering, and designed originally for image compression. In this dissertation, several methods for multi-image classification and compression based on a VQ design are presented. It is demonstrated that VQ can perform joint multi-image classification and compression by associating a class identifier with each multi-spectral signature codevector. We extend the Weighted Bayes Risk VQ (WBRVQ) method, previously used for single-component images, that explicitly incorporates a Bayes risk component into the distortion measure used in the VQ quantizer design and thereby permits a flexible trade-off between classification and compression priorities. In the specific case of multi-spectral images, we investigate the application of the Multi-scale Retinex algorithm as a preprocessing stage, before classification and compression, that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The goals of this research are four-fold: (1) to study the interrelationship between statistical clustering, classification and compression in a multi-image VQ context; (2) to study mixed-pixel classification and combined classification and compression for simulated and actual, multispectral and hyperspectral multi-images; (3) to study the effects of multi-image enhancement on class spectral signatures; and (4) to study the preservation of scientific data integrity as a function of compression. In this research, a key issue is not just the subjective quality of the resulting images after classification and compression but also the effect of multi-image dimensionality on the complexity of the optimal coder design
Efficient compression of motion compensated residuals
EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Tree-Structured Nonlinear Adaptive Signal Processing
In communication systems, nonlinear adaptive filtering has become increasingly popular in a variety of applications such as channel equalization, echo cancellation and speech coding. However, existing nonlinear adaptive filters such as polynomial (truncated Volterra series) filters and multilayer perceptrons suffer from a number of problems. First, although high Order polynomials can approximate complex nonlinearities, they also train very slowly. Second, there is no systematic and efficient way to select their structure. As for multilayer perceptrons, they have a very complicated structure and train extremely slowly Motivated by the success of classification and regression trees on difficult nonlinear and nonparametfic problems, we propose the idea of a tree-structured piecewise linear adaptive filter. In the proposed method each node in a tree is associated with a linear filter restricted to a polygonal domain, and this is done in such a way that each pruned subtree is associated with a piecewise linear filter. A training sequence is used to adaptively update the filter coefficients and domains at each node, and to select the best pruned subtree and the corresponding piecewise linear filter. The tree structured approach offers several advantages. First, it makes use of standard linear adaptive filtering techniques at each node to find the corresponding Conditional linear filter. Second, it allows for efficient selection of the subtree and the corresponding piecewise linear filter of appropriate complexity. Overall, the approach is computationally efficient and conceptually simple. The tree-structured piecewise linear adaptive filter bears some similarity to classification and regression trees. But it is actually quite different from a classification and regression tree. Here the terminal nodes are not just assigned a region and a class label or a regression value, but rather represent: a linear filter with restricted domain, It is also different in that classification and regression trees are determined in a batch mode offline, whereas the tree-structured adaptive filter is determined recursively in real-time. We first develop the specific structure of a tree-structured piecewise linear adaptive filter and derive a stochastic gradient-based training algorithm. We then carry out a rigorous convergence analysis of the proposed training algorithm for the tree-structured filter. Here we show the mean-square convergence of the adaptively trained tree-structured piecewise linear filter to the optimal tree-structured piecewise linear filter. Same new techniques are developed for analyzing stochastic gradient algorithms with fixed gains and (nonstandard) dependent data. Finally, numerical experiments are performed to show the computational and performance advantages of the tree-structured piecewise linear filter over linear and polynomial filters for equalization of high frequency channels with severe intersymbol interference, echo cancellation in telephone networks and predictive coding of speech signals
Risk Bounds for CART Classifiers under a Margin Condition
Risk bounds for Classification and Regression Trees (CART, Breiman et. al.
1984) classifiers are obtained under a margin condition in the binary
supervised classification framework. These risk bounds are obtained
conditionally on the construction of the maximal deep binary tree and permit to
prove that the linear penalty used in the CART pruning algorithm is valid under
a margin condition. It is also shown that, conditionally on the construction of
the maximal tree, the final selection by test sample does not alter
dramatically the estimation accuracy of the Bayes classifier. In the two-class
classification framework, the risk bounds that are proved, obtained by using
penalized model selection, validate the CART algorithm which is used in many
data mining applications such as Biology, Medicine or Image Coding
Progressive Source-Channel Coding for Multimedia Transmission over Noisy and Lossy Channels with and without Feedback
Rate-scalable or layered lossy source-coding is useful for
progressive transmission of multimedia sources, where the receiver can
reconstruct the source incrementally.
This thesis considers ``joint source-channel'' schemes
for such a progressive transmission, in the presence of
noise or loss, with and without the use of a feedback link.
First we design image communication schemes for memoryless and finite
state channels using limited and explicitly constrained use of
the feedback channel in the form of a variable incremental redundancy
Hybrid ARQ protocol. Constraining feedback allows a direct
comparison with schemes without feedback. Optimized feedback based
systems are shown to have useful gains.
Second, we develop a controlled Markov chain approach for constrained feedback Hybrid ARQ protocol design.
The proposed methodology allows the protocol to be chosen from a collection of signal flow graphs, and
also allows explicit control over the tradeoffs in throughput, reliability and complexity.
Next we consider progressive image transmission in
the absence of feedback. We assign unequal error protection to the bits of
a rate-scalable source-coder using rate compatible
channel codes. We show that, under the framework, the source and
channel bits can be ``scheduled'' in a single bitstream in such a way
that operational optimality is retained for different transmission
budgets, creating a rate-scalable joint source-channel coder.
Next we undertake the design of a joint source-channel decoder that
uses ``distortion aware'' ACK/NACK feedback generation. For
memoryless channels, and Type-I HARQ, the design of optimal ACK/NACK
generation and decoding by packet combining is cast and solved as a
sequential decision problem. We obtain dynamic programming based
optimal solutions and also propose suboptimal, lower complexity
distortion-aware decoders and feedback generation rules which
outperform conventional BER based rules such as
CRC-check.
Finally we design operational rate-distortion optimal ACK/NACK
feedback generation rules for transmitting a tree structured quantizer
over a memoryless channel. We show that the optimal feedback
generation rules are embedded, that is, they allow incremental
switching to higher rates during the transmission. Also, we
obtain the structure of the feedback generation rules in terms
of a feedback threshold function that simplifies the implementation
A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion
During the past decades, many areas of speech processing have benefited from the vast increases in the available memory sizes and processing power. For example, speech recognizers can be trained with enormous speech databases and high-quality speech synthesizers can generate new speech sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still lots of embedded application scenarios for speech processing techniques where the memory capacities and the processor speeds are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices.
This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. The novel proprietary sinusoidal speech codec designed for efficient speech storage is capable of achieving relatively good speech quality at compression ratios beyond the ones offered by the standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and the method of adaptive downsampling and quantization. The coding efficiency is also further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. The compression is also facilitated using a new perceptual irrelevancy removal method.
The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is utilized for the compression of unit selection databases and for the parametric concatenation of speech units. It is also shown that the efficiency of the database compression can be further enhanced using speaker-specific retraining of the codec. Moreover, the computational load is significantly decreased using a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec.
Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced using a new approach for mode selection and through explicit control of the degree of voicing.
The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be utilized, e.g., for efficient compression of audio books, and the speech synthesis related methods can be used for reducing the footprint and the computational load of concatenative text-to-speech synthesizers to levels required in some embedded applications. The VLBR-based voice conversion techniques can be used to complement the codec both in storage applications and in connection with speech synthesis. It is also possible to only utilize the voice conversion functionality, e.g., in games or other entertainment applications
Conjoint probabilistic subband modeling
Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997.Includes bibliographical references (leaves 125-133).by Ashok Chhabedia Popat.Ph.D