Convolutive Blind Source Separation Methods
In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy in which many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections
off surrounding walls and obstacles. Reverberation leads to severe degradation
of speech intelligibility and can be prohibitive for applications where speech is digitally
recorded, such as audio conferencing or hearing aids. Dereverberation of speech
is therefore an important field in speech enhancement.
Driven by consumer demand, blind speech dereverberation has become a popular
field in the research community and has led to many interesting approaches in the literature.
However, most existing methods are dictated by their underlying models and
hence suffer from assumptions that constrain the approaches to specific subproblems
of blind speech dereverberation. For example, many approaches limit the dereverberation
to voiced speech sounds, leading to poor results for unvoiced speech. Few
approaches tackle single-sensor blind speech dereverberation, and only a very limited
subset allows for dereverberation of speech from moving speakers.
Therefore, the aim of this dissertation is the development of a flexible and extendible
framework for blind speech dereverberation accommodating different speech
sound types, single- or multiple-sensor setups, and stationary as well as moving speakers.
Bayesian methods benefit from – rather than being dictated by – appropriate model
choices. Therefore, the problem of blind speech dereverberation is considered from
a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach
accommodating a multitude of models for the speech production mechanism and
room transfer function is consequently derived. In this approach both the anechoic
source signal and reverberant channel are estimated using their optimal estimators by
means of Rao-Blackwellisation of the state-space of unknown variables. The remaining
model parameters are estimated using sequential importance resampling.
The proposed approach is implemented for two different speech production models
for stationary speakers, demonstrating substantial reduction in reverberation for
both unvoiced and voiced speech sounds. Furthermore, the channel model is extended
to facilitate blind dereverberation of speech from moving speakers. Due to the
structure of the measurement model, single- as well as multi-microphone processing is facilitated,
accommodating physically constrained scenarios where only a single sensor
can be used as well as allowing for the exploitation of spatial diversity in scenarios
where the physical size of microphone arrays is of no concern.
This dissertation is concluded with a survey of possible directions for future research,
including the use of switching Markov source models, joint target tracking
and enhancement, as well as an extension to subband processing for improved computational
efficiency.
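The sequential importance resampling step mentioned in this abstract can be illustrated with a minimal bootstrap particle filter. This is only a sketch under strong assumptions: it uses a toy scalar linear-Gaussian state-space model rather than the speech production and room transfer function models of the dissertation, it omits Rao-Blackwellisation, and all names and parameter values (`sir_filter`, `simulate`, `a`, `q`, `r`) are hypothetical.

```python
import numpy as np

def simulate(n_steps=200, a=0.9, q=0.5, r=1.0, seed=1):
    """Simulate x_t = a*x_{t-1} + N(0, q^2) and y_t = x_t + N(0, r^2) (toy model)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = a * x[t - 1] + rng.normal(0.0, q)
    y = x + rng.normal(0.0, r, n_steps)
    return x, y

def sir_filter(observations, n_particles=500, a=0.9, q=0.5, r=1.0, seed=0):
    """Bootstrap SIR filter for the toy model above; returns posterior-mean estimates."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # samples from an assumed prior
    estimates = []
    for y in observations:
        # Propagate each particle through the state transition (the proposal).
        particles = a * particles + rng.normal(0.0, q, n_particles)
        # Weight by the observation likelihood, in the log domain for stability.
        log_w = -0.5 * ((y - particles) / r) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Weighted mean of the particles approximates the posterior mean.
        estimates.append(float(np.sum(w * particles)))
        # Resample in proportion to the weights to counter weight degeneracy.
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)
```

Calling `x, y = simulate()` followed by `sir_filter(y)` returns one state estimate per observation; on this toy model the estimates should track `x` more closely than the raw observations do.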
Dynamic texture synthesis in image and video processing.
Xu, Leilei. Thesis submitted in October 2007. Thesis (M.Phil.)--Chinese University of Hong Kong, 2008. Includes bibliographical references (leaves 78-84). Abstracts in English and Chinese.
Contents:
Abstract
Acknowledgement
Chapter 1: Introduction
  1.1 Texture and Dynamic Textures
  1.2 Related work
  1.3 Thesis Outline
Chapter 2: Image/Video Processing
  2.1 Bayesian Analysis
  2.2 Markov Property
  2.3 Graph Cut
  2.4 Belief Propagation
  2.5 Expectation-Maximization
  2.6 Principal Component Analysis
Chapter 3: Linear Dynamic System
  3.1 System Model
  3.2 Degeneracy and Canonical Model Realization
  3.3 Learning of Dynamic Textures
  3.4 Synthesizing Dynamic Textures
  3.5 Summary
Chapter 4: Dynamic Color Texture Synthesis
  4.1 Related Work
  4.2 System Model
    4.2.1 Laplacian Pyramid-based DCTS Model
    4.2.2 RBF-based DCTS Model
  4.3 Experimental Results
  4.4 Summary
Chapter 5: Dynamic Textures using Multi-resolution Analysis
  5.1 System Model
  5.2 Multi-resolution Descriptors
    5.2.1 Laplacian Pyramids
    5.2.2 Haar Wavelets
    5.2.3 Steerable Pyramid
  5.3 Experimental Results
  5.4 Summary
Chapter 6: Motion Transfer
  6.1 Problem formulation
    6.1.1 Similarity on Appearance
    6.1.2 Similarity on Dynamic Behavior
    6.1.3 The Objective Function
  6.2 Further Work
Chapter 7: Conclusions
Chapter A: List of Publications
Chapter B: Degeneracy in LDS Model
  B.1 Equivalence Class
  B.2 The Choice of the Matrix Q
  B.3 Swapping the Column of C and A
Chapter C: Probability Density Functions
  C.1 Probability Distribution
  C.2 Joint Probability Distributions
Bibliography
Time series forecasting using wavelet and support vector machine
Master's. Master of Engineering
Blind image deconvolution: nonstationary Bayesian approaches to restoring blurred photos
High quality digital images have become pervasive in modern scientific and everyday life —
in areas from photography to astronomy, CCTV, microscopy, and medical imaging. However
there are always limits to the quality of these images due to uncertainty and imprecision in the
measurement systems. Modern signal processing methods offer the promise of overcoming
some of these problems by postprocessing these blurred and noisy images. In this thesis,
novel methods using nonstationary statistical models are developed for the removal of blurs
from out-of-focus and other types of degraded photographic images.
The work tackles the fundamental problem of blind image deconvolution (BID); its goal is
to restore a sharp image from a blurred observation when the blur itself is completely unknown.
This is a “doubly ill-posed” problem: an extreme lack of information must be countered
by strong prior constraints about sensible types of solution. In this work, the hierarchical
Bayesian methodology is used as a robust and versatile framework to impart the required prior
knowledge.
The thesis is arranged in two parts. In the first part, the BID problem is reviewed, along
with techniques and models for its solution. Observation models are developed, with an
emphasis on photographic restoration, concluding with a discussion of how these are reduced
to the common linear spatially-invariant (LSI) convolutional model. Classical methods for the
solution of ill-posed problems are summarised to provide a foundation for the main theoretical
ideas that will be used under the Bayesian framework. This is followed by an in-depth review
and discussion of the various prior image and blur models appearing in the literature, and then
their applications to solving the problem with both Bayesian and non-Bayesian techniques.
The second part covers novel restoration methods, making use of the theory presented in Part I.
Firstly, two new nonstationary image models are presented. The first models local variance in
the image, and the second extends this with locally adaptive noncausal autoregressive (AR)
texture estimation and local mean components. These models allow for recovery of image
details including edges and texture, whilst preserving smooth regions. Most existing methods
do not model the boundary conditions correctly for deblurring of natural photographs, and a
chapter is devoted to exploring Bayesian solutions to this topic.
Due to the complexity of the models used and the problem itself, there are many challenges
which must be overcome for tractable inference. Using the new models, three different inference
strategies are investigated: firstly using the Bayesian maximum marginalised a posteriori
(MMAP) method with deterministic optimisation; proceeding with the stochastic methods
of variational Bayesian (VB) distribution approximation, and simulation of the posterior distribution
using the Gibbs sampler. Of these, we find the Gibbs sampler to be the most effective
way to deal with a variety of different types of unknown blurs. Along the way, details are given
of the numerical strategies developed to give accurate results and to accelerate performance.
Finally, the thesis demonstrates state-of-the-art results in blind restoration of synthetic and real
degraded images, such as recovering details in out-of-focus photographs.
Multiscale Methods in Image Modelling and Image Processing
The field of modelling and processing of 'images' has fairly recently become important, even crucial, to areas of science, medicine, and engineering. The inevitable explosion of imaging modalities and approaches stemming from this fact has become a rich source of mathematical applications. 'Imaging' is quite broad, and suffers somewhat from this broadness. The general question of 'what is an image?' or perhaps 'what is a natural image?' turns out to be difficult to address. To make real headway one may need to strongly constrain the class of images being considered, as will be done in part of this thesis. On the other hand there are general principles that can guide research in many areas. One such principle considered is the assertion that (classes of) images have multiscale relationships, whether at a pixel level, between features, or other variants. There are both practical (in terms of computational complexity) and more philosophical reasons (mimicking the human visual system, for example) that suggest looking at such methods. Looking at scaling relationships may also have the advantage of opening a problem up to many mathematical tools. This thesis will detail two investigations into multiscale relationships, in quite different areas. One will involve Iterated Function Systems (IFS), and the other a stochastic approach to reconstruction of binary images (binary phase descriptions of porous media). The use of IFS in this context, which has often been called 'fractal image coding', has been primarily viewed as an image compression technique. We will re-visit this approach, proposing it as a more general tool. Some study of the implications of that idea will be presented, along with applications inferred by the results. In the area of reconstruction of binary porous media, a novel, multiscale, hierarchical annealing approach is proposed and investigated.
Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for the functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and the magnitude difference function, as the similarity criteria was explored and compared. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method.
Bayesian methods in music modelling
This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from musical theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and the alignment of the timings to the underlying score of the music.
A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov Chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate.
We emphasize the links between the models and inference algorithms developed in this thesis and those in existing and parallel work, and demonstrate the effects of making modifications to these models both theoretically and by means of experimental results.