Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections
off surrounding walls and obstacles. Reverberation leads to severe degradation
of speech intelligibility and can be prohibitive for applications where speech is digitally
recorded, such as audio conferencing or hearing aids. Dereverberation of speech
is therefore an important field in speech enhancement.
Driven by consumer demand, blind speech dereverberation has become a popular
field in the research community and has led to many interesting approaches in the literature.
However, most existing methods are dictated by their underlying models and
hence suffer from assumptions that constrain the approaches to specific subproblems
of blind speech dereverberation. For example, many approaches limit the dereverberation
to voiced speech sounds, leading to poor results for unvoiced speech. Few
approaches tackle single-sensor blind speech dereverberation, and only a very limited
subset allows for dereverberation of speech from moving speakers.
Therefore, the aim of this dissertation is the development of a flexible and extendible
framework for blind speech dereverberation accommodating different speech
sound types, single- or multi-sensor processing, as well as stationary and moving speakers.
Bayesian methods benefit from, rather than being dictated by, appropriate model
choices. Therefore, the problem of blind speech dereverberation is considered from
a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach
accommodating a multitude of models for the speech production mechanism and
room transfer function is consequently derived. In this approach both the anechoic
source signal and reverberant channel are estimated using their optimal estimators by
means of Rao-Blackwellisation of the state-space of unknown variables. The remaining
model parameters are estimated using sequential importance resampling.
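
As a rough illustration of the sequential importance resampling step mentioned above, the following Python sketch runs a bootstrap particle filter on a toy scalar random-walk model. It is not the Rao-Blackwellised speech and channel model of the thesis: the state model, the noise levels `q_std` and `r_std`, and the function names `sir_filter` and `systematic_resample` are all assumptions made for this example.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling.

    `weights` must be normalised so that they sum to one.
    """
    n = len(weights)
    start = rng.random() / n          # single random offset in [0, 1/n)
    indices, cumulative, i = [], weights[0], 0
    for k in range(n):
        u = start + k / n             # evenly spaced sampling points
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def sir_filter(observations, n_particles=500, q_std=0.1, r_std=0.5, seed=0):
    """Bootstrap SIR filter for the assumed toy model
    x_t = x_{t-1} + v_t,  y_t = x_t + w_t,
    with Gaussian v_t (std q_std) and w_t (std r_std).
    Returns the weighted posterior-mean estimate at each time step.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the state transition (the prior).
        particles = [x + rng.gauss(0.0, q_std) for x in particles]
        # 2. Weight by the Gaussian likelihood, in the log domain for stability.
        log_w = [-0.5 * ((y - x) / r_std) ** 2 for x in particles]
        max_lw = max(log_w)
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3. Estimate the state as the weighted mean of the particles.
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # 4. Resample to counter weight degeneracy.
        idx = systematic_resample(w, rng)
        particles = [particles[i] for i in idx]
    return estimates
```

Calling `sir_filter([2.0] * 50)` returns one estimate per observation, and the estimates should settle close to 2.0. In a Rao-Blackwellised filter, each particle would additionally carry an analytically updated estimate (for example a Kalman filter) for the part of the state that is conditionally linear and Gaussian, so that only the remaining parameters are sampled.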
The proposed approach is implemented for two different speech production models
for stationary speakers, demonstrating substantial reduction in reverberation for
both unvoiced and voiced speech sounds. Furthermore, the channel model is extended
to facilitate blind dereverberation of speech from moving speakers. Due to the
structure of the measurement model, single- as well as multi-microphone processing is facilitated,
accommodating physically constrained scenarios where only a single sensor
can be used as well as allowing for the exploitation of spatial diversity in scenarios
where the physical size of microphone arrays is of no concern.
This dissertation is concluded with a survey of possible directions for future research,
including the use of switching Markov source models, joint target tracking
and enhancement, as well as an extension to subband processing for improved computational
efficiency.