Joint Mixing Vector and Binaural Model Based Stereo Source Separation
In this paper, the mixing vector (MV) in the statistical mixing model is compared to the binaural cues represented by interaural level and phase differences (ILD and IPD). It is shown that the MV distributions are quite distinct, while binaural models overlap when the sources are close to each other. On the other hand, the binaural cues are more robust to high reverberation than MV models. Exploiting this complementary behavior, we introduce a new robust algorithm for stereo speech separation which considers both additive and convolutive noise signals to model the MV and binaural cues in parallel and estimate probabilistic time-frequency masks. The contribution of each cue to the final decision is also adjusted by weighting the log-likelihoods of the cues empirically. Furthermore, the permutation problem of frequency domain blind source separation (BSS) is addressed by initializing the MVs based on binaural cues. Experiments are performed systematically on determined and underdetermined speech mixtures in five rooms with various acoustic properties, including anechoic, highly reverberant, and spatially-diffuse noise conditions. The results in terms of signal-to-distortion ratio (SDR) confirm the benefits of integrating the MV and binaural cues, as compared with two state-of-the-art baseline algorithms which use only the MV or only the binaural cues.
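As a rough illustration of the quantities named in the abstract, the sketch below computes ILD and IPD from a pair of stereo STFT arrays and combines two sets of per-source log-likelihoods into probabilistic time-frequency masks with adjustable weights. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the `eps` regularizer, the weight parameters `w_mv` and `w_binaural`, and the softmax-style normalization over sources are all hypothetical choices made for this example.

```python
import numpy as np

def binaural_cues(left_stft, right_stft, eps=1e-12):
    """ILD (dB) and IPD (radians) for each time-frequency point.

    eps is an assumed regularizer that avoids division by zero.
    """
    ild = 20.0 * np.log10((np.abs(left_stft) + eps) / (np.abs(right_stft) + eps))
    ipd = np.angle(left_stft * np.conj(right_stft))  # wrapped to (-pi, pi]
    return ild, ipd

def combine_cues(loglik_mv, loglik_binaural, w_mv=1.0, w_binaural=1.0):
    """Turn weighted per-source log-likelihoods into probabilistic masks.

    Both inputs have shape (n_sources, n_freq, n_frames). The weights are
    free parameters here; the abstract only says they are set empirically.
    """
    total = w_mv * loglik_mv + w_binaural * loglik_binaural
    total = total - total.max(axis=0, keepdims=True)  # numerical stability
    masks = np.exp(total)
    return masks / masks.sum(axis=0, keepdims=True)  # masks sum to 1 over sources
```

Raising `w_binaural` relative to `w_mv` makes the masks follow the binaural model more closely, which is the kind of trade-off the abstract describes for highly reverberant rooms.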
Enhanced independent vector analysis for speech separation in room environments
PhD Thesis
The human brain has the ability to focus on a desired sound source in the presence
of several active sound sources. Machine-based methods lag behind in mimicking
this particular human skill. In the domain of digital signal processing this
problem is termed the cocktail party problem. This thesis thus aims to further
the field of acoustic source separation in the frequency domain based on exploiting
source independence. The main challenge in such frequency domain algorithms is the
permutation problem. Independent vector analysis (IVA) is a frequency domain blind
source separation algorithm which can theoretically obviate the permutation problem
by preserving the dependency structure within each source vector whilst eliminating
the dependency between the frequency bins of different source vectors. This thesis in
particular focuses on improving the separation performance of IVA algorithms which
are used for frequency domain acoustic source separation in real room environments.
The source prior is crucial to the separation performance of the IVA algorithm as it
is used to model the nonlinear dependency structure within the source vectors. An
alternative multivariate Student's t distribution source prior is proposed for the IVA
algorithm as it is known to be well suited for modelling certain speech signals due to
its heavy tail nature. Therefore the nonlinear score function that is derived from the
proposed Student's t source prior can better model the dependency structure within the
frequency bins and thereby enhance the separation performance and the convergence
speed of the IVA and the Fast version of the IVA (FastIVA) algorithms.
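To make the role of the score function concrete, the sketch below contrasts the score functions that follow from two source priors over a K-bin source vector s. It assumes the spherical super Gaussian prior p(s) ∝ exp(-sqrt(Σ_k |s_k|²)) and a multivariate Student's t prior p(s) ∝ (1 + Σ_k |s_k|² / ν)^(-(ν+K)/2) with identity scale; these exact forms, the function names, and the default ν are assumptions for illustration and may differ from the parameterization used in the thesis.

```python
import numpy as np

def score_super_gaussian(s, eps=1e-12):
    """Score for the assumed prior p(s) ∝ exp(-||s||): phi_k = s_k / ||s||."""
    norm = np.sqrt(np.sum(np.abs(s) ** 2))
    return s / (norm + eps)

def score_student_t(s, nu=4.0):
    """Score for the assumed Student's t prior with nu degrees of freedom.

    Differentiating -log p(s) gives phi_k = (nu + K) * s_k / (nu + ||s||^2).
    """
    K = s.shape[0]
    energy = np.sum(np.abs(s) ** 2)
    return (nu + K) * s / (nu + energy)
```

Both functions couple all frequency bins of a source through the shared norm term, which is how IVA keeps the bins of one source together. Under these assumed forms, the Student's t score shrinks toward zero for very high-energy vectors, whereas the super Gaussian score keeps unit norm.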
A novel energy-driven mixed source prior, combining the Student's t and the original
super Gaussian distributions, is also proposed for the IVA algorithms. Because speech
signals can be composed of many high and low amplitude data points, the Student's t
distribution in the mixed source prior can account for the high amplitude data points,
whereas the original super Gaussian distribution can cater for the other information
in the speech signals. Furthermore, the weight of both distributions in the mixed
source prior can be adjusted according to the energy of the observed mixtures.
Therefore the mixed source prior adapts to the measured signals and further enhances
the performance of the IVA algorithm.
A common approach within the IVA algorithm is to model different speech sources with
an identical source prior; however, this does not account for the unique characteristics
of each speech signal. Therefore dependency modelling for different speech sources
can be improved by modelling different speech sources with different source priors.
Hence, the Student's t mixture model (SMM) is introduced as a source prior for the
IVA algorithm. This new source prior can adapt according to the nature of different
speech signals, and the parameters for the proposed SMM source prior are estimated
by deriving an efficient expectation maximization (EM) algorithm. As a result of this
study, a novel EM framework for the IVA algorithm with the SMM as a source prior is
proposed which is capable of separating the sources in an efficient manner.
The proposed algorithms are tested in various realistic reverberant room environments
with real speech signals. All the experiments and evaluations demonstrate the robustness
and enhanced separation performance of the proposed algorithms.