187 research outputs found
Underdetermined source separation using a sparse STFT framework and weighted laplacian directional modelling
The instantaneous underdetermined audio source separation problem of
K-sensors, L-sources mixing scenario (where K < L) has been addressed by many
different approaches, provided the sources remain quite distinct in the virtual
positioning space spanned by the sensors. This problem can be tackled as a
directional clustering problem along the source position angles in the mixture.
The use of Generalised Directional Laplacian Densities (DLD) in the MDCT domain
for underdetermined source separation has been proposed before. Here, we derive
weighted mixtures of DLDs in a sparser representation of the data in the STFT
domain to perform separation. The proposed approach yields improved results
compared to our previous offering and compares favourably with the
state-of-the-art.
Comment: EUSIPCO 2016, Budapest, Hungary.
Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors
Sound source separation has become a topic of intensive research in recent years. The research effort has been especially relevant for the underdetermined case, where a considerable number of sparse methods working in the time-frequency (T-F) domain have appeared. In this context, although binary masking seems to be a preferred choice for source demixing, the estimated masks differ substantially from the ideal ones. This paper proposes a maximum a posteriori (MAP) framework for binary mask estimation. To this end, class-conditional source probabilities according to the observed mixing parameters are modeled via ratios of dependent Cauchy distributions, while source priors are iteratively calculated from the observed histograms. Moreover, spatially smoothed posteriors in the T-F domain are proposed to avoid noisy estimates, showing that the estimated masks are closer to the ideal ones in terms of objective performance measures.
This work was supported by the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jingdong Chen.
Cobos Serrano, M.; López Monfort, J.J. (2012). Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors. IEEE Transactions on Audio, Speech and Language Processing, 20(7), 2059-2064. doi:10.1109/TASL.2012.2195654
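The smoothing-and-decision step described above can be sketched as follows. This is a minimal illustration only: it assumes the per-bin posteriors of one source have already been computed (e.g., from the Cauchy-ratio model), and the function name, window size, and toy posteriors are all hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smoothed_map_mask(posterior, size=5):
    """Spatially smooth per-bin source posteriors over the T-F plane,
    then take the MAP decision (threshold at 0.5) to get a binary mask."""
    smoothed = uniform_filter(posterior, size=size)  # local T-F averaging
    return (smoothed > 0.5).astype(float)

# Toy posteriors: source 1 dominates the left half, plus estimation noise.
rng = np.random.default_rng(0)
clean = np.where(np.arange(64) < 32, 0.65, 0.35)          # per-column tendency
posterior = np.clip(clean + 0.2 * rng.standard_normal((64, 64)), 0.0, 1.0)
mask = smoothed_map_mask(posterior)  # mostly 1 on the left, 0 on the right
```

Without smoothing, thresholding the noisy posteriors directly would produce a speckled mask; averaging over a local T-F neighbourhood first is what pulls the decision toward the ideal mask.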
The LOST Algorithm: finding lines and separating speech mixtures
Robust clustering of data into linear subspaces is a frequently encountered problem. Here, we treat clustering of one-dimensional subspaces that cross the origin. This problem arises in blind source separation, where the subspaces correspond directly to columns of a mixing matrix. We propose the LOST algorithm, which identifies such subspaces using a procedure similar in spirit to EM.
This line finding procedure, combined with a transformation into a sparse domain and an L1-norm minimisation, constitutes a blind source separation algorithm for the separation of instantaneous mixtures with an arbitrary number of mixtures and sources. We perform an extensive investigation of the general separation performance of the LOST algorithm using randomly generated mixtures, and empirically estimate the performance of the algorithm in the presence of noise. Furthermore, we implement a simple
scheme whereby the number of sources present in the mixtures can be detected automatically.
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult and stronger assumptions have to be taken, often related to
the sparsity of the sources under some signal transformation.
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast, unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each are related to different localization cues that allow separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
Combining blockwise and multi-coefficient stepwise approaches in a general framework for online audio source separation
This article considers the problem of online audio source separation. Various algorithms can be found in the literature, featuring either blockwise or stepwise approaches, and using either the spectral or spatial characteristics of the sound sources of a mixture. We offer an algorithm that can combine both stepwise and blockwise approaches, and that can use spectral and spatial information. We propose a method for pre-processing the data of each block and offer a way to deduce an Equivalent Rectangular Bandwidth (ERB) time-frequency representation from a Short-Time Fourier Transform. The efficiency of our algorithm is then tested for various parameters, and the effect of each parameter on the quality of separation and on the computation time is discussed.
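One simple way to derive an ERB-scale representation from an STFT is to pool FFT bins into bands that are equally spaced on the Glasberg-Moore ERB-rate scale. The sketch below is illustrative only: the pooling scheme and function names are assumptions, not the authors' exact method.

```python
import numpy as np

def hz_to_erb_rate(f):
    """Glasberg & Moore (1990) ERB-rate scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def stft_to_erb_bands(power_spec, sr, n_bands=32):
    """Pool an STFT power spectrogram (n_bins, n_frames) into ERB bands
    by summing the bins whose centre frequency falls in each band."""
    n_bins = power_spec.shape[0]
    freqs = np.linspace(0.0, sr / 2, n_bins)             # bin centre frequencies
    edges = np.linspace(hz_to_erb_rate(0.0), hz_to_erb_rate(sr / 2), n_bands + 1)
    band_idx = np.clip(np.digitize(hz_to_erb_rate(freqs), edges) - 1, 0, n_bands - 1)
    erb = np.zeros((n_bands, power_spec.shape[1]))
    for b in range(n_bands):
        erb[b] = power_spec[band_idx == b].sum(axis=0)
    return erb
```

Since every FFT bin is assigned to exactly one band, the pooled representation conserves the total energy of the spectrogram while giving the auditory-motivated frequency resolution.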
Online source separation in reverberant environments exploiting known speaker locations
This thesis concerns blind source separation techniques using second order statistics and higher order statistics for reverberant environments. A focus of the thesis is algorithmic simplicity with a view to the algorithms being implemented in their online forms. The main challenge of blind source separation applications is to handle reverberant acoustic environments; a further complication is changes in the acoustic environment such as when human speakers physically move.
A novel time-domain method which utilises a pair of finite impulse response filters is proposed. The method of principal angles is defined, which exploits a singular value decomposition for the filters' design. The pair of filters is implemented within a generalised sidelobe canceller structure; thus the method can be considered a beamforming method which cancels one source. An adaptive filtering stage is then employed to recover the remaining source, by exploiting the output of the beamforming stage as a noise reference.
A common approach to blind source separation is to use methods that use higher order statistics such as independent component analysis. When dealing with realistic convolutive audio and speech mixtures, processing in the frequency domain at each frequency bin is required. As a result this introduces the permutation problem, inherent in independent component analysis, across the frequency bins. Independent vector analysis directly addresses this issue by modeling the dependencies between frequency bins, namely making use of a source vector prior. An alternative source prior for real-time (online) natural gradient independent vector analysis is proposed. A Student's t probability density function is known to be more suited for speech sources, due to its heavier tails, and is incorporated into a real-time version of natural gradient independent vector analysis. The final algorithm is realised as a real-time embedded application on a floating point Texas Instruments digital signal processor platform.
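The heavier-tailed behaviour of the Student's t prior can be sketched via its score function. This is a simplified real-valued illustration; the function name and degrees-of-freedom value are assumptions, and the complex-valued form used inside the actual algorithm differs in constants.

```python
import numpy as np

def student_t_score(y, nu=5.0):
    """Score (negative log-density gradient) of a spherical multivariate
    Student's t prior over the frequency-bin source vector y:
        phi(y) = (nu + K) * y / (nu + ||y||^2),  K = len(y).
    Large-norm vectors yield a shrinking score, reflecting heavy tails."""
    return (nu + y.size) * y / (nu + np.sum(np.abs(y) ** 2))

y = np.ones(4)
small = np.linalg.norm(student_t_score(y))        # moderate-amplitude vector
large = np.linalg.norm(student_t_score(10 * y))   # high-amplitude vector
```

Because the score decays for large-amplitude bins instead of growing linearly (as it would under a Gaussian prior), occasional speech bursts perturb the natural gradient updates less, which is the motivation for the heavier-tailed prior in the abstract above.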
Moving sources, along with reverberant environments, cause significant problems in realistic source separation systems as mixing filters become time variant. A method which employs the pair of cancellation filters, is proposed to cancel one source coupled with an online natural gradient independent vector analysis technique to improve average separation performance in the context of step-wise moving sources. This addresses `dips' in performance when sources move. Results show the average convergence time of the performance parameters is improved.
Online methods introduced in this thesis are tested using impulse responses measured in reverberant environments, demonstrating their robustness; they are shown to perform better than established methods in a variety of situations.
Mixture of beamformers for speech separation and extraction
In many audio applications, the signal of interest is corrupted by acoustic background noise,
interference, and reverberation. The presence of these contaminations can significantly degrade
the quality and intelligibility of the audio signal. This makes it important to develop signal
processing methods that can separate the competing sources and extract a source of interest.
The estimated signals may then be either directly listened to, transmitted, or further processed,
giving rise to a wide range of applications such as hearing aids, noise-cancelling headphones,
human-computer interaction, surveillance, and hands-free telephony.
Many of the existing approaches to speech separation/extraction rely on beamforming techniques.
These techniques approach the problem from a spatial point of view; a microphone
array is used to form a spatial filter which can extract a signal from a specific direction and
reduce the contamination of signals from other directions. However, when there are fewer
microphones than sources (the underdetermined case), perfect attenuation of all interferers is
impossible and only partial suppression can be achieved.
In this thesis, we present a framework which extends the use of beamforming techniques to
underdetermined speech mixtures. We describe frequency domain non-linear mixture of beamformers
that can extract a speech source from a known direction. Our approach models the
data in each frequency bin via Gaussian mixture distributions, which can be learned using the
expectation maximization algorithm. The model learning is performed using the observed mixture
signals only, and no prior training is required. The signal estimator comprises a set of
minimum mean square error (MMSE), minimum variance distortionless response (MVDR), or
minimum power distortionless response (MPDR) beamformers. In order to estimate the signal,
all beamformers are concurrently applied to the observed signal, and the weighted sum of
the beamformers’ outputs is used as the signal estimator, where the weights are the estimated
posterior probabilities of the Gaussian mixture states. These weights are specific to each time-frequency
point. The resulting non-linear beamformers do not need to know or estimate the
number of sources, and can be applied to microphone arrays with two or more microphones
with arbitrary array configuration. We test and evaluate the described methods on underdetermined
speech mixtures. Experimental results for the non-linear beamformers in underdetermined
mixtures with room reverberation confirm their capability to successfully extract speech
sources.
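The core combination step can be sketched as follows for a single frequency bin. The MVDR weight formula is standard; the function names and the way the posteriors are obtained are assumptions for illustration.

```python
import numpy as np

def mvdr_weights(R, d):
    """Standard MVDR beamformer: w = R^{-1} d / (d^H R^{-1} d),
    for spatial covariance R and steering vector d in one frequency bin."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def mixture_output(x, beamformers, posteriors):
    """Weighted sum of beamformer outputs, with weights given by the
    posterior probabilities of the Gaussian mixture states.
    x: (n_mics, n_frames) STFT data; posteriors: (n_states, n_frames)."""
    outputs = np.array([w.conj() @ x for w in beamformers])  # (n_states, n_frames)
    return (posteriors * outputs).sum(axis=0)
```

The MVDR weights satisfy the distortionless constraint w^H d = 1, so the look direction is passed unchanged while power from other directions is minimised; the posterior weighting then blends the beamformer outputs per time-frequency point, as described above.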
Joint Tensor Factorization and Outlying Slab Suppression with Applications
We consider factoring low-rank tensors in the presence of outlying slabs.
This problem is important in practice, because data collected in many
real-world applications, such as speech, fluorescence, and some social network
data, fit this paradigm. Prior work tackles this problem by iteratively
selecting a fixed number of slabs and fitting, a procedure which may not
converge. We formulate this problem from a group-sparsity promoting point of
view, and propose an alternating optimization framework to handle the
corresponding ℓp (0 < p ≤ 1) minimization-based low-rank tensor
factorization problem. The proposed algorithm features a similar per-iteration
complexity as the plain trilinear alternating least squares (TALS) algorithm.
Convergence of the proposed algorithm is also easy to analyze under the
framework of alternating optimization and its variants. In addition,
regularization and constraints can be easily incorporated to make use of
a priori information on the latent loading factors. Simulations and real
data experiments on blind speech separation, fluorescence data analysis, and
social network mining are used to showcase the effectiveness of the proposed
algorithm.
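A common way to handle such group-sparsity-promoting objectives is iteratively reweighted least squares, where each slab receives a weight derived from its current residual norm, so outlying slabs are progressively downweighted. The sketch below shows only that reweighting step and is illustrative; the paper's exact update may differ.

```python
import numpy as np

def slab_weights(residuals, p=0.5, eps=1e-8):
    """IRLS-style weights for an l_{2,p} (0 < p <= 1) group penalty:
    slabs whose residuals have large norms (likely outliers) get small
    weights in the next weighted least-squares fit."""
    norms = np.linalg.norm(residuals, axis=1)   # one residual norm per slab
    return (norms ** 2 + eps) ** (p / 2.0 - 1.0)

# A slab with a grossly larger residual receives the smallest weight.
res = np.array([[0.1, 0.1], [0.2, 0.1], [5.0, 5.0]])
w = slab_weights(res)
```

Plugging such weights into the least-squares subproblems keeps the per-iteration cost close to plain trilinear alternating least squares, which matches the complexity claim in the abstract.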