326 research outputs found

    Underdetermined instantaneous audio source separation via local Gaussian modeling

    Get PDF
    International audienceUnderdetermined source separation is often carried out by modeling time-frequency source coefficients via a fixed sparse prior. This approach fails when the number of active sources in one time-frequency bin is larger than the number of channels or when active sources lie on both sides of an inactive source. In this article, we partially address these issues by modeling time-frequency source coefficients via Gaussian priors with free variances. We study the resulting maximum likelihood criterion and derive a fast non-iterative optimization algorithm that finds the global minimum. We show that this algorithm outperforms state-of-the- art approaches over stereo instantaneous speech mixtures

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also,conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems

    Underdetermined source separation using a sparse STFT framework and weighted laplacian directional modelling

    Full text link
    The instantaneous underdetermined audio source separation problem of K-sensors, L-sources mixing scenario (where K < L) has been addressed by many different approaches, provided the sources remain quite distinct in the virtual positioning space spanned by the sensors. This problem can be tackled as a directional clustering problem along the source position angles in the mixture. The use of Generalised Directional Laplacian Densities (DLD) in the MDCT domain for underdetermined source separation has been proposed before. Here, we derive weighted mixtures of DLDs in a sparser representation of the data in the STFT domain to perform separation. The proposed approach yields improved results compared to our previous offering and compares favourably with the state-of-the-art.Comment: EUSIPCO 2016, Budapest, Hungar

    Single-Channel Signal Separation Using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization

    Get PDF
    A novel approach for solving the single-channel signal separation is presented the proposed sparse nonnegative tensor factorization under the framework of maximum a posteriori probability and adaptively fine-tuned using the hierarchical Bayesian approach with a new mixing mixture model. The mixing mixture is an analogy of a stereo signal concept given by one real and the other virtual microphones. An “imitated-stereo” mixture model is thus developed by weighting and time-shifting the original single-channel mixture. This leads to an artificial mixing system of dual channels which gives rise to a new form of spectral basis correlation diversity of the sources. Underlying all factorization algorithms is the principal difficulty in estimating the adequate number of latent components for each signal. This paper addresses these issues by developing a framework for pruning unnecessary components and incorporating a modified multivariate rectified Gaussian prior information into the spectral basis features. The parameters of the imitated-stereo model are estimated via the proposed sparse nonnegative tensor factorization with Itakura–Saito divergence. In addition, the separability conditions of the proposed mixture model are derived and demonstrated that the proposed method can separate real-time captured mixtures. Experimental testing on real audio sources has been conducted to verify the capability of the proposed method

    Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing

    Get PDF
    Convolutive and under-determined blind audio source separation from noisy recordings is a challenging problem. Several computational strategies have been proposed to address this problem. This study is concerned with several modifications to the expectation-minimization-based algorithm, which iteratively estimates the mixing and source parameters. This strategy assumes that any entry in each source spectrogram is modeled using superimposed Gaussian components, which are mutually and individually independent across frequency and time bins. In our approach, we resolve this issue by considering a locally smooth temporal and frequency structure in the power source spectrograms. Local smoothness is enforced by incorporating a Gibbs prior in the complete data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field. Simulations using audio files derived from stereo audio source separation evaluation campaign 2008 demonstrate high efficiency with the proposed improvement

    Underdetermined Separation of Speech Mixture Based on Sparse Bayesian Learning

    Get PDF
    This paper describes a novel algorithm for underdetermined speech separation problem based on compressed sensing which is an emerging technique for efficient data reconstruction. The proposed algorithm consists of two steps. The unknown mixing matrix is firstly estimated from the speech mixtures in the transform domain by using K-means clustering algorithm. In the second step, the speech sources are recovered based on an autocalibration sparse Bayesian learning algorithm for speech signal. Numerical experiments including the comparison with other sparse representation approaches are provided to show the achieved performance improvement

    Cram\'er-Rao Bounds for Complex-Valued Independent Component Extraction: Determined and Piecewise Determined Mixing Models

    Full text link
    This paper presents Cram\'er-Rao Lower Bound (CRLB) for the complex-valued Blind Source Extraction (BSE) problem based on the assumption that the target signal is independent of the other signals. Two instantaneous mixing models are considered. First, we consider the standard determined mixing model used in Independent Component Analysis (ICA) where the mixing matrix is square and non-singular and the number of the latent sources is the same as that of the observed signals. The CRLB for Independent Component Extraction (ICE) where the mixing matrix is re-parameterized in order to extract only one independent target source is computed. The target source is assumed to be non-Gaussian or non-circular Gaussian while the other signals (background) are circular Gaussian or non-Gaussian. The results confirm some previous observations known for the real domain and bring new results for the complex domain. Also, the CRLB for ICE is shown to coincide with that for ICA when the non-Gaussianity of background is taken into account. %unless the assumed sources' distributions are misspecified. Second, we extend the CRLB analysis to piecewise determined mixing models. Here, the observed signals are assumed to obey the determined mixing model within short blocks where the mixing matrices can be varying from block to block. However, either the mixing vector or the separating vector corresponding to the target source is assumed to be constant across the blocks. The CRLBs for the parameters of these models bring new performance bounds for the BSE problem.Comment: 25 pages, 8 figure

    Blind Spectral-GMM Estimation for Underdetermined Instantaneous Audio Source Separation

    Get PDF
    The underdetermined blind audio source separation problem is often addressed in the time-frequency domain by assuming that each time-frequency point is an independently distributed random variable. Other approaches which are not blind assume a more structured model, like the Spectral Gaussian Mixture Models (Spectral-GMMs), thus exploiting statistical diversity of audio sources in the separation process. However, in this last approach, Spectral-GMMs are supposed to be learned from some training signals. In this paper, we propose a new approach for learning Spectral-GMMs of the sources without the need of using training signals. The proposed blind method significantly outperforms state-of-the-art approaches on stereophonic instantaneous music mixtures
    • …
    corecore