8 research outputs found

    An Inverse-Gamma Source Variance Prior with Factorized Parameterization for Audio Source Separation

    In this paper we present a new statistical model for the power spectral density (PSD) of an audio signal and its application to multichannel audio source separation (MASS). The source signal is modeled with the local Gaussian model (LGM), and we propose to model its variance with an inverse-Gamma distribution whose scale parameter is factorized as a rank-1 model. We discuss the merits of this approach and evaluate it in a MASS task with underdetermined convolutive mixtures. To this end, we derive a variational EM algorithm for parameter estimation and source inference. The proposed model improves source separation performance compared to a state-of-the-art LGM NMF-based technique.

    A Variational EM Algorithm for the Separation of Moving Sound Sources

    This paper addresses the problem of separating moving sound sources. We propose a probabilistic framework based on the complex Gaussian model combined with non-negative matrix factorization. The properties associated with moving sources are modeled using time-varying mixing filters described by a stochastic temporal process. We present a variational expectation-maximization (VEM) algorithm that employs a Kalman smoother to estimate the mixing filters. The sound sources are separated by means of Wiener filters, built from the estimators provided by the proposed VEM algorithm. Preliminary experiments with simulated data show that, while for static sources we obtain results comparable with the baseline method of Ozerov et al., in the case of moving sources our method outperforms a piecewise version of the baseline method.

    A Generative Model for the Joint Registration of Multiple Point Sets

    This paper describes a probabilistic generative model and its associated algorithm to jointly register multiple point sets. The vast majority of state-of-the-art registration techniques select one of the sets as the "model" and perform pairwise alignments between the other sets and this set. The main drawback of this mode of operation is that there is no guarantee that the model set is free of noise and outliers, which contaminate the estimation of the registration parameters. Unlike previous work, the proposed method treats all the point sets on an equal footing: they are realizations of a Gaussian mixture model (GMM) and the registration is cast as a clustering problem. We formally derive an EM algorithm that estimates both the GMM parameters and the rotations and translations that map each individual set onto the "central" model. The mixture means play the role of the registered set of points, while the variances provide rich information about the quality of the registration. We thoroughly validate the proposed method with challenging datasets, compare it with several state-of-the-art methods, and show its potential for fusing real depth data.

    Audio source separation into the wild

    This review chapter is dedicated to multichannel audio source separation in real-life environments. We explore some of the major achievements in the field and discuss some of the remaining challenges. We explore several important practical scenarios, e.g. moving sources and/or microphones, a varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids and robots, are discussed. Our perspectives on the future of the field are given as concluding remarks of this chapter.

    Quelques Contributions pour la Séparation et la Journalisation de Sources Audio dans des Mélanges Multicanaux Convolutifs

    In this thesis we address the problem of multichannel audio source separation (MASS) for underdetermined convolutive mixtures through probabilistic modeling. We focus on three aspects of the problem and make three contributions. Firstly, inspired by the empirically well-validated representation of an audio signal known as the local Gaussian model (LGM) with non-negative matrix factorization (NMF), we propose a Bayesian extension that overcomes some of the limitations of NMF. We incorporate this representation in a MASS framework and compare it with the state of the art in MASS, yielding promising results. Secondly, we study how to separate mixtures of moving sources and/or moving microphones. Movements make the acoustic path between sources and microphones time-varying. Time-varying audio mixtures have received little attention in the MASS literature. Thus, we start from a state-of-the-art LGM-with-NMF method designed for separating time-invariant audio mixtures and propose an extension that uses a Kalman smoother to track the acoustic path across time. The proposed method is benchmarked against a block-wise adaptation of that state-of-the-art method (run on time segments), and delivers competitive results on both simulated and real-world mixtures. Lastly, we investigate the link between MASS and the task of audio diarisation. Audio diarisation is the detection of the time intervals where each speaker/source is active or silent. Most state-of-the-art MASS methods consider the sources to emit continuously, a hypothesis that can result in spurious signal estimates for a source in intervals where that source was silent. Our aim is for diarisation to aid MASS by indicating the emitting sources at each time frame. To that end we design a joint framework for simultaneous diarisation and MASS that incorporates a hidden Markov model (HMM) to track the temporal activity of the sources within a state-of-the-art LGM-with-NMF MASS framework. We compare the proposed method with the state of the art in MASS and audio diarisation tasks. We obtain performance comparable with the state of the art in terms of separation, while improving on it in terms of diarisation.

    An EM Algorithm for Joint Source Separation and Diarisation of Multichannel Convolutive Speech Mixtures

    We present a probabilistic model for joint source separation and diarisation of multichannel convolutive speech mixtures. We build upon the framework of the local Gaussian model (LGM) with non-negative matrix factorization (NMF). The diarisation is introduced as a temporal labeling of each source in the mix as active or inactive at the short-term frame level. We devise an EM algorithm in which the source separation process is aided by the diarisation state, since the latter indicates the sources actually present in the mixture. The diarisation state is tracked with a hidden Markov model (HMM) with emission probabilities calculated from the estimated source signals. The proposed EM algorithm has separation performance comparable with a state-of-the-art LGM NMF method, while outperforming a state-of-the-art speaker diarisation pipeline.

    Exploiting the Intermittency of Speech for Joint Separation and Diarization

    Natural conversations are spontaneous exchanges involving two or more people speaking in an intermittent manner. Therefore one expects such conversations to have intervals where some of the speakers are silent. Yet most multichannel audio source separation (MASS) methods consider the sound sources to be continuously emitting over the entire duration of the processed mixture. In this paper we propose a probabilistic model for MASS where the sources may have pauses. The activity of the sources is modeled as a hidden state, the diarization state, enabling us to activate/deactivate the sound sources at time-frame resolution. We plug the diarization model into the spatial covariance matrix model proposed for MASS, and obtain an improvement in performance over the state of the art when separating mixtures with intermittent speakers.

    A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures

    This paper addresses the problem of separating audio sources from time-varying convolutive mixtures. We propose a probabilistic framework based on the local complex-Gaussian model combined with non-negative matrix factorization. The time-varying mixing filters are modeled by a continuous temporal stochastic process. We present a variational expectation-maximization (VEM) algorithm that employs a Kalman smoother to estimate the time-varying mixing matrix, and that jointly estimates the source parameters. The sound sources are then separated by Wiener filters constructed with the estimators provided by the VEM algorithm. Extensive experiments on simulated data show that the proposed method outperforms a block-wise version of a state-of-the-art baseline method.