2,632 research outputs found

    Statistical single channel source separation

    PhD Thesis. Single channel source separation (SCSS) is one of the challenging fields in signal processing and has various significant applications. Unlike conventional SCSS methods, which are based on a linear instantaneous model, this research investigates single channel separation for two types of mixture: the nonlinear instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS in an instantaneous mixture, this research proposes a novel two-stage solution consisting of a Gaussianization transform, which efficiently compensates for the nonlinear distortion, followed by a maximum likelihood estimator that performs the source separation. For linear SCSS in a convolutive mixture, this research proposes new methods based on nonnegative matrix factorization, which decompose a mixture into two-dimensional convolution factor matrices that represent the spectral basis and temporal code. The proposed factorization accounts for the convolutive mixing in the decomposition by introducing frequency constrained parameters in the model. The method aims to separate the mixture into its constituent spectral-temporal source components while alleviating the effect of convolutive mixing. In addition, a family of Itakura-Saito divergences has been developed as a cost function, which brings the beneficial property of scale invariance. Two new statistical techniques are proposed: an Expectation-Maximisation (EM) based algorithm framework, which maximises the log-likelihood of a mixed signal, and a maximum a posteriori approach, which maximises the joint probability of a mixed signal using multiplicative update rules. To further improve this work, a novel method that incorporates adaptive sparseness into the solution has been proposed to resolve the ambiguity and hence improve the algorithm performance. The theoretical foundation of the proposed solutions has been rigorously developed and discussed in detail.
Results show that all the algorithms proposed in this thesis are effective in separating single channel mixed signals and outperform other available methods. Universiti Teknikal Malaysia Melaka(UTeM), Ministry of Higher Education of Malaysi
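
As a rough illustration of the Itakura-Saito cost and the multiplicative update rules mentioned in the abstract above, the Python sketch below fits a plain NMF model V ≈ WH. It is not the thesis's two-dimensional convolutive factorization and it omits the sparseness terms. The function names `is_divergence` and `is_nmf`, the rank, and the iteration count are assumptions made for this example, and the updates are the commonly used heuristic rules for Itakura-Saito NMF.

```python
import numpy as np


def is_divergence(v, v_hat):
    """Itakura-Saito divergence summed over all entries (inputs strictly positive)."""
    ratio = v / v_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Plain NMF under the Itakura-Saito divergence with multiplicative updates.

    V is a nonnegative (frequency x frame) power spectrogram.
    Returns a spectral basis W and temporal code H with V ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis (hypothetical init)
    H = rng.random((rank, n_frames)) + eps  # temporal code (hypothetical init)
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
    return W, H
```

Scaling both arguments of `is_divergence` by the same positive constant leaves its value unchanged, which is the scale-invariance property the abstract refers to.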

    Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain

    Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.

    Non-negative mixtures

    This is the author's accepted pre-print of the article, first published as M. D. Plumbley, A. Cichocki and R. Bro. Non-negative mixtures. In P. Comon and C. Jutten (Eds), Handbook of Blind Source Separation: Independent Component Analysis and Applications. Chapter 13, pp. 515-547. Academic Press, Feb 2010. ISBN 978-0-12-374726-6. DOI: 10.1016/B978-0-12-374726-6.00018-7

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user’s signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
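
The speaker extraction step described above, decomposing the microphone magnitude spectrogram onto the union of an echo basis and a near-end speech basis, might look roughly like the Python sketch below. The abstract does not state the cost function or the reconstruction rule, so the Kullback-Leibler multiplicative update and the ratio mask used here are assumptions, as are the function name `extract_near_end` and all parameter names.

```python
import numpy as np


def extract_near_end(V, W_echo, W_speech, n_iter=100, eps=1e-12, seed=0):
    """Split a magnitude spectrogram V using two fixed nonnegative bases.

    V        : (freq x frames) magnitude STFT of the near-end microphone signal
    W_echo   : (freq x k_e) basis built from the far-end speech (assumed given)
    W_speech : (freq x k_s) basis trained a priori on speech (assumed given)
    Returns (speech_estimate, echo_estimate), which sum to V.
    """
    W = np.hstack([W_echo, W_speech])  # union of the two bases, held fixed
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # KL-divergence multiplicative update for the activations only
        H *= (W.T @ (V / V_hat)) / col_sums
    k = W_echo.shape[1]
    echo_part = W_echo @ H[:k]
    speech_part = W_speech @ H[k:]
    # Ratio mask: share of each time-frequency bin explained by the speech basis
    mask = speech_part / (echo_part + speech_part + eps)
    return mask * V, (1.0 - mask) * V
```

Because only the activations `H` are updated, the sketch needs no adaptation period for the bases, which is consistent with the abstract's remark that the approach performs immediately upon initiation.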

    Blind source separation using statistical nonnegative matrix factorization

    PhD Thesis. Blind Source Separation (BSS) attempts to automatically extract and track a signal of interest in real world scenarios with other signals present. BSS addresses the problem of recovering the original signals from an observed mixture without relying on training knowledge. This research studied three novel approaches for solving the BSS problem based on extensions of the non-negative matrix factorization model and sparsity regularization methods. 1) A framework amalgamating pruning and Bayesian regularized cluster nonnegative tensor factorization with Itakura-Saito divergence for separating sources mixed in a stereo channel format: The sparse regularization term was adaptively tuned using a hierarchical Bayesian approach to yield the desired sparse decomposition. The modified Gaussian prior was formulated to express the correlation between different basis vectors. This algorithm automatically detected the optimal number of latent components of the individual source. 2) Factorization for single-channel BSS which decomposes an information-bearing matrix into a complex of factor matrices that represent the spectral dictionary and temporal codes: A variational Bayesian approach was developed for computing the sparsity parameters for optimizing the matrix factorization. This approach combined the advantages of both complex matrix factorization (CMF) and variational sparse analysis. 3) An imitated-stereo mixture model developed by weighting and time-shifting the original single-channel mixture, where source signals can be modelled by AR processes: The proposed mixture is analogous to a stereo signal created by two microphones, one real and the other virtual. The imitated-stereo mixture employed nonnegative tensor factorization for separating the observed mixture. The separability analysis of the imitated-stereo mixture was derived using Wiener masking.
All algorithms were tested with real audio signals. Separation performance was assessed by measuring the distortion between the original source and the estimated one according to the signal-to-distortion ratio (SDR). The experimental results demonstrate that the proposed uninformed audio separation algorithms surpass the conventional BSS methods, i.e. the IS-cNTF, SNMF and CMF methods, with an average SDR improvement ranging from 2.6 dB to 6.4 dB per source. Payap Universit
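
For readers unfamiliar with the metric, a simplified signal-to-distortion ratio can be computed as in the Python sketch below. The abstract does not say which SDR definition the thesis used, so the plain energy ratio here (without the projection steps of fuller evaluation toolkits) is an assumption, as is the function name `sdr_db`.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over the energy of the error.

    Both inputs are 1-D arrays of equal length holding time-domain samples.
    An exact estimate gives zero error energy and therefore an infinite SDR.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))
```

Under this definition, an estimate equal to 0.9 times the reference leaves an error of one tenth of the reference amplitude, which corresponds to 20 dB. An SDR improvement is then the difference between the SDR of a proposed method and that of a baseline on the same source.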

    Single-channel source separation using non-negative matrix factorization
