98 research outputs found
Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain
Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies
Unsupervised Learning for Monaural Source Separation Using Maximization–Minimization Algorithm with Time–Frequency Deconvolution
This paper presents an unsupervised learning algorithm for sparse nonnegative matrix factor time–frequency deconvolution with optimized fractional β -divergence. The β -divergence is a group of cost functions parametrized by a single parameter β . The Itakura–Saito divergence, Kullback–Leibler divergence and Least Square distance are special cases that correspond to β=0, 1, 2 , respectively. This paper presents a generalized algorithm that uses a flexible range of β that includes fractional values. It describes a maximization–minimization (MM) algorithm leading to the development of a fast convergence multiplicative update algorithm with guaranteed convergence. The proposed model operates in the time–frequency domain and decomposes an information-bearing matrix into two-dimensional deconvolution of factor matrices that represent the spectral dictionary and temporal codes. The deconvolution process has been optimized to yield sparse temporal codes through maximizing the likelihood of the observations. The paper also presents a method to estimate the fractional β value. The method is demonstrated on separating audio mixtures recorded from a single channel. The paper shows that the extraction of the spectral dictionary and temporal codes is significantly more efficient by using the proposed algorithm and subsequently leads to better source separation performance. Experimental tests and comparisons with other factorization methods have been conducted to verify its efficacy
Statistical single channel source separation
PhD ThesisSingle channel source separation (SCSS) principally is one of the challenging fields
in signal processing and has various significant applications. Unlike conventional
SCSS methods which were based on linear instantaneous model, this research sets out
to investigate the separation of single channel in two types of mixture which is
nonlinear instantaneous mixture and linear convolutive mixture. For the nonlinear
SCSS in instantaneous mixture, this research proposes a novel solution based on a
two-stage process that consists of a Gaussianization transform which efficiently
compensates for the nonlinear distortion follow by a maximum likelihood estimator to
perform source separation. For linear SCSS in convolutive mixture, this research
proposes new methods based on nonnegative matrix factorization which decomposes a
mixture into two-dimensional convolution factor matrices that represent the spectral
basis and temporal code. The proposed factorization considers the convolutive mixing
in the decomposition by introducing frequency constrained parameters in the model.
The method aims to separate the mixture into its constituent spectral-temporal source
components while alleviating the effect of convolutive mixing. In addition, family of
Itakura-Saito divergence has been developed as a cost function which brings the
beneficial property of scale-invariant. Two new statistical techniques are proposed,
namely, Expectation-Maximisation (EM) based algorithm framework which
maximizes the log-likelihood of a mixed signals, and the maximum a posteriori
approach which maximises the joint probability of a mixed signal using multiplicative
update rules. To further improve this research work, a novel method that incorporates
adaptive sparseness into the solution has been proposed to resolve the ambiguity and
hence, improve the algorithm performance. The theoretical foundation of the proposed
solutions has been rigorously developed and discussed in details. Results have
concretely shown the effectiveness of all the proposed algorithms presented in this
thesis in separating the mixed signals in single channel and have outperformed others
available methods.Universiti Teknikal Malaysia Melaka(UTeM),
Ministry of Higher Education of Malaysi
- …