
    POLYPHONIC PIANO TRANSCRIPTION USING NON-NEGATIVE MATRIX FACTORISATION WITH GROUP SPARSITY

    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published in: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), Florence, Italy, 5-9 May 2014, pp. 3136-3140.

    Statistical single channel source separation

    PhD Thesis. Single channel source separation (SCSS) is one of the challenging fields in signal processing and has many significant applications. Unlike conventional SCSS methods, which are based on a linear instantaneous model, this research investigates single channel separation for two types of mixture: the nonlinear instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS in the instantaneous mixture, this research proposes a novel two-stage solution: a Gaussianization transform that efficiently compensates for the nonlinear distortion, followed by a maximum likelihood estimator that performs source separation. For linear SCSS in the convolutive mixture, this research proposes new methods based on nonnegative matrix factorization, which decomposes a mixture into two-dimensional convolution factor matrices that represent the spectral basis and temporal code. The proposed factorization accounts for the convolutive mixing in the decomposition by introducing frequency constrained parameters into the model. The method aims to separate the mixture into its constituent spectral-temporal source components while alleviating the effect of convolutive mixing. In addition, a family of Itakura-Saito divergences has been developed as a cost function, which brings the beneficial property of scale invariance. Two new statistical techniques are proposed: an Expectation-Maximisation (EM) based algorithm framework, which maximises the log-likelihood of a mixed signal, and a maximum a posteriori approach, which maximises the joint probability of a mixed signal using multiplicative update rules. To further improve this work, a novel method that incorporates adaptive sparseness into the solution has been proposed to resolve the ambiguity and hence improve the algorithm performance. The theoretical foundation of the proposed solutions has been rigorously developed and discussed in detail.
Results show the effectiveness of all the algorithms proposed in this thesis in separating mixed signals in a single channel, and show that they outperform other available methods. Universiti Teknikal Malaysia Melaka (UTeM), Ministry of Higher Education of Malaysi
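
The scale invariance attributed to the Itakura-Saito divergence in the abstract above can be checked numerically. The following is a minimal sketch and is not taken from the thesis: the function names `itakura_saito` and `squared_euclidean` and the random test data are assumptions made for illustration. It contrasts the Itakura-Saito divergence, which is unchanged when both arguments are multiplied by the same constant, with the squared Euclidean distance, which grows with the square of that constant.

```python
import numpy as np


def itakura_saito(x, y):
    """Itakura-Saito divergence summed over all entries (x and y strictly positive)."""
    ratio = x / y
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def squared_euclidean(x, y):
    """Squared Euclidean distance summed over all entries."""
    return float(np.sum((x - y) ** 2))


# Hypothetical strictly positive "spectrogram-like" data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=(4, 5))
y = rng.uniform(0.1, 1.0, size=(4, 5))
scale = 1000.0

is_plain = itakura_saito(x, y)
is_scaled = itakura_saito(scale * x, scale * y)
eu_plain = squared_euclidean(x, y)
eu_scaled = squared_euclidean(scale * x, scale * y)

print("Itakura-Saito, original vs scaled:", is_plain, is_scaled)
print("Squared Euclidean, original vs scaled:", eu_plain, eu_scaled)
```

Because the divergence depends only on the ratio x / y, low-energy and high-energy time-frequency bins contribute on an equal footing, which is the usual reason given for preferring it on audio power spectra.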

    Underdetermined convolutive source separation using two dimensional non-negative factorization techniques

    PhD Thesis. This thesis considers underdetermined audio source separation, that is, estimating the original audio sources from the observed mixture when the number of audio sources is greater than the number of channels. The separation is carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes that separation is accomplished with no prior information (or as little as possible) about the sources. The informed approach uses an exemplar in addition to the mixture signal to emulate the targeted speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors that are convolved in both the temporal and spectral directions. Both approaches are applied to the convolutive mixture and the high-reverberant convolutive mixture, which are more realistic than the instantaneous mixture. In this work, a novel algorithm based on nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity is proposed to separate audio sources that have been mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process is proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. The effects of different window lengths are also investigated to determine the best-fit model for the characteristics of the audio signal. Furthermore, a novel algorithm, namely the fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), is proposed.
The K-wNTF2D is developed for its ability to model both the spectral and temporal changes, as well as the spatial covariance matrix that addresses the high reverberation problem. Variable sparsity derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, namely SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, namely the semi-exemplar based algorithm and the exemplar-based algorithm, are proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the proposed separation algorithms about the targeted signal to be separated by initializing its parameters and guiding the proposed separation algorithms. The adaptive sparsity is derived for both of the proposed algorithms. A multistage version of the proposed exemplar-based algorithm is also proposed to further enhance the separation performance. Results show that the proposed separation algorithms are very promising, more flexible, and offer an alternative model to the conventional methods.

    Unsupervised Learning for Monaural Source Separation Using Maximization–Minimization Algorithm with Time–Frequency Deconvolution

    This paper presents an unsupervised learning algorithm for sparse nonnegative matrix factor time–frequency deconvolution with optimized fractional β-divergence. The β-divergence is a group of cost functions parametrized by a single parameter β. The Itakura–Saito divergence, Kullback–Leibler divergence and Least Square distance are special cases that correspond to β = 0, 1, 2, respectively. This paper presents a generalized algorithm that uses a flexible range of β that includes fractional values. It describes a maximization–minimization (MM) algorithm leading to the development of a fast convergence multiplicative update algorithm with guaranteed convergence. The proposed model operates in the time–frequency domain and decomposes an information-bearing matrix into two-dimensional deconvolution of factor matrices that represent the spectral dictionary and temporal codes. The deconvolution process has been optimized to yield sparse temporal codes through maximizing the likelihood of the observations. The paper also presents a method to estimate the fractional β value. The method is demonstrated on separating audio mixtures recorded from a single channel. The paper shows that the extraction of the spectral dictionary and temporal codes is significantly more efficient by using the proposed algorithm and subsequently leads to better source separation performance. Experimental tests and comparisons with other factorization methods have been conducted to verify its efficacy.
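
To make the role of β concrete, the sketch below evaluates the β-divergence and runs plain multiplicative updates for ordinary (non-convolutive) NMF. It is an illustration under stated assumptions, not the paper's time–frequency deconvolution algorithm: the function names `beta_divergence` and `nmf_beta`, the synthetic data, the rank, and the choice β = 1.5 are all hypothetical. Under the convention used here, β = 2 gives half the squared Euclidean distance. The plain update form is used with β between 1 and 2, the range in which it is commonly shown not to increase the cost.

```python
import numpy as np


def beta_divergence(v, v_hat, beta):
    """Beta-divergence summed over all entries; v and v_hat must be strictly positive."""
    if beta == 0:  # Itakura-Saito divergence
        ratio = v / v_hat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(v * np.log(v / v_hat) - v + v_hat))
    # General case, including fractional beta; beta = 2 gives 0.5 * squared error.
    terms = v ** beta + (beta - 1.0) * v_hat ** beta - beta * v * v_hat ** (beta - 1.0)
    return float(np.sum(terms) / (beta * (beta - 1.0)))


def nmf_beta(v, rank, beta, n_iter=200, seed=0, eps=1e-12):
    """Plain multiplicative-update NMF, v ~ w @ h, under the beta-divergence."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.1, 1.0, size=(v.shape[0], rank))
    h = rng.uniform(0.1, 1.0, size=(rank, v.shape[1]))
    costs = []
    for _ in range(n_iter):
        v_hat = w @ h + eps
        h *= (w.T @ (v * v_hat ** (beta - 2.0))) / (w.T @ v_hat ** (beta - 1.0) + eps)
        v_hat = w @ h + eps
        w *= ((v * v_hat ** (beta - 2.0)) @ h.T) / (v_hat ** (beta - 1.0) @ h.T + eps)
        costs.append(beta_divergence(v, w @ h + eps, beta))
    return w, h, costs


# Hypothetical strictly positive data with an exact rank-3 structure.
data_rng = np.random.default_rng(1)
v = data_rng.uniform(0.1, 1.0, size=(6, 3)) @ data_rng.uniform(0.1, 1.0, size=(3, 8))

w, h, costs = nmf_beta(v, rank=3, beta=1.5)
print("first cost:", costs[0], "last cost:", costs[-1])
```

Both factors stay non-negative because each update multiplies a non-negative matrix by a ratio of non-negative terms.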

    Blind source separation using statistical nonnegative matrix factorization

    PhD Thesis. Blind Source Separation (BSS) attempts to automatically extract and track a signal of interest in real world scenarios with other signals present. BSS addresses the problem of recovering the original signals from an observed mixture without relying on training knowledge. This research studied three novel approaches for solving the BSS problem, based on extensions of the non-negative matrix factorization model and on sparsity regularization methods. 1) A framework amalgamating pruning and Bayesian regularized cluster nonnegative tensor factorization with Itakura-Saito divergence for separating sources mixed in a stereo channel format: The sparse regularization term was adaptively tuned using a hierarchical Bayesian approach to yield the desired sparse decomposition. The modified Gaussian prior was formulated to express the correlation between different basis vectors. This algorithm automatically detected the optimal number of latent components of the individual source. 2) Factorization for single-channel BSS which decomposes an information-bearing matrix into a complex of factor matrices that represent the spectral dictionary and temporal codes: A variational Bayesian approach was developed for computing the sparsity parameters for optimizing the matrix factorization. This approach combined the advantages of both complex matrix factorization (CMF) and variational sparse analysis. 3) An imitated-stereo mixture model developed by weighting and time-shifting the original single-channel mixture, where source signals can be modelled by AR processes. The proposed mixture is analogous to a stereo signal created by two microphones, one real and the other virtual. The imitated-stereo mixture employed nonnegative tensor factorization for separating the observed mixture. The separability analysis of the imitated-stereo mixture was derived using Wiener masking.
All algorithms were tested with real audio signals. Source separation performance was assessed by measuring the distortion between the original source and the estimated one according to the signal-to-distortion ratio (SDR). The experimental results demonstrate that the proposed uninformed audio separation algorithms surpass the conventional BSS methods, i.e. the IS-cNTF, SNMF and CMF methods, with average SDR improvements ranging from 2.6dB to 6.4dB per source. Payap Universit
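
As a rough illustration of the evaluation metric mentioned above, the sketch below computes a simplified signal-to-distortion ratio in decibels. It is an assumption-laden example rather than the thesis's evaluation code: the function name `sdr_db` and the synthetic sine-wave signals are hypothetical, and the formula treats the whole difference between reference and estimate as distortion, which is simpler than the decompositions used by standard BSS evaluation toolkits.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over the energy of (reference - estimate)."""
    distortion = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2)))


# Hypothetical source: one second of a 440 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
reference = np.sin(2.0 * np.pi * 440.0 * t)

# An estimate with a 10% amplitude error leaves 1% of the energy as distortion.
estimate = 0.9 * reference
print("SDR (dB):", sdr_db(reference, estimate))
```

An improvement of a few decibels, as reported in the abstract, corresponds to a sizeable drop in distortion energy because the scale is logarithmic.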

    Bayesian orthogonal component analysis for sparse representation

    This paper addresses the problem of identifying a lower dimensional space where observed data can be sparsely represented. This under-complete dictionary learning task can be formulated as a blind separation problem of sparse sources linearly mixed with an unknown orthogonal mixing matrix. This issue is formulated in a Bayesian framework. First, the unknown sparse sources are modeled as Bernoulli-Gaussian processes. To promote sparsity, a weighted mixture of an atom at zero and a Gaussian distribution is proposed as the prior distribution for the unobserved sources. A non-informative prior distribution defined on an appropriate Stiefel manifold is selected for the mixing matrix. The Bayesian inference on the unknown parameters is conducted using a Markov chain Monte Carlo (MCMC) method. A partially collapsed Gibbs sampler is designed to generate samples asymptotically distributed according to the joint posterior distribution of the unknown model parameters and hyperparameters. These samples are then used to approximate the joint maximum a posteriori estimator of the sources and mixing matrix. Simulations conducted on synthetic data are reported to illustrate the performance of the method for recovering sparse representations. An application to sparse coding on an under-complete dictionary is finally investigated. Comment: Revised version. Accepted to IEEE Trans. Signal Processin
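
The generative model described in this abstract can be simulated in a few lines. The sketch below is only an illustration of the model, not of the paper's partially collapsed Gibbs sampler: all variable names, dimensions, the activity probability of 0.2, and the use of a QR decomposition to obtain a matrix with orthonormal columns (one point on the Stiefel manifold) are assumptions. It draws Bernoulli-Gaussian sources, mixes them with the orthonormal matrix, and shows that, without noise and with the matrix known, projecting back with its transpose recovers the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 sparse sources observed in an 8-dimensional space.
n_obs_dim, n_sources, n_samples = 8, 3, 2000
activity_prob = 0.2  # assumed probability that a source sample is non-zero
sigma = 1.0          # assumed standard deviation of the active (Gaussian) part

# Bernoulli-Gaussian sources: an atom at zero mixed with a Gaussian distribution.
active = rng.random((n_sources, n_samples)) < activity_prob
sources = active * rng.normal(0.0, sigma, size=(n_sources, n_samples))

# Mixing matrix with orthonormal columns, obtained from a reduced QR decomposition.
mixing, _ = np.linalg.qr(rng.normal(size=(n_obs_dim, n_sources)))

# Noise-free observations and recovery when the mixing matrix is known.
observations = mixing @ sources
recovered = mixing.T @ observations

print("fraction of non-zero source samples:", np.mean(sources != 0))
print("max recovery error:", np.max(np.abs(recovered - sources)))
```

In the blind setting of the paper the mixing matrix is unknown, which is why inference over both the sources and the matrix is needed.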

    Non-negative mixtures

    This is the author's accepted pre-print of the article, first published as M. D. Plumbley, A. Cichocki and R. Bro. Non-negative mixtures. In P. Comon and C. Jutten (Ed), Handbook of Blind Source Separation: Independent Component Analysis and Applications. Chapter 13, pp. 515-547. Academic Press, Feb 2010. ISBN 978-0-12-374726-6 DOI: 10.1016/B978-0-12-374726-6.00018-7

    Low-Rank and Sparse Decomposition for Hyperspectral Image Enhancement and Clustering

    In this dissertation, new algorithms are developed to enhance hyperspectral imaging (HSI) analysis. A tensor data format is applied to sparse and low-rank decomposition of hyperspectral datasets, which could enhance classification and detection performance. A multi-view learning technique is applied to hyperspectral image clustering. Furthermore, a kernel version of the multi-view learning technique is proposed, which could improve clustering performance. Most low-rank and sparse decomposition algorithms for HSI analysis are based on a matrix data format. As HSI contains high spectral dimensions, tensor based extended low-rank and sparse decomposition (TELRSD) is proposed in this dissertation for better performance in HSI classification with the low-rank tensor part and in HSI detection with the sparse tensor part. With this tensor based method, HSI is processed in a 3D data format, and the information between spectral bands and pixels remains integrated during the decomposition process. The proposed algorithm is compared with other state-of-the-art methods, and the experimental results show that TELRSD has the best performance among all the compared algorithms. HSI clustering is an unsupervised task, which aims to group pixels into different groups without labeled information. Low-rank sparse subspace clustering (LRSSC) is the most popular algorithm for this clustering task. The spatial-spectral based multi-view low-rank sparse subspace clustering (SSMLC) algorithm is proposed in this dissertation, which extends LRSSC with a multi-view learning technique. In this algorithm, spectral and spatial views are created to generate a multi-view dataset of the HSI, where spectral partition, morphological component analysis (MCA) and principal component analysis (PCA) are applied to create other views. Furthermore, a kernel version of SSMLC (k-SSMLC) has also been investigated.
The performance of SSMLC and k-SSMLC is compared with sparse subspace clustering (SSC), low-rank sparse subspace clustering (LRSSC), and spectral-spatial sparse subspace clustering (S4C). The results show that SSMLC could improve on the performance of LRSSC, and that k-SSMLC has the best performance. Spectral clustering has been proved to be equivalent to a non-negative matrix factorization (NMF) problem, so NMF can be applied to the clustering problem. In order to include local and nonlinear features of the data source, orthogonal NMF (ONMF), graph-regularized NMF (GNMF) and kernel NMF (k-NMF) have been proposed for better clustering performance. The non-linear orthogonal graph NMF (k-OGNMF) combines kernel, orthogonal and graph constraints in NMF, which pushes the clustering performance up further. In the HSI domain, kernel multi-view based orthogonal graph NMF (k-MOGNMF) is applied for subspace clustering, where k-OGNMF is extended with a multi-view algorithm, and it has better performance and computational efficiency.