169 research outputs found

    Underdetermined convolutive source separation using two dimensional non-negative factorization techniques

    PhD Thesis. This thesis considers underdetermined audio source separation, that is, estimating the original audio sources from the observed mixture when the number of audio sources is greater than the number of channels. The separation is carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes that the separation is accomplished with no prior information (or as little as possible) about the sources. The informed approach uses an exemplar in addition to the mixture signal to emulate the targeted speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors that are convolved in both the temporal and spectral directions. Both approaches are applied to the convolutive mixture and the high-reverberant convolutive mixture, which are more realistic than the instantaneous mixture. In this work, a novel algorithm based on nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity is proposed to separate audio sources that have been mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process is proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. The effects of different window lengths are also investigated to determine the best-fit model that suits the characteristics of the audio signal. Furthermore, a novel algorithm, namely the fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), is proposed. The K-wNTF2D is developed for its ability to model both the spectral and temporal changes, and the spatial covariance matrix that addresses the high reverberation problem. Variable sparsity derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, namely SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, namely the semi-exemplar-based algorithm and the exemplar-based algorithm, are proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the proposed separation algorithms about the targeted signal to be separated by initializing its parameters and guiding the separation. The adaptive sparsity is derived for both of the proposed algorithms. A multistage version of the proposed exemplar-based algorithm is also proposed in order to further enhance the separation performance. Results show that the proposed separation algorithms are very promising, more flexible, and offer an alternative model to the conventional methods.
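As a rough illustration of the two-dimensional convolutive model named in the abstract above, the sketch below reconstructs a spectrogram from NMF2D factors and scores it with the Itakura-Saito divergence. It assumes the commonly used NMF2D formulation, in which spectral bases are shifted along frequency and activations are shifted along time. The function names, array layouts, and the `eps` smoothing constant are illustrative choices, not taken from the thesis, and none of the thesis's adaptive sparsity, weighting, or initialization schemes are included.

```python
import numpy as np

def shift_down(X, phi):
    """Shift the rows of X down by phi positions, zero-filling the top."""
    if phi == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[phi:, :] = X[:-phi, :]
    return out

def shift_right(X, tau):
    """Shift the columns of X right by tau positions, zero-filling the left."""
    if tau == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[:, tau:] = X[:, :-tau]
    return out

def nmf2d_reconstruct(W, H):
    """Assumed NMF2D model: Lambda = sum over tau, phi of
    shift_down(W[tau], phi) @ shift_right(H[phi], tau).

    W has shape (n_tau, F, D): one spectral basis matrix per time lag tau.
    H has shape (n_phi, D, T): one activation matrix per frequency shift phi.
    """
    n_tau, F, D = W.shape
    n_phi, D_h, T = H.shape
    if D != D_h:
        raise ValueError("W and H must share the number of components D")
    Lam = np.zeros((F, T))
    for tau in range(n_tau):
        for phi in range(n_phi):
            Lam += shift_down(W[tau], phi) @ shift_right(H[phi], tau)
    return Lam

def itakura_saito(V, Lam, eps=1e-12):
    """Itakura-Saito divergence between nonnegative arrays V and Lam.
    eps is a small constant (an assumption) that avoids division by zero."""
    R = (V + eps) / (Lam + eps)
    return float(np.sum(R - np.log(R) - 1.0))
```

With a single lag and a single shift (`n_tau = n_phi = 1`), the reconstruction reduces to the ordinary NMF product `W[0] @ H[0]`, which is a convenient sanity check.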

    Blind source separation: the effects of signal non-stationarity

    Contribution of Statistical Tests to Sparseness-Based Blind Source Separation

    We address the problem of blind source separation in the underdetermined mixture case. Two statistical tests are proposed to reduce the number of empirical parameters involved in standard sparseness-based underdetermined blind source separation (UBSS) methods. The first test performs multisource selection of the suitable time-frequency points for source recovery and is fully automatic. The second is dedicated to autosource selection for mixing matrix estimation and requires fixing only two parameters, regardless of the instrumented SNRs. We show experimentally that using these tests incurs no performance loss and even improves the performance of standard weak-sparseness UBSS approaches.

    Single-channel source separation using non-negative matrix factorization

    Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

    We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike most prior work, which treats speech separation as a multi-class regression problem, and the deep clustering technique, which considers it a segmentation (or clustering) problem, our model optimizes for the separation regression error, ignoring the order of mixing sources. This strategy cleverly solves the long-lasting label permutation problem that has prevented progress on deep learning based techniques for speech separation. Experiments on the equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT. We believe improvements built upon PIT can eventually solve the cocktail-party problem and enable real-world adoption of, e.g., automatic meeting transcription and multi-party human-computer interaction, where overlapping speech is common. Comment: 5 page
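The idea of optimizing the regression error while ignoring source order can be sketched as taking the minimum loss over all assignments of estimated streams to reference sources. The example below is a minimal sketch under stated assumptions: it uses mean squared error over whole signals and an exhaustive search over permutations, and the function name `pit_mse` and the array layout are hypothetical. The abstract does not specify the exact loss or the granularity at which the paper applies it.

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative sketch).

    estimates, targets: arrays of shape (S, T), one row per source.
    Returns (min_loss, best_perm), where best_perm[i] is the index of the
    target row matched to estimate row i.
    """
    S = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    # Try every assignment of targets to output streams and keep the cheapest.
    for perm in itertools.permutations(range(S)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return float(best_loss), best_perm
```

The exhaustive search costs S! loss evaluations, which is acceptable for two or three talkers but grows quickly with the number of sources.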

    Ecosystem Monitoring and Port Surveillance Systems

    In this project, we aim to build a novel system able to perform sustainable, long-term monitoring of coastal marine ecosystems and to enhance port surveillance capability. The outcomes will be based on the analysis, classification, and fusion of a variety of heterogeneous data collected using different sensors (hydrophones, sonars, various camera types, etc.). This manuscript introduces the identified approaches and the system structure. In addition, it focuses on the techniques and concepts developed to deal with several problems related to our project. The new system will address the shortcomings of traditional approaches based on measuring environmental parameters, which are expensive and fail to provide adequate large-scale monitoring. More efficient monitoring will also enable improved analysis of climate change, and provide knowledge informing the civil authority's economic relationship with its coastal marine ecosystems.

    Frequency Domain Independent Component Analysis Applied To Wireless Communications Over Frequency-selective Channels

    Frequency-selective fading is a major source of impairment in wireless communications. In this research, a novel Frequency-Domain Independent Component Analysis (ICA-F) approach is proposed to blindly separate and deconvolve signals traveling through frequency-selective, slow fading channels. Compared with existing time-domain approaches, the ICA-F is computationally efficient and possesses fast convergence properties. Simulation results confirm the effectiveness of the proposed ICA-F. Orthogonal Frequency Division Multiplexing (OFDM) systems are widely used in wireless communications. However, OFDM systems are very sensitive to Carrier Frequency Offset (CFO), so an accurate CFO compensation technique is required to achieve acceptable performance. In this dissertation, two novel blind approaches are proposed to estimate and compensate for CFO within the range of half the subcarrier spacing: a Maximum Likelihood CFO Correction approach (ML-CFOC), and a high-performance, low-computation Blind CFO Estimator (BCFOE). The Bit Error Rate (BER) improvement of the ML-CFOC is achieved at the expense of a modest increase in the computational requirements, without sacrificing the system bandwidth or increasing the hardware complexity. The BCFOE outperforms the existing blind CFO estimator [25, 128], referred to as the YG-CFO estimator, in terms of BER and Mean Square Error (MSE), without increasing the computational complexity, sacrificing the system bandwidth, or increasing the hardware complexity. While both proposed techniques outperform the YG-CFO estimator, the BCFOE is better than the ML-CFOC technique. Extensive simulation results illustrate the performance of the ML-CFOC and BCFOE approaches.

    Multimodal methods for blind source separation of audio sources

    This thesis focuses on enhancing the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine that matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function, which measures separation performance. [Continues.]