Underdetermined convolutive source separation using two dimensional non-negative factorization techniques
PhD Thesis. This thesis considers underdetermined audio source separation, that is, estimating the original audio sources from an observed mixture when the number of audio sources is greater than the number of channels. The separation is carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes little or no prior information about the sources. The informed approach uses an exemplar in addition to the mixture signal to emulate the target speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors that are convolved in both the temporal and spectral directions. Both approaches are applied to the convolutive mixture and the highly reverberant convolutive mixture, which are more realistic than the instantaneous mixture.
In this work, a novel algorithm based on nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity is proposed to separate audio sources that have been mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process is proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. The effects of different window lengths are also investigated to determine the best-fit model that suits the characteristics of the audio signal. Furthermore, a novel algorithm, namely the fusion K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), is proposed. The K-wNTF2D is developed for its ability to model both the spectral and temporal changes, and the spatial covariance matrix that addresses the high reverberation problem. Variable sparsity derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, namely the SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, namely the semi-exemplar-based algorithm and the exemplar-based algorithm, are proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the proposed separation algorithms about the target signal to be separated by initializing their parameters and guiding the separation. The adaptive sparsity is derived for both
of the proposed algorithms. A multistage version of the proposed exemplar-based algorithm is also proposed in order to further enhance the separation performance.
Results show that the proposed separation algorithms are very promising and more flexible, and that they offer an alternative model to the conventional methods.
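The abstract above mentions optimization under the Itakura-Saito divergence. As a rough illustration of that cost function only (not of the thesis's NMF2D or K-wNTF2D algorithms), the sketch below evaluates the standard element-wise definition d(x | y) = x/y - log(x/y) - 1. The function names and the toy matrices are assumptions made for this example.

```python
import math

def is_divergence(x, y):
    """Itakura-Saito divergence d_IS(x | y) for positive scalars x and y."""
    ratio = x / y
    return ratio - math.log(ratio) - 1.0

def is_cost(V, V_hat):
    """Sum of element-wise IS divergences between two same-shaped nested lists."""
    return sum(
        is_divergence(v, vh)
        for row, row_hat in zip(V, V_hat)
        for v, vh in zip(row, row_hat)
    )

# Toy "power spectrogram" V and a model V_hat (hypothetical values).
# The first row is loud, the second row is quiet; both rows are off by the
# same ratio in their first entry and exact in their second entry.
V = [[4.0, 1.0], [0.02, 0.01]]
V_hat = [[2.0, 1.0], [0.01, 0.01]]
cost = is_cost(V, V_hat)
```

Because the divergence depends only on the ratio x/y, the loud and the quiet mismatches contribute equally to `cost`. This scale invariance is the usual reason the divergence is favoured for audio power spectra, whose entries span a large dynamic range.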
Contribution of Statistical Tests to Sparseness-Based Blind Source Separation
We address the problem of blind source separation in the underdetermined mixture case. Two statistical tests are proposed to reduce the number of empirical parameters involved in standard sparseness-based underdetermined blind source separation (UBSS) methods. The first test performs multisource selection of the suitable time-frequency points for source recovery and is fully automatic. The second one is dedicated to autosource selection for mixing matrix estimation and requires fixing only two parameters, regardless of the instrumented SNRs. We experimentally show that the use of these tests incurs no performance loss and even improves the performance of standard weak-sparseness UBSS approaches.
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation
We propose a novel deep learning model, which supports permutation invariant
training (PIT), for speaker independent multi-talker speech separation,
commonly known as the cocktail-party problem. Unlike most prior work, which
treats speech separation as a multi-class regression problem, and the
deep clustering technique that considers it a segmentation (or clustering)
problem, our model optimizes for the separation regression error, ignoring the
order of mixing sources. This strategy cleverly solves the long-lasting label
permutation problem that has prevented progress on deep learning based
techniques for speech separation. Experiments on the equal-energy mixing setup
of a Danish corpus confirm the effectiveness of PIT. We believe improvements
built upon PIT can eventually solve the cocktail-party problem and enable
real-world adoption of, e.g., automatic meeting transcription and multi-party
human-computer interaction, where overlapping speech is common.
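The core idea described above, scoring the separation error while ignoring the order of the sources, can be sketched as follows. This is a minimal illustration on plain Python lists, assuming a mean-squared-error criterion and an exhaustive search over assignments. The function names and toy signals are hypothetical, and this is not the authors' implementation.

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fixed_order_loss(estimates, targets):
    """Average MSE when output i is always compared with source i."""
    return sum(mse(e, t) for e, t in zip(estimates, targets)) / len(targets)

def pit_loss(estimates, targets):
    """Return (minimum average MSE, best assignment) over all permutations.

    best assignment[i] is the index of the source matched to output i.
    """
    n = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], targets[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Two toy sources, and estimates that are perfect but in swapped order.
targets = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
estimates = [[0.0, 2.0, 0.0], [1.0, 0.0, 1.0]]

naive = fixed_order_loss(estimates, targets)
loss, perm = pit_loss(estimates, targets)
```

With a fixed output-to-source assignment, the swapped but otherwise perfect estimates are penalized. The permutation-invariant loss finds the assignment `(1, 0)` and reports zero error. The exhaustive search grows factorially with the number of sources, so this form is practical only for a small number of talkers.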
Ecosystem Monitoring and Port Surveillance Systems
In this project, we aim to build a novel system able to perform sustainable, long-term monitoring of coastal marine ecosystems and to enhance port surveillance capability. The outcomes will be based on the analysis, classification, and fusion of a variety of heterogeneous data collected using different sensors (hydrophones, sonars, various camera types, etc.). This manuscript introduces the identified approaches and the system structure. In addition, it focuses on techniques and concepts developed to deal with several problems related to the project. The new system will address the shortcomings of traditional approaches based on measuring environmental parameters, which are expensive and fail to provide adequate large-scale monitoring. More efficient monitoring will also enable improved analysis of climate change and provide knowledge informing the civil authority's economic relationship with its coastal marine ecosystems.
Frequency Domain Independent Component Analysis Applied To Wireless Communications Over Frequency-selective Channels
Frequency-selective fading is a major source of impairment in wireless communications. In this research, a novel Frequency-Domain Independent Component Analysis (ICA-F) approach is proposed to blindly separate and deconvolve signals traveling through frequency-selective, slow fading channels. Compared with existing time-domain approaches, the ICA-F is computationally efficient and possesses fast convergence properties. Simulation results confirm the effectiveness of the proposed ICA-F. Orthogonal Frequency Division Multiplexing (OFDM) systems are widely used in wireless communications nowadays. However, OFDM systems are very sensitive to Carrier Frequency Offset (CFO). Thus, an accurate CFO compensation technique is required in order to achieve acceptable performance. In this dissertation, two novel blind approaches are proposed to estimate and compensate for CFO within the range of half the subcarrier spacing: a Maximum Likelihood CFO Correction approach (ML-CFOC), and a high-performance, low-computation Blind CFO Estimator (BCFOE). The Bit Error Rate (BER) improvement of the ML-CFOC is achieved at the expense of a modest increase in the computational requirements without sacrificing the system bandwidth or increasing the hardware complexity. The BCFOE outperforms the existing blind CFO estimator [25, 128], referred to as the YG-CFO estimator, in terms of BER and Mean Square Error (MSE), without increasing the computational complexity, sacrificing the system bandwidth, or increasing the hardware complexity. While both proposed techniques outperform the YG-CFO estimator, the BCFOE is better than the ML-CFOC technique. Extensive simulation results illustrate the performance of the ML-CFOC and BCFOE approaches.
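To make the sensitivity of OFDM to CFO concrete, the sketch below applies a fractional offset to a single-subcarrier tone and then removes it. It assumes the common textbook model in which an offset of eps subcarrier spacings rotates time-domain sample n by exp(j 2 pi eps n / N), and it assumes the offset is already known. It does not implement the blind ML-CFOC or BCFOE estimators from the abstract, and all names and values are hypothetical.

```python
import cmath

def apply_cfo(samples, eps):
    """Rotate sample n by exp(j*2*pi*eps*n/N); eps is in subcarrier spacings."""
    n_fft = len(samples)
    return [s * cmath.exp(2j * cmath.pi * eps * n / n_fft)
            for n, s in enumerate(samples)]

def compensate_cfo(samples, eps_hat):
    """Undo an estimated offset by applying the opposite rotation."""
    return apply_cfo(samples, -eps_hat)

def dft_bin(samples, k):
    """Value of DFT bin k (direct sum, adequate for small N)."""
    n_fft = len(samples)
    return sum(s * cmath.exp(-2j * cmath.pi * k * n / n_fft)
               for n, s in enumerate(samples))

N = 8
# One OFDM symbol carrying energy on subcarrier 1 only.
tx = [cmath.exp(2j * cmath.pi * 1 * n / N) for n in range(N)]

rx = apply_cfo(tx, 0.3)           # offset within half a subcarrier spacing
fixed = compensate_cfo(rx, 0.3)   # assumes a perfect estimate of the offset

leak_before = abs(dft_bin(rx, 2))     # energy leaked into neighbouring bin 2
leak_after = abs(dft_bin(fixed, 2))   # numerically zero after compensation
```

Before compensation, the offset destroys the orthogonality between subcarriers and bin 2 picks up energy from subcarrier 1 (inter-carrier interference). After compensation with the correct offset, the leakage vanishes up to rounding error.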
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of an FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.]