Convolutive Blind Source Separation Methods
In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy within which many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.
Time-frequency processing - Spectral properties
Many audio signal processing algorithms typically do not operate on raw time-domain audio signals, but rather on time-frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents it as a function of frequency, but does not represent variations over time. A time-frequency representation presents the amplitude of a sound as a function of both time and frequency, and is able to jointly account for its temporal and spectral characteristics (Gröchenig, 2001). Time-frequency representations are appropriate for three reasons in our context. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources have a prominent structure both in time and frequency, which can be easily modeled in the time-frequency domain. Second, the sound sources are often mixed convolutively, and this convolutive mixing process can be approximated with simpler operations in the time-frequency domain. Third, natural sounds are more sparsely distributed and overlap less with each other in the time-frequency domain than in the time or frequency domain, which facilitates their separation. In this chapter we introduce the most common time-frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time-frequency representation and converting it back to the time domain, using the short-time Fourier transform (STFT) as an example. It also presents other common time-frequency representations and their relevance for separation and enhancement. Section 2.2 discusses the properties of sound sources in the time-frequency domain, including sparsity, disjointness, and more complex structures such as harmonicity. Section 2.3 explains how to achieve separation by time-varying filtering in the time-frequency domain. We summarize the main concepts and provide links to other chapters and more advanced topics in Section 2.4.
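The analysis-synthesis cycle described above (Sections 2.1 and 2.3) can be sketched with SciPy's STFT routines. The test tone, window parameters, and the trivial all-pass mask below are illustrative assumptions; in actual separation the mask would be estimated per time-frequency bin.

```python
import numpy as np
from scipy.signal import stft, istft

# Illustrative input: a 1 s, 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# Forward STFT: complex spectrogram Z, shape (freq bins, frames).
f, tau, Z = stft(x, fs=fs, nperseg=512, noverlap=384)

# Separation by time-varying filtering applies a mask to Z;
# here an all-pass mask stands in for an estimated one.
mask = np.ones_like(Z, dtype=float)
Z_sep = mask * Z

# Inverse STFT converts the masked representation back to the time domain.
_, x_rec = istft(Z_sep, fs=fs, nperseg=512, noverlap=384)

# With a Hann window at 75% overlap the transform is invertible,
# so reconstruction error is negligible.
err = np.max(np.abs(x - x_rec[:len(x)]))
```

Because the Hann window with 75% overlap satisfies the overlap-add constraint, the analysis-synthesis pair is (numerically) perfectly invertible, which is what makes mask-based time-varying filtering practical.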
Real Time Blind Source Separation in Reverberant Environments
An online convolutive blind source separation solution has been developed for use in reverberant environments with stationary sources. Results are presented for simulation and real-world data. The system achieves a separation SINR of 16.8 dB when operating on a two-source mixture, with a total acoustic delay of 270 ms. This is on par with, and in many respects outperforms, various published algorithms [1], [2]. A number of instantaneous blind source separation algorithms have been developed, including a block-wise and recursive ICA algorithm and a clustering-based algorithm, able to obtain up to 110 dB SIR performance. The system has been realised in both Matlab and C, and is modular, allowing for easy update of the ICA algorithm that is the core of the unmixing process.
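The abstract does not specify which ICA variant sits at the core of the unmixing process; as a hedged stand-in, the instantaneous BSS setting it mentions can be sketched with scikit-learn's FastICA on a synthetic two-source, two-channel mixture. The sources, mixing matrix, and seed are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 8000
t = np.arange(n)
s1 = np.sign(np.sin(2 * np.pi * t / 200))   # sub-Gaussian square wave
s2 = rng.laplace(size=n)                    # super-Gaussian noise source
S = np.c_[s1, s2]                           # true sources, shape (n, 2)

A = np.array([[1.0, 0.6],                   # hypothetical instantaneous
              [0.4, 1.0]])                  # mixing matrix
X = S @ A.T                                 # observed two-channel mixture

# FastICA estimates the sources up to scale and permutation.
ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)
```

Since ICA leaves scale and ordering ambiguous, separation quality is checked by correlating each estimate against the true sources rather than comparing them directly.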
Statistical single channel source separation
PhD thesis. Single channel source separation (SCSS) is one of the most challenging fields in signal processing and has various significant applications. Unlike conventional SCSS methods, which are based on the linear instantaneous model, this research investigates the separation of a single channel for two types of mixture: the nonlinear instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS in the instantaneous mixture, this research proposes a novel solution based on a two-stage process consisting of a Gaussianization transform, which efficiently compensates for the nonlinear distortion, followed by a maximum likelihood estimator to perform source separation. For linear SCSS in the convolutive mixture, this research proposes new methods based on nonnegative matrix factorization which decompose a mixture into two-dimensional convolution factor matrices representing the spectral basis and temporal code. The proposed factorization accounts for the convolutive mixing in the decomposition by introducing frequency-constrained parameters into the model. The method aims to separate the mixture into its constituent spectral-temporal source components while alleviating the effect of convolutive mixing. In addition, the family of Itakura-Saito divergences has been developed as a cost function, which brings the beneficial property of scale invariance. Two new statistical techniques are proposed: an Expectation-Maximisation (EM) based algorithmic framework which maximises the log-likelihood of the mixed signal, and a maximum a posteriori approach which maximises the joint probability of the mixed signal using multiplicative update rules. To further improve this work, a novel method that incorporates adaptive sparseness into the solution has been proposed to resolve the ambiguity and hence improve the algorithm's performance. The theoretical foundation of the proposed solutions has been rigorously developed and discussed in detail. Results have concretely shown the effectiveness of all the proposed algorithms presented in this thesis in separating mixed signals in a single channel, outperforming other available methods.
Universiti Teknikal Malaysia Melaka (UTeM), Ministry of Higher Education of Malaysia
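The Itakura-Saito cost and multiplicative update rules mentioned in the thesis can be illustrated with a deliberately simplified, non-convolutive NMF sketch: the two-dimensional convolutive factorization, frequency constraints, and adaptive sparseness of the actual work are omitted, and the random "spectrogram" below is a placeholder. The updates are the standard multiplicative rules for the Itakura-Saito divergence.

```python
import numpy as np

rng = np.random.default_rng(1)
F, N, K = 20, 50, 3
V = rng.gamma(2.0, size=(F, N)) + 1e-9   # stand-in for a power spectrogram

W = rng.random((F, K)) + 0.1             # spectral basis
H = rng.random((K, N)) + 0.1             # temporal activations

def is_div(V, Vhat):
    """Itakura-Saito divergence D_IS(V || Vhat)."""
    R = V / Vhat
    return np.sum(R - np.log(R) - 1.0)

losses = []
for _ in range(200):
    # Multiplicative updates for the IS divergence (beta-divergence, beta = 0).
    Vhat = W @ H
    W *= ((Vhat**-2 * V) @ H.T) / (Vhat**-1 @ H.T)
    Vhat = W @ H
    H *= (W.T @ (Vhat**-2 * V)) / (W.T @ Vhat**-1)
    losses.append(is_div(V, W @ H))
```

The scale-invariance property the thesis highlights means D_IS(cV || cVhat) = D_IS(V || Vhat) for any c > 0, so low-energy time-frequency components contribute to the cost on the same footing as high-energy ones.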
Signal processing techniques for extracting signals with periodic structure : applications to biomedical signals
In this dissertation some advanced methods for extracting sources from single and multichannel data are developed and utilized in biomedical applications. It is assumed that the sources of interest have periodic structure and therefore, the periodicity is exploited in various forms. The proposed methods can even be used for the cases where the signals have hidden periodicities, i.e., the periodic behaviour is not detectable from their time representation or even the Fourier transform of the signal. For the case of single channel recordings a method based on singular spectrum analysis (SSA) of the signal is proposed. The proposed method is utilized in localizing heart sounds in respiratory signals, which is an essential pre-processing step in most of the heart sound cancellation methods. Artificially mixed and real respiratory signals are used for evaluating the method. It is shown that the performance of the proposed method is superior to those of the other methods in terms of false detection. Moreover, the execution time is significantly lower than that of the method ranked second in performance. For multichannel data, the problem is tackled using two approaches. First, it is assumed that the sources are periodic and the statistical characteristics of periodic sources are exploited in developing a method to effectively choose the appropriate delays in which the diagonalization takes place. In the second approach it is assumed that the sources of interest are cyclostationary. Necessary and sufficient conditions for extractability of the sources are mathematically proved and the extraction algorithms are proposed. The ballistocardiogram (BCG) artifact is considered as the sum of a number of independent cyclostationary components having the same cycle frequency. The proposed method, called cyclostationary source extraction (CSE), is able to extract these components without much destructive effect on the background electroencephalogram (EEG).
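The SSA machinery underlying the single-channel method can be sketched generically: embed the signal in a Hankel trajectory matrix, take its SVD, keep a few singular pairs, and Hankelize back by diagonal averaging. The signal, window length L, and rank r below are illustrative assumptions, not the dissertation's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
t = np.arange(n)
periodic = np.sin(2 * np.pi * t / 25)          # hidden periodic component
x = periodic + 0.3 * rng.standard_normal(n)    # noisy single-channel signal

L = 100                                         # embedding window length
K = n - L + 1
# Trajectory (Hankel) matrix: X[i, j] = x[i + j].
X = np.column_stack([x[i:i + L] for i in range(K)])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2                                           # a sinusoid spans ~2 singular pairs
Xr = (U[:, :r] * s[:r]) @ Vt[:r]                # rank-r trajectory matrix

# Diagonal averaging (Hankelization) maps Xr back to a 1-D series.
rec = np.zeros(n)
cnt = np.zeros(n)
for j in range(K):
    rec[j:j + L] += Xr[:, j]
    cnt[j:j + L] += 1
rec /= cnt
```

Keeping only the leading singular pairs is what lets SSA pull out periodic structure that is buried in noise, which is the property the dissertation exploits for heart-sound localization.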
Blind source separation of nonstationary convolutively mixed signals in the subband domain
The paper proposes a new technique for blind source separation (BSS) in the subband domain using an extended lapped transform (ELT) decomposition for nonstationary, convolutively mixed signals. As identified by S. Araki et al. (see Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation - ICA2003, p.499-504, 2003), the motivation for subband-based BSS is a drawback of frequency-domain BSS: when separating speech mixtures only a few seconds long, the few samples available in each frequency bin lead to poor separation performance. In the proposed approach, mixed signals are decomposed into subband components by an ELT, and within each subband a time-domain Newton BSS algorithm is employed, based on the nonstationarity property of the input signals and the joint diagonalization of output correlation matrices with time-varying second order statistics (SOS). This subband version is compared to a fullband version using the same BSS algorithm.
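The nonstationarity principle behind the SOS-based algorithm can be illustrated in a minimal instantaneous setting: when source powers change over time, correlation matrices estimated on two time blocks are jointly diagonalized by a generalized eigendecomposition, whose eigenvectors yield the unmixing matrix. This is a sketch of the principle only; the ELT subband decomposition, convolutive mixing, and Newton iterations of the paper are omitted, and the mixing matrix and power profiles are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 20000
# Nonstationary sources: each source's power switches between the two halves.
g1 = np.concatenate([np.ones(n // 2), 0.1 * np.ones(n // 2)])
g2 = np.concatenate([0.1 * np.ones(n // 2), np.ones(n // 2)])
S = np.vstack([g1 * rng.standard_normal(n),
               g2 * rng.standard_normal(n)])

A = np.array([[1.0, 0.5],                 # hypothetical instantaneous
              [0.7, 1.0]])                # mixing matrix
X = A @ S

# Correlation matrices over two time blocks with differing source powers.
R1 = X[:, : n // 2] @ X[:, : n // 2].T / (n // 2)
R2 = X[:, n // 2 :] @ X[:, n // 2 :].T / (n // 2)

# Generalized eigenvectors of (R1, R2) diagonalize both matrices at once;
# their transposes form the unmixing matrix (up to scale and permutation).
_, E = eigh(R1, R2)
W = E.T
Y = W @ X                                  # separated outputs
```

Joint diagonalization over many blocks (rather than two) is what the paper's Newton algorithm performs within each subband; the two-block case admits this closed-form eigendecomposition solution.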