1,240 research outputs found

    The Application of Blind Source Separation to Feature Decorrelation and Normalizations

    Get PDF
    We apply a Blind Source Separation BSS algorithm to the decorrelation of Mel-warped cepstra. The observed cepstra are modeled as a convolutive mixture of independent source cepstra. The algorithm aims to minimize a cross-spectral correlation at different lags to reconstruct the source cepstra. Results show that using "independent" cepstra as features leads to a reduction in the WER.Finally, we present three different enhancements to the BSS algorithm. We also present some results of these deviations of the original algorithm

    Convolutive Blind Source Separation Methods

    Get PDF
    In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    Get PDF
    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by the convolutive transfer functions (CTFs) with much less coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify CTFs based on the STFT with the oversampled signals and the critical sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common zeros problem of CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the 1\ell_1-norm of the source signal with the relaxed 2\ell_2-norm fitting error between the micophone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the 2\ell_2-norm to a tolerance corresponding to the noise power, and the tolerance can be automatically set. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table

    Dictionary Learning of Convolved Signals

    Get PDF
    Assuming that a set of source signals is sparsely representable in a given dictionary, we show how their sparse recovery fails whenever we can only measure a convolved observation of them. Starting from this motivation, we develop a block coordinate descent method which aims to learn a convolved dictionary and provide a sparse representation of the observed signals with small residual norm. We compare the proposed approach to the K-SVD dictionary learning algorithm and show through numerical experiment on synthetic signals that, provided some conditions on the problem data, our technique converges in a fixed number of iterations to a sparse representation with smaller residual norm

    Multimodal methods for blind source separation of audio sources

    Get PDF
    The enhancement of the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment is the focus of this thesis. This challenging application is termed the cocktail party problem and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence they adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of a FDCBSS algorithm. The positions of the human speakers are monitored by video cameras and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.

    Sampling and Recovery of Pulse Streams

    Full text link
    Compressive Sensing (CS) is a new technique for the efficient acquisition of signals, images, and other data that have a sparse representation in some basis, frame, or dictionary. By sparse we mean that the N-dimensional basis representation has just K<<N significant coefficients; in this case, the CS theory maintains that just M = K log N random linear signal measurements will both preserve all of the signal information and enable robust signal reconstruction in polynomial time. In this paper, we extend the CS theory to pulse stream data, which correspond to S-sparse signals/images that are convolved with an unknown F-sparse pulse shape. Ignoring their convolutional structure, a pulse stream signal is K=SF sparse. Such signals figure prominently in a number of applications, from neuroscience to astronomy. Our specific contributions are threefold. First, we propose a pulse stream signal model and show that it is equivalent to an infinite union of subspaces. Second, we derive a lower bound on the number of measurements M required to preserve the essential information present in pulse streams. The bound is linear in the total number of degrees of freedom S + F, which is significantly smaller than the naive bound based on the total signal sparsity K=SF. Third, we develop an efficient signal recovery algorithm that infers both the shape of the impulse response as well as the locations and amplitudes of the pulses. The algorithm alternatively estimates the pulse locations and the pulse shape in a manner reminiscent of classical deconvolution algorithms. Numerical experiments on synthetic and real data demonstrate the advantages of our approach over standard CS

    Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation

    Get PDF
    A novel method is proposed to improve the performance of independent vector analysis (IVA) for blind signal separation of acoustic mixtures. IVA is a frequency-domain approach that successfully resolves the well-known permutation problem by applying a spherical dependency model to all pairs of frequency bins. The dependency model of IVA is equivalent to a single clique in an undirected graph; a clique in graph theory is defined as a subset of vertices in which any pair of vertices is connected by an undirected edge. Therefore, IVA imposes the same amount of statistical dependency on every pair of frequency bins, which may not match the characteristics of real-world signals. The proposed method allows variable amounts of statistical dependencies according to the correlation coefficients observed in real acoustic signals and, hence, enables more accurate modeling of statistical dependencies. A number of cliques constitutes the new dependency graph so that neighboring frequency bins are assigned to the same clique, while distant bins are assigned to different cliques. The permutation ambiguity is resolved by overlapped frequency bins between neighboring cliques. For speech signals, we observed especially strong correlations across neighboring frequency bins and a decrease in these correlations with an increase in the distance between bins. The clique sizes are either fixed, or determined by the reciprocal of the mel-frequency scale to impose a wider dependency on low-frequency components. Experimental results showed improved performances over conventional IVA. The signal-to-interference ratio improved from 15.5 to 18.8 dB on average for seven different source locations. When we varied the clique sizes according to the observed correlations, the stability of the proposed method increased with a large number of cliques.open4

    Source Separation for Hearing Aid Applications

    Get PDF
    corecore