26 research outputs found

    Evaluation of emerging frequency domain convolutive blind source separation algorithms based on real room recordings

    Get PDF
    This paper presents a comparative study of three of the emerging frequency domain convolutive blind source separation (FDCBSS) techniques i.e. convolutive blind separation of non-stationary sources due to Parra and Spence, penalty function-based joint diagonalization approach for convolutive blind separation of nonstationary sources due to Wang et al. and a geometrically constrained multimodal approach for convolutive blind source separation due to Sanei et al. Objective evaluation is performed on the basis of signal to interference ratio (SIR), performance index (PI) and solution to the permutation problem. The results confirm that a multimodal approach is necessary to properly mitigate the permutation in BSS and ultimately to solve the cocktail party problem. In other words, it is to make BSS semiblind by exploiting prior geometrical information, and thereby providing the framework to find robust solutions for more challenging source separation with moving speakers

    A geometrically constrained multimodal approach for convolutive blind source separation

    Get PDF
    A novel constrained multimodal approach for convolutive blind source separation is presented which incorporates video information related to geometrical position of both the speakers and the microphones, and the directionality of the speakers into the separation algorithm. The separation is performed in the frequency domain and the constraints are incorporated through a penalty function-based formulation. The separation results show a considerable improvement over traditional frequency domain convolutive BSS systems such as that developed by Parra and Spence. Importantly, the inherent permutation problem in the frequency domain BSS is potentially solve

    A geometrically constrained multimodal time domain approach for convolutive blind source separation

    Get PDF
    A novel time domain constrained multimodal approach for convolutive blind source separation is presented which incorporates geometrical 3-D cordinates of both the speakers and the microphones. The semi-blind separation is performed in time domain and the constraints are incorporated through an alternative least squares optimization. Orthogonal source model and gradient based optimization concepts have been used to construct and estimate the model parameters which fits the convolutive mixture signals. Moreover, the majorization concept has been used to incorporate the geometrical information for estimating the mixing channels for different time lags. The separation results show a considerable improvement over time domain convolutive blind source separation systems. Having diagonal or quasi diagonal covariance matrices for different source segments and also having independent profiles for different sources (which implies nonstationarity of the sources) are the requirements for our method. We evaluated the method using synthetically mixed real signals. The results show high capability of the method for separating speech signals. © 2011 EURASIP

    Multimodal methods for blind source separation of audio sources

    Get PDF
    The enhancement of the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment is the focus of this thesis. This challenging application is termed the cocktail party problem and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence they adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of a FDCBSS algorithm. The positions of the human speakers are monitored by video cameras and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.

    Multimodal blind source separation for moving sources

    Get PDF
    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying, thus the unmixing filters should also be time varying, which are difficult to track in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on particle filtering. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary, a frequency domain BSS algorithm is implemented with an initialization derived from the visual information. Once the sources are moving, a beamforming algorithm is used to perform real time speech enhancement and provide separation of the sources. Experimental results show that by utilizing the visual modality, the proposed algorithm can not only improve the performance of the BSS algorithm and mitigate the permutation problem for stationary sources, but also provide a good BSS performance for moving sources in a low reverberant environment

    Multimodal blind source separation for moving sources

    Get PDF
    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying, thus the unmixing filters should also be time varying, which are difficult to track in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on particle filtering. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary, a frequency domain BSS algorithm is implemented with an initialization derived from the visual information. Once the sources are moving, a beamforming algorithm is used to perform real time speech enhancement and provide separation of the sources. Experimental results show that by utilizing the visual modality, the proposed algorithm can not only improve the performance of the BSS algorithm and mitigate the permutation problem for stationary sources, but also provide a good BSS performance for moving sources in a low reverberant environment

    A multimodal approach to blind source separation of moving sources

    Get PDF
    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; thus, the unmixing filters should also be time varying, which are difficult to calculate in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker based on a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary for a certain minimum period, a frequency domain BSS algorithm is implemented with an initialization derived from the positions of the source signals. Once the sources are moving, a beamforming algorithm which requires no prior statistical knowledge is used to perform real time speech enhancement and provide separation of the sources. Experimental results confirm that by utilizing the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides a good BSS performance for moving sources in a low reverberant environment

    Single-channel source separation using non-negative matrix factorization

    Get PDF

    Enhanced independent vector analysis for audio separation in a room environment

    Get PDF
    Independent vector analysis (IVA) is studied as a frequency domain blind source separation method, which can theoretically avoid the permutation problem by retaining the dependency between different frequency bins of the same source vector while removing the dependency between different source vectors. This thesis focuses upon improving the performance of independent vector analysis when it is used to solve the audio separation problem in a room environment. A specific stability problem of IVA, i.e. the block permutation problem, is identified and analyzed. Then a robust IVA method is proposed to solve this problem by exploiting the phase continuity of the unmixing matrix. Moreover, an auxiliary function based IVA algorithm with an overlapped chain type source prior is proposed as well to mitigate this problem. Then an informed IVA scheme is proposed which combines the geometric information of the sources from video to solve the problem by providing an intelligent initialization for optimal convergence. The proposed informed IVA algorithm can also achieve a faster convergence in terms of iteration numbers and better separation performance. A pitch based evaluation method is defined to judge the separation performance objectively when the information describing the mixing matrix and sources is missing. In order to improve the separation performance of IVA, an appropriate multivariate source prior is needed to better preserve the dependency structure within the source vectors. A particular multivariate generalized Gaussian distribution is adopted as the source prior. The nonlinear score function derived from this proposed source prior contains the fourth order relationships between different frequency bins, which provides a more informative and stronger dependency structure compared with the original IVA algorithm and thereby improves the separation performance. Copula theory is a central tool to model the nonlinear dependency structure. The t copula is proposed to describe the dependency structure within the frequency domain speech signals due to its tail dependency property, which means if one variable has an extreme value, other variables are expected to have extreme values. A multivariate student's t distribution constructed by using a t copula with the univariate student's t marginal distribution is proposed as the source prior. Then the IVA algorithm with the proposed source prior is derived. The proposed algorithms are tested with real speech signals in different reverberant room environments both using modelled room impulse response and real room recordings. State-of-the-art criteria are used to evaluate the separation performance, and the experimental results confirm the advantage of the proposed algorithms

    Efficient Multiband Algorithms for Blind Source Separation

    Get PDF
    The problem of blind separation refers to recovering original signals, called source signals, from the mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate mixed signals to obtain the original signals without degradation and without prior information of the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have less computational cost and a quicker convergence than the conventional scheme. Our motivation is the competitive results of unequal-passbands scheme applications, in terms of the convergence speed. The objective of this research is to improve unequal-passbands schemes by improving the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme.This scheme uses multiple bands that make it work at a reduced sampling rate, and low computational cost. An adaptation approach is derived with an adaptation step that improved the convergence speed. The performance of the proposed scheme was measured in different ways. First, the mean square errors of various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested for white and coloured inputs using a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence is enhanced. Third, the scheme is tested for quick changes. It is shown that the performance of the proposed scheme is similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state. The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater number of bands, the more efficient the separation. The results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than that of the conventional scheme, and the computation cost is lower than that of the conventional scheme
    corecore