34 research outputs found

    Real Time Blind Source Separation in Reverberant Environments

    An online convolutive blind source separation solution has been developed for use in reverberant environments with stationary sources. Results are presented for simulated and real-world data. The system achieves a separation SINR of 16.8 dB when operating on a two-source mixture, with a total acoustic delay of 270 ms. This is on par with, and in many respects outperforms, various published algorithms [1],[2]. A number of instantaneous blind source separation algorithms have also been developed, including a block-wise and a recursive ICA algorithm and a clustering-based algorithm, able to obtain up to 110 dB SIR performance. The system has been realised in both Matlab and C, and is modular, allowing for easy update of the ICA algorithm that is the core of the unmixing process.

    Object-based Modeling of Audio for Coding and Source Separation

    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation composed of meaningful entities, called audio objects, that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated via listening tests, comparing its overall perceptual quality to that of conventional time-frequency block-wise methods at the same bitrates.
The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels. For sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction-of-arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve over state-of-the-art sound source separation methods on recordings made with a small microphone array.
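The factorization step named in this abstract can be illustrated with a minimal sketch. It assumes the plain Euclidean-cost non-negative matrix factorization with the standard multiplicative updates, not the multichannel or complex-valued extensions the thesis develops; the function name `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (freq x frames) as W @ H.

    Illustrative only: Euclidean-cost multiplicative updates, where
    W holds spectral basis vectors and H their activations over time.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    n_freq, n_frames = V.shape
    # Random positive initialisation keeps every factor strictly positive.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of `W` paired with the matching row of `H` can be read as one candidate "audio object" in the sense of the abstract; how objects are grouped into sources is outside this sketch.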

    Single-Channel Signal Separation Using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization

    A novel approach to single-channel signal separation is presented, based on the proposed sparse nonnegative tensor factorization under the framework of maximum a posteriori probability and adaptively fine-tuned using a hierarchical Bayesian approach with a new mixing mixture model. The mixing mixture is analogous to a stereo signal obtained from one real and one virtual microphone. An “imitated-stereo” mixture model is thus developed by weighting and time-shifting the original single-channel mixture. This leads to an artificial dual-channel mixing system, which gives rise to a new form of spectral basis correlation diversity of the sources. Underlying all factorization algorithms is the principal difficulty of estimating an adequate number of latent components for each signal. This paper addresses these issues by developing a framework for pruning unnecessary components and incorporating modified multivariate rectified Gaussian prior information into the spectral basis features. The parameters of the imitated-stereo model are estimated via the proposed sparse nonnegative tensor factorization with Itakura–Saito divergence. In addition, the separability conditions of the proposed mixture model are derived, and it is demonstrated that the proposed method can separate real-time captured mixtures. Experimental testing on real audio sources has been conducted to verify the capability of the proposed method.
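The "weighting and time-shifting" construction can be sketched as follows. This is a simplified reading of the abstract, assuming the virtual channel is just a scaled, delayed copy of the observed mixture; the paper's exact model may differ, and `imitated_stereo`, `gain` and `delay` are hypothetical names.

```python
import numpy as np

def imitated_stereo(x, gain, delay):
    """Build a two-channel signal from a single-channel mixture x.

    Channel 0 is the real observation; channel 1 is an assumed virtual
    channel, virtual[n] = gain * x[n - delay], zero before the delay.
    """
    x = np.asarray(x, dtype=float)
    if not 0 <= delay < len(x):
        raise ValueError("delay must satisfy 0 <= delay < len(x)")
    virtual = np.zeros_like(x)
    # Shift right by `delay` samples and apply the weighting.
    virtual[delay:] = gain * x[:len(x) - delay]
    return np.stack([x, virtual])
```

For example, `imitated_stereo([1, 2, 3, 4], gain=0.5, delay=1)` yields a second channel of `[0, 0.5, 1, 1.5]`.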

    Flexible methods for blind separation of complex signals

    One of the main issues in Blind Source Separation (BSS) performed with a neural network approach is the choice of the nonlinear activation function (AF). In fact, if the shape of the activation function is chosen as the cumulative distribution function (c.d.f.) of the original source, the problem is solved. To this end, this thesis introduces a flexible approach in which the shape of the activation functions is changed during the learning process using so-called “spline functions”. The problem is more complicated in the separation of complex sources, where there is a dichotomy between analyticity and boundedness of the complex activation functions. This is solved by introducing the “splitting function” model as the activation function. The “splitting function” is a pair of “spline functions” that model the real and the imaginary parts of the complex activation function, each depending on the real and the imaginary variable, respectively. A more realistic model is the “generalized splitting function”, which is formed by a pair of two-dimensional functions (surfaces), one for the real and one for the imaginary part of the complex function, each depending on both the real and the imaginary part of the complex variable. Unfortunately, the linear environment is unrealistic in many practical applications. Hence the BSS problem needs to be extended to the nonlinear environment: in this case both the activation function and the nonlinear distorting function are realized by the “splitting function” made of “spline functions”. Complex and instantaneous separation in linear and nonlinear environments allows a complex-valued extension of the well-known INFOMAX algorithm to be applied in several practical situations, such as convolutive mixtures, fMRI signal analysis and bandpass signal transmission. In addition, advanced characteristics of the proposed approach are introduced and described in depth.
First, it is shown that splines are universal nonlinear functions for the BSS problem: they are able to perform separation in any case. Then it is analyzed how the “splitting solution” allows the algorithm to recover the phase, where usually there is a phase ambiguity. Finally, a Cramér-Rao lower bound for ICA is discussed. Several experimental results, evaluated with different objective indexes, show the effectiveness of the proposed approaches.

    Acoustic event detection and localization using distributed microphone arrays

    Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL) when several sources may be simultaneously present in a room. In particular, the experimental work is carried out in a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis the problem of time-overlapping sounds is tackled by exploiting the signal diversity that results from the use of multiple microphone array beamformers. The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null) steering beamformers is used to carry out diverse partial signal separations, using multiple arbitrarily located linear microphone arrays, each composed of a small number of microphones. In the second stage, each beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, whether intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments). This processing scheme is applied in this thesis to a set of problems of increasing complexity, which are defined by the assumptions made regarding the identities (plus time endpoints) and/or positions of sounds. The thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions.
The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used: one is produced by merging signals actually recorded in the UPC's department smart-room, and the other consists of overlapping sound signals directly recorded in the same room in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model-based system or a blind-source-separation-based system. Moreover, the product-rule-based combination and the FI-based fusion of the scores resulting from the multiple arrays improve the accuracies further. In addition, the posterior position assignment is performed with a very small error rate. Regarding ASL, and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely used SRP-PHAT system working in an event-based mode, and it even performs significantly better than the latter in the more complex 2-source scenario. Finally, though the joint system suffers a slight degradation in classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is also worth noting that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained.
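The product-rule combination of scores across arrays mentioned in this abstract can be sketched as below. The sketch assumes each array's classifier outputs class posterior probabilities, that arrays are treated as conditionally independent, and that class priors are equal; none of these details are confirmed by the abstract, and `product_rule_fusion` is a hypothetical name.

```python
import numpy as np

def product_rule_fusion(posteriors, eps=1e-12):
    """Fuse per-array class posteriors with the product rule.

    posteriors: array-like of shape (n_arrays, n_classes).
    Returns the normalised fused scores and the winning class index.
    """
    p = np.asarray(posteriors, dtype=float)
    # Sum of logs equals the log of the product; clipping avoids log(0).
    log_p = np.log(np.clip(p, eps, None)).sum(axis=0)
    log_p -= log_p.max()  # stabilise before exponentiating
    fused = np.exp(log_p)
    fused /= fused.sum()
    return fused, int(np.argmax(fused))
```

With two arrays scoring two classes as `[[0.6, 0.4], [0.3, 0.7]]`, the per-class products are 0.18 and 0.28, so the second class is selected.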