44 research outputs found

    Efficient Multiband Algorithms for Blind Source Separation

    The problem of blind separation refers to recovering the original signals, called source signals, from mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate the mixed signals and obtain the original signals without degradation and without prior information about the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, incur a lower computational cost and converge more quickly than the conventional scheme. Our motivation is the competitive results of unequal-passbands schemes in terms of convergence speed. The objective of this research is to improve unequal-passbands schemes by increasing the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands that allow it to work at a reduced sampling rate and low computational cost. An adaptation approach is derived with a step size that improves the convergence speed. The performance of the proposed scheme was measured in several ways. First, the mean square errors of the various bands are measured and compared to those of a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested on white and coloured inputs with a small number of bands, it does not yield good results; when the number of bands is increased, however, the speed of convergence improves. Third, the scheme is tested under rapid changes, where its performance is shown to be similar to that of the equal-passbands scheme. Fourth, the scheme is tested in a stationary state, and the experimental results confirm the theoretical work.
For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed: the greater the number of bands, the more efficient the separation. First, the results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than those of the conventional scheme, and that its computational cost is lower.
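As a rough illustration of the subband idea, the sketch below identifies an unknown FIR system with one NLMS adaptive filter per band, after splitting the signals with toy two-tap halfband filters. It is an assumption-laden stand-in, not the maximally decimated unequal-passbands scheme itself: the decimation step that delivers the rate and cost savings is omitted for clarity, and the filters, step size and system are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def nlms(x, d, taps=8, mu=0.5, eps=1e-8):
    # Normalized LMS: adapt w so the filtered input tracks d.
    w = np.zeros(taps)
    e = np.zeros(len(x))
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]     # most recent sample first
        e[n] = d[n] - w @ u                 # a-priori error
        w += mu * e[n] * u / (eps + u @ u)  # normalized update
    return w, e

# Unknown system to identify (a stand-in for a room response).
h = np.array([0.7, -0.4, 0.2])
x = rng.standard_normal(8000)
d = np.convolve(x, h)[:len(x)]

# Split input and desired signal with toy halfband filters and adapt
# one filter per band.  In a maximally decimated scheme each band is
# also downsampled, so every adaptive filter runs at a fraction of the
# full rate -- the source of the computational saving described above.
analysis = [np.array([0.5, 0.5]), np.array([0.5, -0.5])]  # lowpass, highpass
band_errors = []
for g in analysis:
    xb = np.convolve(x, g)[:len(x)]
    db = np.convolve(d, g)[:len(d)]
    _, e = nlms(xb, db)
    band_errors.append(e)
```

Because convolution commutes, each band's desired signal is the same unknown system driven by the band-filtered input, so every per-band error decays toward zero as the filters converge.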

    A Theory of Cramer-Rao Bounds for Constrained Parametric Models

    A simple expression for the Cramér-Rao bound (CRB) is presented for the scenario of estimating parameters θ that are required to satisfy a differentiable constraint function f(θ). A proof of this constrained CRB (CCRB) is provided using the implicit function theorem, and the encompassing theory of the CCRB is proven in a similar manner. This theory includes connecting the CCRB to notions of identifiability of constrained parameters; the linear model under a linear constraint; the constrained maximum likelihood problem, its asymptotic properties and the method of scoring with constraints; and hypothesis testing. The value of the tools developed in this theory is then demonstrated in the communications context for the convolutive mixture model and the calibrated array model.
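For concreteness, a standard statement of the CCRB from the literature (the simplified expression derived in this work may take a different but equivalent form): with J(θ) the unconstrained Fisher information, F(θ) = ∂f/∂θᵀ the constraint Jacobian, and U any matrix whose columns form an orthonormal basis for the null space of F,

```latex
% CCRB: covariance bound for any unbiased estimator of \theta
% subject to f(\theta) = 0, where F U = 0 and U^{T} U = I,
% assuming U^{T} J U is nonsingular.
\operatorname{cov}(\hat{\theta}) \;\succeq\; U \left( U^{T} J(\theta)\, U \right)^{-1} U^{T}
```

When there are no constraints, U is the identity and the bound reduces to the ordinary CRB J(θ)⁻¹.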

    Application of sound source separation methods to advanced spatial audio systems

    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area; thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field, and its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain.
Although both techniques rely on the same type of clustering, the features considered by each of them are related to different localization cues, enabling the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented. Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
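The time-frequency masking idea behind these methods can be sketched as follows. This is a deliberately simplified stand-in: it uses a single threshold on a level-ratio (amplitude panning) cue rather than the multi-level thresholding segmentation proposed in the thesis, and the toy signals, window sizes and the 0.5 threshold are all assumptions.

```python
import numpy as np

def stft(x, win=256, hop=128):
    # Minimal STFT: Hann-windowed frames, one-sided FFT per frame.
    w = np.hanning(win)
    return np.array([np.fft.rfft(w * x[i:i + win])
                     for i in range(0, len(x) - win, hop)])

rng = np.random.default_rng(1)
n = 16384
s1, s2 = rng.standard_normal(n), rng.standard_normal(n)

# Stereo mix: each source panned to a different position.
left = 0.9 * s1 + 0.2 * s2
right = 0.2 * s1 + 0.9 * s2

L, R = stft(left), stft(right)

# Localization cue per time-frequency bin: left-channel level ratio.
# Under a sparsity assumption each bin is dominated by one source, so
# the cue clusters around each source's panning coefficient.
cue = np.abs(L) / (np.abs(L) + np.abs(R) + 1e-12)

# One threshold -> two classes; the thesis instead segments the cue
# histogram with multiple thresholds to handle more sources.
mask1 = cue > 0.5
src1_tf = L * mask1        # bins attributed to source 1
src2_tf = R * ~mask1       # bins attributed to source 2
```

Inverting the masked STFTs (by overlap-add) would yield the separated time-domain signals.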

    Flexible methods for blind separation of complex signals

    One of the main issues in Blind Source Separation (BSS) performed with a neural network approach is the choice of the nonlinear activation function (AF). If the shape of the activation function matches the cumulative density function (c.d.f.) of the original source, the problem is solved. To this end, this thesis introduces a flexible approach in which the shape of the activation functions is adapted during the learning process using so-called “spline functions”. The problem is more complicated in the case of separation of complex sources, where there is a dichotomy between analyticity and boundedness of the complex activation functions. This problem is solved by introducing the “splitting function” model as the activation function. The “splitting function” is a pair of spline functions that model the real and the imaginary parts of the complex activation function, each depending on a single real variable. A more realistic model is the “generalized splitting function”, which is formed by a pair of bi-dimensional functions (surfaces), one for the real and one for the imaginary part of the complex function, each depending on both the real and imaginary parts of the complex variable. Unfortunately, the linear environment is unrealistic in many practical applications, so the BSS problem needs to be extended to the nonlinear environment: in this case both the activation function and the nonlinear distorting function are realized by splitting functions made of splines. The complex, instantaneous separation in linear and nonlinear environments allows a complex-valued extension of the well-known INFOMAX algorithm in several practical situations, such as convolutive mixtures, fMRI signal analysis and bandpass signal transmission. In addition, advanced characteristics of the proposed approach are introduced and described in depth.
First, it is shown that splines are universal nonlinear functions for the BSS problem: they are able to perform separation in any case. It is then shown how the “splitting solution” allows the algorithm to recover the phase, which is usually ambiguous. Finally, a Cramér-Rao lower bound for ICA is discussed. Several experimental results, evaluated with different objective indexes, show the effectiveness of the proposed approaches.
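A minimal sketch of the splitting-function idea, with fixed tanh nonlinearities standing in for the adaptive splines used in the thesis:

```python
import numpy as np

def splitting_af(z, g_re=np.tanh, g_im=np.tanh):
    # Splitting activation: separate real nonlinearities applied to the
    # real and imaginary parts.  The result is bounded but not analytic,
    # which is exactly the trade-off: by Liouville's theorem no
    # non-constant entire (everywhere-analytic) function can be bounded.
    return g_re(z.real) + 1j * g_im(z.imag)

z = np.array([0.3 + 0.4j, 1000.0 - 1000.0j])
y = splitting_af(z)
```

The output stays inside the unit square regardless of |z|; in the thesis each of the two real nonlinearities is an adaptive spline whose control points are learned along with the separation matrix.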

    Acoustic event detection and localization using distributed microphone arrays

    Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL) when several sources may be simultaneously present in a room. In particular, the experimental work is carried out in a meeting-room scenario. Unlike previous works, which either employed models of all possible sound combinations or additionally used video signals, in this thesis the problem of temporally overlapping sounds is tackled by exploiting the signal diversity that results from the use of multiple microphone-array beamformers. The core of this thesis is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null-)steering beamformers is used to carry out diverse partial signal separations, using multiple arbitrarily located linear microphone arrays, each composed of a small number of microphones. In the second stage, each beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM in the experiments). Then, in a third stage, the classifier scores, whether intra- or inter-array, are combined using a probabilistic criterion (such as MAP) or a machine-learning fusion technique (the fuzzy integral (FI) in the experiments). The above-mentioned processing scheme is applied in this thesis to a set of problems of increasing complexity, defined by the assumptions made regarding the identities (plus time endpoints) and/or positions of the sounds. In fact, the thesis report starts with the problem of unambiguously mapping identities to positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system that does not need any assumption about identities or positions.
The evaluation experiments are carried out in a meeting-room scenario where two sources temporally overlap; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used: one produced by merging signals actually recorded in the UPC's department smart-room, and the other consisting of overlapping sound signals directly recorded in the same room in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model-based system or a blind-source-separation-based system. Moreover, the product-rule-based combination and the FI-based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate. Regarding ASL, and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely used SRP-PHAT system working in an event-based mode, and it performs significantly better than the latter in the more complex 2-source scenario. Finally, although the joint system suffers a slight degradation in classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is also worth noticing that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case in which the number and identities of the sources are not constrained.
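The signal diversity exploited in the first stage comes from steering beamformers. A toy delay-and-sum version is sketched below; the thesis actually uses null-steering beamformers, and the integer sample delays, array size and noise levels here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def delay_and_sum(signals, delays):
    # Undo each microphone's known steering delay and average, so the
    # target adds coherently while uncorrelated noise averages out.
    out = np.zeros(len(signals[0]))
    for x, d in zip(signals, delays):
        out += np.roll(x, -d)
    return out / len(signals)

# One source reaching a 4-microphone array with different delays,
# plus independent sensor noise on each channel.
s = rng.standard_normal(4000)
delays = [0, 3, 5, 8]
mics = [np.roll(s, d) + 0.5 * rng.standard_normal(4000) for d in delays]

y = delay_and_sum(mics, delays)
```

Steered toward the source, the array output is a cleaner copy of s than any single microphone; pointing nulls at a competing source instead is what gives each beamformer output a different "view" of the overlapped sounds, which the later classification and fusion stages exploit.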

    Unsupervised neural spike identification for large-scale, high-density micro-electrode arrays

    This work deals with the development and evaluation of algorithms that extract sequences of single-neuron action potentials from extracellular recordings of superimposed neural activity - a task commonly referred to as spike sorting. Large (>10^3 electrodes) and dense (subcellular spatial sampling) CMOS-based micro-electrode arrays make it possible to record from hundreds of neurons simultaneously. State-of-the-art algorithms for up to a few hundred sensors are not directly applicable to this type of data, and promising modern spike sorting algorithms that seek the statistically optimal solution or focus on real-time capability need to be initialized with a preceding sorting. Therefore, this work focused on unsupervised solutions, in order to learn the number of neurons and their spike trains, with proper resolution of both temporally and spatiotemporally overlapping activity, from the extracellular data alone. Chapter (1) describes the nature of the data and a model-based view, and relates these to spike sorting in order to motivate the design decisions of this thesis. The main materials and methods chapter (2) bundles the infrastructural work that is independent of, but mandatory for, the development and evaluation of any spike sorting method. The main problem was split in two parts. Chapter (3) addresses the problem of analyzing data from thousands of densely integrated channels in a divide-and-conquer fashion. Making use of the spatial information of dense 2D arrays, regions of interest (ROIs) with boundaries adapted to the electrical image of single or multiple neurons were automatically constructed. All ROIs could then be processed in parallel. Within each region of interest the maximum number of neurons could be estimated from the local data matrix alone. An independent component analysis (ICA) based sorting was used to identify units within ROIs; this stage can be replaced by any other suitable spike sorting algorithm that solves the local problem.
Redundantly identified units across different ROIs were automatically fused into a global solution. The framework was evaluated on both real and simulated recordings with ground truth. For the latter, it was shown that a major fraction of units could be extracted without any error. The high-dimensional data can be visualized after automatic sorting for convenient verification, and means of rapidly separating well-isolated from poorly isolated neurons were proposed and evaluated. Chapter (4) presents a more sophisticated algorithm developed to solve the local problem of densely arranged sensors. ICA assumes the data to be instantaneously mixed, thereby reducing spatial redundancy only and ignoring the temporal structure of extracellular data. The widely accepted generative model describes the intracellular spike trains as convolved with their extracellular spatiotemporal kernels. To account for this, it was assessed thoroughly whether convolutive ICA (cICA) could increase sorting performance over instantaneous ICA. The high computational complexity of cICA was dealt with by automatically identifying relevant subspaces that can be unmixed in parallel. Although convolutive ICA is suggested by the data model, the sorting results for realistic scenarios were dominated by the post-processing and did not outperform ICA-based sorting. Potential alternatives are discussed thoroughly and bounded from above by a supervised sorting. This work provides a completely unsupervised spike sorting solution that enables the extraction of a major fraction of neurons with high accuracy, and thereby helps to overcome current limitations in analyzing the high-dimensional datasets obtained by simultaneously imaging the extracellular activity of hundreds of neurons with thousands of electrodes.
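Within a region of interest, the ICA stage can be sketched as below. This minimal deflationary FastICA (tanh contrast) and the toy two-channel, two-unit data are illustrative assumptions, not the pipeline's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def fastica(X, n_comp, iters=200):
    # Deflationary FastICA with a tanh contrast on whitened data.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X      # whitening
    W = np.zeros((n_comp, Z.shape[0]))
    for i in range(n_comp):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(iters):
            wz = np.tanh(w @ Z)
            w_new = (Z * wz).mean(axis=1) - (1 - wz ** 2).mean() * w
            w_new -= W[:i].T @ (W[:i] @ w_new)  # deflate: stay orthogonal
            w = w_new / np.linalg.norm(w_new)
        W[i] = w
    return W @ Z                                # recovered components

# Toy "ROI": two sparsely firing units seen on two electrodes.
n = 20000
s1 = (rng.random(n) < 0.01) * 1.0
s2 = (rng.random(n) < 0.01) * 1.0
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])                      # mixing across electrodes
U = fastica(A @ np.vstack([s1, s2]), n_comp=2)
```

Each row of U should match one unit's spike train up to scale and sign; in the real pipeline the per-ROI solutions are then fused across overlapping ROIs.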

    Autoregressive models for text independent speaker identification in noisy environments

    The closed-set speaker identification problem is defined as the search, within a set of persons, for the speaker of a certain utterance. It is reported that the Gaussian mixture model (GMM) classifier achieves very high classification accuracies (in the range 95%-100%) when both the training and testing utterances are recorded in a soundproof studio, i.e., when there is neither additive noise nor spectral distortion of the speech signals. However, in real-life applications, speech is usually corrupted by noise and band limitation. Moreover, there is a mismatch between the recording conditions of the training and testing environments. As a result, the classification accuracy of GMM-based systems deteriorates significantly. In this thesis, we propose a two-step procedure for improving speaker identification performance in noisy environments. In the first step, we introduce a new classifier: the vector autoregressive Gaussian mixture (VARGM) model. Unlike the GMM, the new classifier models the correlations between successive feature vectors. We also integrate the proposed method into the framework of the universal background model (UBM), and we develop the learning procedure according to the maximum likelihood (ML) criterion. Based on a thorough experimental evaluation, the proposed method achieves an improvement of 3 to 5% in identification accuracy. In the second step, we propose a new compensation technique based on the generalized maximum likelihood (GML) decision rule. In particular, we assume a general form for the distribution of noise-corrupted utterances, which contains two types of parameters: clean-speech-related parameters and noise-related parameters. While the clean-speech-related parameters are estimated during the training phase, the noise-related parameters are estimated from the corrupted speech in the testing phase.
We applied the proposed method to utterances of 50 speakers selected from the TIMIT database, artificially corrupted by convolutive and additive noise, with the signal-to-noise ratio (SNR) varying from 0 to 20 dB. Simulation results reveal that the proposed method achieves good robustness against variation in the SNR. For utterances corrupted by convolutive noise, the improvement in classification accuracy ranges from 70% at SNR = 0 dB to around 4% at SNR = 10 dB, compared to the standard ML decision rule. For utterances corrupted by additive noise, the improvement in classification accuracy ranges from 1% to 10% for SNRs from 0 to 20 dB. The proposed VARGM classifier is also applied to the speech emotion classification problem. In particular, we use the Berlin emotional speech database to validate its classification performance. The proposed technique provides a classification accuracy of 76%, versus 71% for the hidden Markov model, 67% for the k-nearest neighbors classifier, and 55% for feed-forward neural networks. The model also gives better discrimination than the HMM between high-arousal emotions (joy, anger, fear), low-arousal emotions (sadness, boredom), and neutral emotions. Another interesting application of the VARGM model is the blind equalization of multiple-input multiple-output (MIMO) communication channels. Based on VARGM modeling of MIMO channels, we propose a four-step equalization procedure. First, the received data vectors are fitted to a VARGM model using the expectation-maximization (EM) algorithm. The constructed VARGM model is then used to filter the received data. A Bayesian decision rule is then applied to identify the transmitted symbols up to permutation and phase ambiguities, which are finally resolved using a small training sequence. Moreover, we propose a fast and easily implementable model-order selection technique. The new equalization algorithm is compared to the whitening method and found to provide a lower symbol error probability. The proposed technique is also applied to frequency-flat slow-fading channels and found to provide a more accurate estimate of the channel response than the blind deconvolution exploiting channel encoding (BDCC) method, at a higher information rate.
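The closed-set ML decision rule at the heart of the identification step is easy to illustrate. The sketch below uses one diagonal Gaussian per speaker on two-dimensional toy "features" as a stand-in for the GMM/VARGM models; all data, dimensions and parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_gaussian(frames):
    # One diagonal Gaussian per speaker: a GMM with a single component.
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, mean, var):
    # Sum of per-frame diagonal-Gaussian log densities.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)

# Enrollment: training frames for two "speakers" with different
# feature distributions.
true_means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
models = [fit_gaussian(m + rng.standard_normal((500, 2))) for m in true_means]

# Identification: score a test utterance (here, from speaker 1) against
# every model and pick the highest likelihood -- the ML decision rule.
test_frames = true_means[1] + rng.standard_normal((50, 2))
scores = [log_likelihood(test_frames, *m) for m in models]
identified = int(np.argmax(scores))
```

The VARGM classifier replaces the per-frame independence assumption with a vector-autoregressive dependence between successive frames, and the GML rule additionally estimates the noise-related parameters from the corrupted test utterance itself.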