    Robust Distributed Multi-Source Detection and Labeling in Wireless Acoustic Sensor Networks

    The growing demand for complex signal processing methods in low-energy, large-scale wireless acoustic sensor networks (WASNs) urges a shift to a new information and communication technologies (ICT) paradigm. The emerging research vision is a wireless network in which multiple heterogeneous devices with different interests cooperate in various signal processing tasks (MDMT). The contributions of this doctoral thesis focus on distributed multi-source detection and labeling applied to audio enhancement scenarios, with the goal of MDMT-style node-specific source-of-interest signal enhancement in WASNs. Accurate detection and labeling is a prerequisite for the MDMT paradigm: nodes in the WASN must communicate their sources of interest effectively so that multiple signal processing tasks can be enhanced via cooperation. First, a novel framework based on a dominant source model in distributed WASNs is introduced for resolving the activity detection of multiple speech sources in a reverberant and noisy environment. A preliminary rank-one multiplicative non-negative independent component analysis (M-NICA) is presented for extracting a unique dominant energy source given associated node clusters. Partitional algorithms that minimize the within-cluster mean absolute deviation (MAD) and weighted MAD objectives are proposed to determine the cluster membership of the unmixed energies and thus establish source-specific voice activity recognition. A second study targets improving the energy signal separation to ease the multiple source activity discrimination task. Sparsity-inducing penalties are enforced on iterative rank-one singular value decomposition layers to extract sparse right rotations. Then, sparse non-negative blind energy separation is realized using multiplicative updates.
Hence, the multiple source detection problem is converted into a sparse non-negative source energy decorrelation. Sparsity sets the supposedly non-active energy signatures to exactly zero so that active energies are easier to identify and an activity detector can be constructed in a straightforward manner. In a centralized scenario, the activity decision is controlled by a fusion center that delivers the binary source activity detection for every participating energy source. This strategy gives precise detection results for small numbers of sources. With a growing number of interfering sources, the distributed detection approach is more promising. In addition, a robust distributed energy separation algorithm for multiple competing sources is proposed. A robust and regularized $t_{\nu}$ $M$-estimation of the covariance matrix of the mixed energies is employed. This approach yields a simple activity decision using only the robustly unmixed energy signatures of the sources in the WASN. The performance of the robust activity detector is validated with a distributed adaptive node-specific signal estimation method for speech enhancement. The latter enhances the quality and intelligibility of the signal while exploiting the accurately estimated multi-source voice decision patterns. In contrast to the original M-NICA for source separation, the binary activity patterns extracted with the robust energy separation significantly improve the node-specific signal estimation. Because the additional energy signal separation step increases the computational complexity, a new approach to the detection problem in multi-device multi-source networks is presented. Stability selection for iterative extraction of robust right singular vectors is considered. The sub-sampling selection technique provides transparency in properly choosing the regularization variable in the Lasso optimization problem.
In this way, the strongest sparse right singular vectors, obtained using a robust $\ell_1$-norm and stability selection, form the set of basis vectors that describe the input data efficiently. Active/non-active source classification is achieved with a robust Mahalanobis classifier. For this, a robust $M$-estimator of the covariance matrix in the Mahalanobis distance is utilized. Extensive evaluation in centralized and distributed settings is performed to assess the effectiveness of the proposed approach. Thus, the computationally demanding source separation scheme can be avoided by exploiting robust stability selection for sparse multi-energy feature extraction. With respect to the labeling problem of various sources in a WASN, a robust approach is introduced that exploits the direction-of-arrival of the impinging source signals. A short-time Fourier transform-based subspace method estimates the angles of locally stationary wideband signals using a uniform linear array. The median of the angles estimated at every frequency bin is used to obtain the overall angle for each participating source. The features, in this case, exploit the similarity across devices in the particular frequency bins that produce reliable direction-of-arrival estimates for each source. Reliability is defined with respect to the median across frequencies. All source-specific frequency bands that contribute to correctly estimated angles are selected. A feature vector is formed for every source at each device by storing the frequency bin indices that lie within the upper and lower interval of the median absolute deviation scale of the estimated angle. Labeling is accomplished by a distributed clustering of the extracted angle-based feature vectors using consensus averaging.
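
The frequency-bin selection described for the labeling stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `reliable_bins`, the `scale` parameter, and the use of the raw (unscaled) median absolute deviation are hypothetical choices, and the per-bin angle estimates are taken as given.

```python
from statistics import median


def reliable_bins(angles_deg, scale=1.0):
    """Return (overall_angle, bin_indices) for one source at one device.

    angles_deg[k] is the DOA estimate (in degrees) obtained at frequency bin k.
    The overall angle is the median across bins; a bin is kept when its
    estimate lies within `scale` times the MAD of that median.
    """
    center = median(angles_deg)
    deviations = [abs(a - center) for a in angles_deg]
    # Raw median absolute deviation (assumption: no consistency factor applied).
    mad = median(deviations)
    keep = [k for k, d in enumerate(deviations) if d <= scale * mad]
    return center, keep


# Hypothetical per-bin estimates: most bins agree near 30 degrees, two are outliers.
angles = [30.0, 31.0, 29.5, 80.0, 30.5, -10.0, 30.2]
overall, bins = reliable_bins(angles)
```

The list of kept indices plays the role of the angle-based feature vector; bins dominated by noise or by another source fall outside the interval and are dropped.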

    Robust estimation of directions-of-arrival in diffuse noise based on matrix-space sparsity

    We consider the estimation of the Directions-Of-Arrival (DOA) of target signals in diffuse noise. The state-of-the-art MUltiple SIgnal Classification (MUSIC) algorithm requires accurate identification of the signal subspace. In diffuse noise, however, this subspace is difficult to identify directly from the observed spatial covariance matrix. In our approach, we estimate the target spatial covariance matrix, so that we can identify the orthogonal complement of the signal subspace as its null space. We present a unified framework for modeling noise covariance in a matrix space, which generalizes four state-of-the-art diffuse noise models. We propose two alternative algorithms for estimating the target spatial covariance matrix, namely Low-rank Matrix Completion (LMC) and Trace Norm Minimization (TNM). These rely on denoising of the observed spatial covariance matrix via orthogonal projection onto the orthogonal complement of the noise matrix subspace. The missing component lying in the noise matrix subspace is then completed by exploiting the low-rankness of the target spatial covariance matrix. Large-scale experiments with real-world noise show that TNM with a certain noise model outperforms conventional MUSIC based on Generalized EigenValue Decomposition (GEVD) by 5% in terms of the precision averaged over the dataset.
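
For reference, the MUSIC step that the estimated target covariance feeds into can be written as below. The notation is an assumption introduced here for illustration: $\mathbf{R}_s$ is the estimated target spatial covariance matrix for $M$ sensors and $K$ targets, $\mathbf{E}_n$ collects orthonormal vectors spanning its null space, and $\mathbf{a}(\theta)$ is the array steering vector.

```latex
\mathbf{R}_s \mathbf{E}_n = \mathbf{0}, \qquad
\mathbf{E}_n \in \mathbb{C}^{M \times (M-K)}, \qquad
P_{\mathrm{MUSIC}}(\theta) =
  \frac{1}{\mathbf{a}(\theta)^{\mathsf{H}} \mathbf{E}_n \mathbf{E}_n^{\mathsf{H}} \mathbf{a}(\theta)}
```

The DOA estimates are the angles at which $P_{\mathrm{MUSIC}}(\theta)$ peaks, which is why an accurate null space of the target covariance matters.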

    Semi-blind Bayesian inference of CMB map and power spectrum

    We present a new blind formulation of the Cosmic Microwave Background (CMB) inference problem. The approach relies on a phenomenological model of the multi-frequency microwave sky without the need for physical models of the individual components. For all-sky and high-resolution data, it unifies parts of the analysis that have previously been treated separately, such as component separation and power spectrum inference. We describe an efficient sampling scheme that fully explores the component separation uncertainties on the inferred CMB products such as maps and/or power spectra. External information about individual components can be incorporated as a prior, giving a flexible way to progressively and continuously introduce physical component separation from a maximally blind approach. We connect our Bayesian formalism to existing approaches such as Commander, SMICA and ILC, and discuss possible future extensions. Comment: 11 pages, 9 figures

    SAR Tomography via Nonlinear Blind Scatterer Separation

    Layover separation has been fundamental to many synthetic aperture radar (SAR) applications, such as building reconstruction and biomass estimation. Retrieving the scattering profile along the mixed dimension (elevation) is typically solved by inversion of the SAR imaging model, a process known as SAR tomography. This paper proposes a nonlinear blind scatterer separation method to retrieve the phase centers of the layovered scatterers, avoiding the computationally expensive tomographic inversion. We demonstrate that conventional linear separation methods, e.g., principal component analysis (PCA), can only partially separate the scatterers under good conditions. These methods produce systematic phase bias in the retrieved scatterers due to the nonorthogonality of the scatterers' steering vectors, especially when the intensities of the sources are similar or the number of images is low. The proposed method artificially increases the dimensionality of the data using kernel PCA, hence mitigating the aforementioned limitations. In the processing, the proposed method sequentially deflates the covariance matrix using the estimate of the brightest scatterer from kernel PCA. Simulations demonstrate the superior performance of the proposed method over conventional PCA-based methods in various respects. Experiments using TerraSAR-X data show an improvement in height reconstruction accuracy by a factor of one to three, depending on the number of looks used. Comment: This work has been accepted by IEEE TGRS for publication
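
The sequential deflation idea can be illustrated with a small sketch. Note the substitution: the paper takes the brightest-scatterer estimate from kernel PCA, whereas this example uses a plain (linear) dominant eigenvector found by power iteration, purely to show the deflation step. All function names are hypothetical, and the fixed uniform start vector is an assumption that fails if it happens to be orthogonal to the dominant eigenvector.

```python
import math


def matvec(C, v):
    """Multiply matrix C (list of rows) by vector v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def dominant_eigenpair(C, iters=500):
    """Power iteration for the largest eigenvalue of a Hermitian PSD matrix."""
    n = len(C)
    v = [1.0 / math.sqrt(n)] * n  # assumption: not orthogonal to the target
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Cv = matvec(C, v)
    # Rayleigh quotient v^H C v (v has unit norm).
    lam = sum(x.conjugate() * y for x, y in zip(v, Cv)).real
    return lam, v


def deflate(C, lam, v):
    """Remove the component lam * v v^H from C."""
    n = len(C)
    return [[C[i][j] - lam * v[i] * v[j].conjugate() for j in range(n)]
            for i in range(n)]


def sequential_components(C, count):
    """Extract `count` components, deflating the matrix after each one."""
    pairs = []
    for _ in range(count):
        lam, v = dominant_eigenpair(C)
        pairs.append((lam, v))
        C = deflate(C, lam, v)
    return pairs


# Hypothetical 2x2 covariance matrix with eigenvalues (3 +/- sqrt(2)) / 2.
C = [[2.0, 0.5], [0.5, 1.0]]
pairs = sequential_components(C, 2)
```

With exact eigenvectors the extracted components are orthogonal by construction, which is the limitation the abstract attributes to linear PCA when the scatterers' steering vectors are not orthogonal.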

    Dirichlet latent variable model: a dynamic model based on Dirichlet prior for audio processing

    We propose a dynamic latent variable model for learning latent bases from time-varying, non-negative data. We take a probabilistic approach to modeling the temporal dependence in data by introducing a dynamic Dirichlet prior, that is, a Dirichlet distribution with dynamic parameters. This new distribution allows us to ensure non-negativity and to avoid the intractability otherwise encountered when sequential updates are performed with a Dirichlet prior. We refer to the proposed model as the Dirichlet latent variable model (DLVM). We develop an expectation maximization algorithm for the proposed model, and also derive a maximum a posteriori estimate of the parameters. Furthermore, we connect the proposed DLVM to two popular latent basis learning methods: probabilistic latent component analysis (PLCA) and non-negative matrix factorization (NMF). We show that (i) PLCA is a special case of our DLVM, and (ii) DLVM can be interpreted as a dynamic version of NMF. The usefulness of DLVM is demonstrated for three audio processing applications: speaker source separation, denoising, and bandwidth expansion. To this end, a new algorithm for source separation is also proposed. Through extensive experiments on benchmark databases, we show that the proposed model outperforms several relevant existing methods in all three applications.
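
As background for the NMF connection, the following is a minimal sketch of static NMF with the standard multiplicative updates for the squared (Frobenius) error. It is not the DLVM algorithm and has no temporal prior; the function names, the random initialization, and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical non-negative data, e.g. a tiny magnitude spectrogram.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
W, H = nmf(V, rank=2)
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors stay non-negative without any explicit projection.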

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, which are crucial steps in many separation methods. We then use and evaluate the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
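
To make the low-latency argument concrete, the generic Tikhonov-regularized decomposition can be written as below. The symbols are assumptions introduced here, not taken from the thesis: $\mathbf{y}$ is the spectrum of the current frame, $\mathbf{B}$ is a fixed basis matrix (real-valued for simplicity), $\mathbf{x}$ holds the basis gains, and $\lambda > 0$ is the regularization weight. The thesis's exact formulation may differ.

```latex
\hat{\mathbf{x}}
  = \arg\min_{\mathbf{x}} \; \|\mathbf{y} - \mathbf{B}\mathbf{x}\|_2^2 + \lambda \|\mathbf{x}\|_2^2
  = \left(\mathbf{B}^{\mathsf{T}}\mathbf{B} + \lambda \mathbf{I}\right)^{-1} \mathbf{B}^{\mathsf{T}} \mathbf{y}
```

Since the matrix $(\mathbf{B}^{\mathsf{T}}\mathbf{B} + \lambda \mathbf{I})^{-1}\mathbf{B}^{\mathsf{T}}$ does not depend on the frame, it can be precomputed once, leaving a single matrix-vector product per frame, in contrast to iterative decompositions.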

    Group Sparse Recovery via the ℓ0(ℓ2) Penalty: Theory and Algorithm

    In this work we propose and analyze a novel approach for recovering group sparse signals, which arise naturally in a number of practical applications. It is based on regularized least squares with an ℓ0(ℓ2) penalty. One distinct feature of the new approach is that it has a built-in decorrelation mechanism within each group, and thus can handle the challenging case of strong inner-group correlation. We provide a complete analysis of the regularized model, e.g., the existence of global minimizers, invariance property, support recovery, and characterization and properties of block coordinatewise minimizers. Further, the regularized functional can be minimized efficiently and accurately by a primal dual active set algorithm with provable global convergence. In particular, at each iteration, it involves solving least squares problems on the active set only, and enjoys fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative experimental study indicates that it is competitive with existing approaches.
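
To give a feel for a group ℓ0-type penalty, the sketch below solves the simplest special case in closed form: minimizing 0.5 * ||x - z||^2 + lam * (number of nonzero groups), which reduces to group-wise hard thresholding. This is not the primal dual active set algorithm of the paper and ignores the inner-group decorrelation; the function name and the example data are hypothetical.

```python
import math


def group_hard_threshold(z, groups, lam):
    """Minimize 0.5 * ||x - z||^2 + lam * (number of groups with x_G != 0).

    Keeping a group costs lam, zeroing it costs 0.5 * ||z_G||^2, so a group
    survives exactly when ||z_G||_2 > sqrt(2 * lam).
    """
    x = [0.0] * len(z)
    threshold = math.sqrt(2.0 * lam)
    for idx in groups:
        norm = math.sqrt(sum(z[i] ** 2 for i in idx))
        if norm > threshold:
            for i in idx:
                x[i] = z[i]
    return x


# Hypothetical signal with three groups of two coefficients each.
z = [3.0, 4.0, 0.1, 0.2, 1.0, 1.0]
groups = [[0, 1], [2, 3], [4, 5]]
x = group_hard_threshold(z, groups, lam=1.5)
```

Whole groups are either kept unchanged or set to zero, so the surviving coefficients are not shrunk as they would be under a convex group penalty.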