
    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages

    Underdetermined Blind Separation of Nondisjoint Sources in the Time-Frequency Domain

    This paper considers the blind separation of non-stationary sources in the underdetermined case, when there are more sources than sensors. A general framework for this problem is to work on sources that are sparse in some signal representation domain. Recently, two methods have been proposed with respect to the time-frequency (TF) domain. The first uses quadratic time-frequency distributions (TFDs) and a clustering approach, and the second uses a linear TFD. Both of these methods assume that the sources are disjoint in the TF domain; i.e., there is at most one source present at any point in the TF domain. In this paper, we relax this assumption by allowing the sources to be TF-nondisjoint to a certain extent. In particular, the number of sources present at a point is strictly less than the number of sensors. The separation can still be achieved thanks to subspace projection, which allows us to identify the sources present and to estimate their corresponding TFD values. In particular, we propose two subspace-based algorithms for TF-nondisjoint sources: one uses quadratic TFDs and the other a linear TFD. Another contribution of this paper is a new estimation procedure for the mixing matrix. Finally, numerical results for the proposed methods are provided, highlighting their performance gain compared to existing ones.
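The subspace-projection idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the mixing matrix `A` is already known (the paper estimates it), assumes the number of active sources `k` at the TF point is given, and uses a plain exhaustive search over column subsets. The function name `identify_active_sources` is hypothetical.

```python
from itertools import combinations

import numpy as np


def identify_active_sources(A, x, k):
    """Pick the k columns of A whose span best explains the observation x.

    A : (m, n) mixing matrix, m sensors and n sources (assumed known here).
    x : (m,) mixture value observed at a single time-frequency point.
    k : number of sources assumed active at that point, with k < m.

    Returns the indices of the selected columns and the least-squares
    estimates of the corresponding source values at that point.
    """
    best_idx, best_res, best_s = None, np.inf, None
    for idx in combinations(range(A.shape[1]), k):
        A_sub = A[:, idx]
        # Least-squares fit of x onto the subspace spanned by these columns.
        s, *_ = np.linalg.lstsq(A_sub, x, rcond=None)
        # Norm of the component of x orthogonal to that subspace.
        res = np.linalg.norm(x - A_sub @ s)
        if res < best_res:
            best_idx, best_res, best_s = idx, res, s
    return best_idx, best_s
```

With 3 sensors and 4 sources, a point where only two sources overlap can be resolved because the observation lies (up to noise) in a two-dimensional subspace spanned by exactly those two mixing vectors. The exhaustive search grows combinatorially with the number of sources, so it is only meant to show the principle.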

    The influence of random element displacement on DOA estimates obtained with (Khatri-Rao-)root-MUSIC

    Although a wide range of direction of arrival (DOA) estimation algorithms has been described for a diverse range of array configurations, no specific stochastic analysis framework has been established to assess the probability density function of the error on DOA estimates due to random errors in the array geometry. Therefore, we propose a stochastic collocation method that relies on a generalized polynomial chaos expansion to connect the statistical distribution of random position errors to the resulting distribution of the DOA estimates. We apply this technique to the conventional root-MUSIC and the Khatri-Rao-root-MUSIC methods. Compared with Monte Carlo simulations, this novel approach yields a speedup in CPU time by a factor of more than 100 for a one-dimensional case and by a factor of 56 for a two-dimensional case.

    Source Separation for Hearing Aid Applications


    Sparse and low-rank methods in structural system identification and monitoring

    This paper presents sparse and low-rank methods for explicitly modeling and harnessing the data structure to address the inverse problems in structural dynamics, identification, and data-driven health monitoring. In particular, it is shown that the structural dynamic features and damage information, intrinsic within the structural vibration response measurement data, possess sparse and low-rank structure, which can be effectively modeled and processed by emerging mathematical tools such as sparse representation (SR) and low-rank matrix decomposition. It is also discussed that explicitly modeling and harnessing the sparse and low-rank data structure could benefit future work in developing data-driven approaches towards rapid, unsupervised, and effective system identification, damage detection, as well as massive SHM data sensing and management.

    Sub-Nyquist Sampling: Bridging Theory and Practice

    Sampling theory encompasses all aspects related to the conversion of continuous-time signals to discrete streams of numbers. The famous Shannon-Nyquist theorem has become a landmark in the development of digital signal processing. In modern applications, an increasing number of functions is being pushed forward to sophisticated software algorithms, leaving only those delicate finely-tuned tasks for the circuit level. In this paper, we review sampling strategies which target reduction of the ADC rate below Nyquist. Our survey covers classic works from the early 1950s through recent publications from the past several years. The prime focus is bridging theory and practice, that is, to pinpoint the potential of sub-Nyquist strategies to emerge from the math to the hardware. In that spirit, we integrate contemporary theoretical viewpoints, which study signal modeling in a union of subspaces, together with a taste of practical aspects, namely how the avant-garde modalities boil down to concrete signal processing systems. Our hope is that this presentation style will attract the interest of both researchers and engineers, promoting the sub-Nyquist premise into practical applications and encouraging further research into this exciting new frontier. Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazine

    Computational Methods for Underdetermined Convolutive Speech Localization and Separation via Model-based Sparse Component Analysis

    In this paper, the problem of speech source localization and separation from recordings of convolutive underdetermined mixtures is studied. The problem is cast as recovering the spatio-spectral speech information embedded in a microphone array's compressed measurements of the acoustic field. A model-based sparse component analysis framework is formulated for sparse reconstruction of the speech spectra in a reverberant acoustic environment, resulting in joint localization and separation of the individual sources. We compare and contrast the computational approaches to model-based sparse recovery, exploiting spatial sparsity as well as the spectral structures underlying the spectrographic representation of speech signals. In this context, we explore identification of the sparsity structures in the auditory and acoustic representation spaces. The auditory structures are formulated upon the principles of structural grouping based on proximity, autoregressive correlation and harmonicity of the spectral coefficients, and they are incorporated for sparse reconstruction. The acoustic structures are formulated upon the image model of multipath propagation, and they are exploited to characterize the compressive measurement matrix associated with microphone array recordings. Three approaches to sparse recovery, relying on combinatorial optimization, convex relaxation and Bayesian methods, are studied and evaluated based on thorough experiments. The sparse Bayesian learning method is shown to yield better perceptual quality, while interference suppression is also achieved using the combinatorial approach, with the advantage of offering the most efficient computational cost. Furthermore, it is demonstrated that an average autoregressive model can be learned for speech localization, and that exploiting the proximity structure in the form of block sparse coefficients enables accurate localization. Throughout the extensive empirical evaluation, we confirm that a large and random placement of the microphones enables significant improvement in source localization and separation performance.