143 research outputs found

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources; it relies on localizing the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages

    Sparse parametric modeling of the early part of acoustic impulse responses

    Acoustic channels are typically described by their Acoustic Impulse Response (AIR), modeled as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, which describe the discrete reflections and the diffuse reverberation tail respectively. We propose an approach for constructing a sparse parametric model of the early part. The model aims to reduce the number of parameters needed to represent the early part, from which the MA coefficients that describe it can then be reconstructed. It represents the reflections arriving at the receiver as delayed copies of an excitation signal. The times of arrival of the reflections are not restricted to integer sample instants, and a dynamically estimated model of the excitation sound is used. We also present a corresponding parameter estimation method based on regularized regression and nonlinear optimization. The proposed method also serves as an analysis tool, since the estimated parameters can be used to estimate the room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the energy of the AIR coefficient reconstruction error does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reductions exceeding 90% when compared to an MA process representation.
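As a rough illustration of the idea in this abstract, the sketch below rebuilds MA (FIR) coefficients from a short list of reflection parameters. It is a minimal sketch under stated assumptions, not the authors' model: the excitation is taken to be an ideal band-limited impulse (a sinc) rather than a dynamically estimated one, and the function name `early_air_from_reflections`, the arrival times and the gains are all hypothetical.

```python
import numpy as np

def early_air_from_reflections(toas, gains, fs, length):
    """Rebuild FIR (MA) coefficients from per-reflection (time, gain) pairs.

    Assumption: each reflection is a scaled, delayed band-limited impulse.
    """
    n = np.arange(length)
    h = np.zeros(length)
    for toa, gain in zip(toas, gains):
        # np.sinc(x) = sin(pi x) / (pi x); the delay toa * fs is measured in
        # samples and does not have to be an integer.
        h += gain * np.sinc(n - toa * fs)
    return h

fs = 16000                               # sampling rate in Hz (hypothetical)
toas = [0.0050, 0.00731, 0.0102]         # hypothetical arrival times in seconds
gains = [1.0, -0.6, 0.35]                # hypothetical reflection gains
h = early_air_from_reflections(toas, gains, fs, length=256)

# Two numbers per reflection describe what would otherwise take 256 taps.
n_params = 2 * len(toas)
```

The second reflection arrives at 116.96 samples, so its energy is spread over neighbouring taps; a parametric description stores only its time and gain.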

    Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments

    This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people in a reverberant environment equipped with microphone arrays. It studies several aspects of sound localization, including modeling, algorithms, and the prior calibration that allows localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori. Existing techniques required a large number of microphones to achieve high localization accuracy. In this thesis, however, a new method has been developed that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones matters because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system. In addition, an exhaustive study has been carried out of the phenomena that affect the signal acquisition and processing system, with the aim of improving the previously proposed model. This study deepens the understanding and modeling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially suitable for localization. As a result of that study, and in collaboration with researchers from the IDIAP institute (Switzerland), a system has been developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to previous coherence-based methods. However, it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between microphones). As a result, better accuracy is achieved with less computation time.
Knowledge of the effects of the PHAT filter has made it possible to create a new model that allows a sparse representation of the typical localization scenario. This type of representation has proven very convenient for localization, allowing a simple approach to the case of multiple simultaneous sources. The final contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. Such matrices are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes methods for denoising TDOA measurements based on a low-rank matrix representation, which is useful not only in localization but also in techniques such as beamforming and self-calibration.
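The low-rank structure of TDOA matrices mentioned in this abstract can be illustrated with a small example. A noise-free TDOA matrix has entries M[i, j] = t[i] - t[j], so it is skew-symmetric and has rank at most 2. The sketch below uses a closed-form least-squares fit of that structure; this is an assumed, simplified stand-in and not necessarily the denoising algorithm developed in the thesis. The function name `denoise_tdoa_matrix` and all numerical values are hypothetical.

```python
import numpy as np

def denoise_tdoa_matrix(M_noisy):
    """Least-squares fit of the model M[i, j] = t[i] - t[j] to a noisy matrix."""
    n = M_noisy.shape[0]
    A = 0.5 * (M_noisy - M_noisy.T)      # keep only the skew-symmetric part
    t = A.sum(axis=1) / n                # zero-mean arrival times (LS solution)
    return t[:, None] - t[None, :]       # rank at most 2 by construction

rng = np.random.default_rng(0)
t_true = rng.uniform(0.0, 5e-3, size=8)          # hypothetical arrival times (s)
M_true = t_true[:, None] - t_true[None, :]
M_noisy = M_true + 2e-4 * rng.standard_normal((8, 8))
M_hat = denoise_tdoa_matrix(M_noisy)

err_noisy = np.linalg.norm(M_noisy - M_true)     # error before denoising
err_hat = np.linalg.norm(M_hat - M_true)         # error after denoising
```

Because the fit is an orthogonal projection onto the set of valid TDOA matrices, `err_hat` cannot exceed `err_noisy` when the true matrix follows the model.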

    Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

    Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach. Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
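The learned non-negative dictionaries mentioned in this abstract are typically obtained with non-negative matrix factorization. The sketch below shows only that generic building block, using the standard multiplicative updates for the Frobenius-norm objective; it is not the paper's localization algorithm. The matrix `V` is a synthetic stand-in for a magnitude spectrogram, and the function name `nmf` and its parameters are assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative matrix V as W @ H with non-negative W and H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps     # dictionary atoms (columns)
    H = rng.random((rank, V.shape[1])) + eps     # activations over time
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic "spectrogram": 64 frequency bins, 40 frames, exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((64, 3)) @ rng.random((3, 40))
W, H, errors = nmf(V, rank=3)
```

In a localization setting, the columns of `W` would play the role of a source dictionary learned from training speech, but how the dictionary is combined with the directional signatures is specific to the paper and not reproduced here.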

    Computation of spherical harmonic representations of source directivity based on the finite-distance signature

    The measurement of directivity for sound sources that are not electroacoustic transducers is fundamentally limited because the source cannot be driven with arbitrary signals. A consequence is that directivity can only be measured at a sparse set of frequencies, for example at the stable partial oscillations of a steady tone played by a musical instrument or produced by the human voice. This limitation prevents the data from being used in certain applications, such as time-domain room acoustic simulations, where the directivity needs to be available at all frequencies in the frequency range of interest. We demonstrate in this article that imposing the signature of the directivity obtained at a given distance on a spherical wave allows for all the interpolation that is required to obtain a complete spherical harmonic representation of the source’s directivity, i.e., a representation that is viable at any frequency, in any direction, and at any distance. Our approach is inspired by the far-field signature of exterior sound fields. It is not capable of incorporating the phase of the directivity directly. We argue, based on directivity measurement data of musical instruments, that the phase of such measurement data is too unreliable or too ambiguous to be useful. We incorporate numerically derived directivity into the example application of finite difference time domain simulation of the acoustic field, which has not been possible previously.

    Identification du bruit d'entrée et de sortie sur des moteurs d'avion par antennes microphoniques

    Abstract: This thesis considers the discrimination of inlet and exhaust noise of aero-engines in free-field static tests using far-field microphone arrays. Various techniques are compared for this problem, including classical beamforming (CB), the regularized inverse method (Tikhonov regularization), L1-generalized inverse beamforming (L1-GIB), Clean-PSF, Clean-SC and two novel methods called the hybrid method and Clean-hybrid. Classical beamforming is disadvantaged by its need for a large number of measurement microphones. Similarly, the inverse method is disadvantaged by its need for a priori source information. Classical Tikhonov regularization improves solution stability but remains disadvantaged by its requirement to impose a stronger penalty on undetected source positions. Coherent and incoherent sources can be resolved by L1-GIB. This algorithm can distinguish multipole sources as well as monopole sources. However, source identification by L1-GIB is time-consuming and requires a computer with a large memory capacity. The hybrid method is a new regularization method that uses an a priori beamforming measurement to define a data-dependent discrete smoothing norm for the regularization of the inverse problem. Compared to classical beamforming and inverse modeling, the hybrid (beamforming regularization) approach provides improved source strength maps without substantial added complexity. Although the hybrid method largely overcomes the disadvantages of the former methods, its performance in identifying weaker sources in the presence of strong sources is not satisfactory. This can be explained by the large penalization applied to the weaker source in the hybrid method, which results in an underestimation of the strength of that source.
    To overcome this defect, the Clean-SC method and the proposed Clean-hybrid method, which combines the hybrid method and Clean-SC, are applied. These methods remove the effect of the strong sources in source power maps in order to identify the weaker sources. The proposed methods, which represent the main contribution of this thesis, show promising results and open new research avenues. A theoretical study of all approaches is performed for various sources and array configurations. To validate the theoretical study, several laboratory experiments are conducted at Université de Sherbrooke. The proposed methods have further been applied to measured noise data from a Pratt & Whitney Canada turbofan engine and have been observed to provide better spatial resolution and solution robustness with a limited number of measurement microphones compared to the existing methods.
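Two of the baseline techniques named in this abstract, classical beamforming and the Tikhonov-regularized inverse method, can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: free-field monopole sources, a single frequency, noise-free data, and a made-up geometry. It does not implement the hybrid, Clean-SC or Clean-hybrid methods, and the function names are hypothetical.

```python
import numpy as np

def steering_matrix(mics, grid, freq, c=343.0):
    """Free-field monopole transfer functions, shape (n_mics, n_grid)."""
    k = 2.0 * np.pi * freq / c
    r = np.linalg.norm(mics[:, None, :] - grid[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def classical_beamforming(G, p):
    """Output power |w^H p|^2 with unit-norm steering weights w = g / ||g||."""
    W = G / np.linalg.norm(G, axis=0, keepdims=True)
    return np.abs(W.conj().T @ p) ** 2

def tikhonov_inverse(G, p, lam):
    """Source strengths minimising ||G q - p||^2 + lam * ||q||^2."""
    n = G.shape[1]
    return np.linalg.solve(G.conj().T @ G + lam * np.eye(n), G.conj().T @ p)

# Hypothetical geometry: 16-microphone line array, 11 candidate positions 5 m away.
mics = np.stack([np.linspace(-1.0, 1.0, 16), np.zeros(16)], axis=1)
grid = np.stack([np.linspace(-2.5, 2.5, 11), np.full(11, 5.0)], axis=1)
G = steering_matrix(mics, grid, freq=1000.0)

q_true = np.zeros(11, dtype=complex)
q_true[7] = 1.0                        # single monopole at grid index 7
p = G @ q_true                         # noise-free microphone pressures

bf_map = classical_beamforming(G, p)   # peaks at the true source position
q_small = tikhonov_inverse(G, p, lam=1e-6)
q_large = tikhonov_inverse(G, p, lam=1e-2)
```

Increasing `lam` shrinks the norm of the estimated source vector and increases the data residual, which is the trade-off between stability and underestimated source strength that the abstract describes.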

    Structured Sparsity Models for Reverberant Speech Separation

    We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources; it relies on localizing the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.