7,160 research outputs found

    Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays

    Get PDF
    In the analysis of acoustic scenes, often the occurring sounds have to be detected in time, recognized, and localized in space. Usually, each of these tasks is done separately. In this paper, a model-based approach to jointly carry them out for the case of multiple simultaneous sources is presented and tested. The recognized event classes and their respective room positions are obtained with a single system that maximizes the combination of a large set of scores, each one resulting from a different acoustic event model and a different beamformer output signal, which comes from one of several arbitrarily-located small microphone arrays. By using a two-step method, the experimental work for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech, is reported. Tests carried out with two datasets show the advantage of the proposed approach with respect to some usual techniques, and that the inclusion of estimated priors brings a further performance improvement.Comment: Computational acoustic scene analysis, microphone array signal processing, acoustic event detectio

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    Get PDF
    We tackle the multi-party speech recovery problem through modeling the acoustic of the reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustic from the unknown competing speech sources relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting joint sparsity model formulated upon spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.Comment: 31 page

    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    Full text link
    This paper presents a novel approach for indoor acoustic source localization using microphone arrays and based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three dimensional position of an acoustic source, using the raw audio signal as the input information avoiding the use of hand crafted audio features. Given the limited amount of available localization data, we propose in this paper a training strategy based on two steps. We first train our network using semi-synthetic data, generated from close talk speech recordings, and where we simulate the time delays and distortion suffered in the signal that propagates from the source to the array of microphones. We then fine tune this network using a small amount of real data. Our experimental results show that this strategy is able to produce networks that significantly improve existing localization methods based on \textit{SRP-PHAT} strategies. In addition, our experiments show that our CNN method exhibits better resistance against varying gender of the speaker and different window sizes compared with the other methods.Comment: 18 pages, 3 figures, 8 table
    • …
    corecore