
    Data-driven Threshold Selection for Direct Path Dominance Test

    Direction-of-arrival (DOA) estimation methods, when used with recordings made in enclosures, are negatively affected by the reflections and reverberation in the enclosure. The direct path dominance (DPD) test was proposed as a pre-processing stage that can provide better DOA estimates by selecting only the time-frequency bins with a single dominant sound source component prior to DOA estimation, thereby reducing the total computational cost. The DPD test involves selecting bins for which the ratio of the two largest singular values of the local spatial correlation matrix is above a threshold. This threshold is typically selected in an ad hoc manner, which hinders the generalisation of the approach and can also increase the total computational cost or reduce the accuracy of DOA estimation. We propose a DPD test threshold selection method based on a data-driven statistical model. The model approximates the distribution of the singular value ratios of the spatial correlation matrices as a generalised Pareto distribution and allows time-frequency bins to be selected based on their probability of occurrence. We demonstrate the application of this threshold selection method via emulations using acoustic impulse responses measured in a highly reverberant room with a rigid spherical microphone array.
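The bin-selection step described above can be sketched in a few lines: compute the singular-value ratio of each local spatial correlation matrix, fit a generalised Pareto distribution to the ratios, and keep the bins in the upper tail of the fitted model. This is a minimal illustrative sketch, not the authors' implementation; the synthetic multichannel data, the 95% tail probability, and the simple min-shift before fitting are all assumptions.

```python
# Illustrative sketch of a DPD test with a GPD-based threshold.
# Data, array geometry, and tail probability are assumptions.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

def dpd_ratios(stft_bins):
    """Ratio of the two largest singular values of the local spatial
    correlation matrix, one ratio per time-frequency bin."""
    ratios = []
    for X in stft_bins:                      # X: (channels, local frames)
        R = X @ X.conj().T / X.shape[1]      # local spatial correlation matrix
        s = np.linalg.svd(R, compute_uv=False)
        ratios.append(s[0] / s[1])
    return np.asarray(ratios)

# Synthetic stand-in for real recordings: 4-channel array, 500 TF bins,
# each bin averaged over 8 local frames.
bins_ = [rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
         for _ in range(500)]
r = dpd_ratios(bins_)

# Fit a generalised Pareto distribution to the (shifted) ratios and keep
# only the bins whose ratio lies in the top 5% of the fitted model.
c, loc, scale = genpareto.fit(r - r.min(), floc=0.0)
threshold = r.min() + genpareto.ppf(0.95, c, loc=loc, scale=scale)
selected = np.where(r > threshold)[0]
print(f"{len(selected)} of {len(r)} bins pass the DPD test")
```

Only the selected bins would then be passed to the DOA estimator, which is where the computational saving comes from.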

    Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function

    This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular we are interested in locating speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to the direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions (ATFs) of the two microphones, and it is an important feature for SSL. We propose a method to estimate the DP-RTF from noisy and reverberant signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function (CTF) approximation is adopted to accurately represent the impulse response of the microphone array, and the first coefficient of the CTF is mainly composed of the direct-path ATF. At each frequency, the frame-wise speech auto- and cross-power spectral densities (PSDs) are obtained by spectral subtraction. Then a set of linear equations is constructed from the speech auto- and cross-PSDs of multiple frames, in which the DP-RTF is an unknown variable, and it is estimated by solving the equations. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for SSL. Experiments with a robot, placed in various reverberant environments, show that the proposed method outperforms two state-of-the-art methods.
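The core estimation step, solving a set of per-frame linear equations in which the relative transfer function is the unknown, can be sketched at a single frequency as follows. This is a simplified assumption-laden sketch: the signals are synthetic, the noise is injected directly rather than removed by spectral subtraction, and a plain relative transfer function stands in for the full DP-RTF/CTF machinery of the paper.

```python
# Sketch of the least-squares step: estimate a relative transfer function
# at one frequency bin from frame-wise auto- and cross-PSDs.
# All signals here are synthetic stand-ins (assumptions, not the paper's data).
import numpy as np

rng = np.random.default_rng(1)

true_rtf = 0.8 * np.exp(1j * 0.6)        # ground-truth ratio between the two mics

n_frames = 64
s1 = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)  # mic-1 STFT frames
s2 = true_rtf * s1 + 0.01 * (rng.standard_normal(n_frames)
                             + 1j * rng.standard_normal(n_frames))       # mic-2 frames

auto_psd = s1 * s1.conj()                # frame-wise auto-PSD of mic 1
cross_psd = s2 * s1.conj()               # frame-wise cross-PSD between mics

# One linear equation per frame, cross_psd ≈ rtf * auto_psd, solved for the
# single unknown rtf in the least-squares sense.
rtf_hat, *_ = np.linalg.lstsq(auto_psd[:, None], cross_psd, rcond=None)
print("estimation error:", abs(rtf_hat[0] - true_rtf))
```

In the paper the same idea is applied per frequency after spectral subtraction, and the estimates are concatenated across frequencies into the SSL feature vector.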

    Estimating uncertainty models for speech source localization in real-world environments

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 131-140). This thesis develops improved solutions to the problems of audio source localization and speech source separation in real reverberant environments. For source localization, it develops a new time- and frequency-dependent weighting function for the generalized cross-correlation framework for time delay estimation. This weighting function is derived from the speech spectrogram as the result of a transformation designed to optimally predict localization cue accuracy. By structuring the problem in this way, we take advantage of the nonstationarity of speech in a way that is similar to the psychoacoustics of the precedence effect. For source separation, we use the same weighting function as part of a simple probabilistic generative model of localization cues. We combine this localization cue model with a mixture model of speech log-spectra and use the combined model to perform speech source separation. For both source localization and source separation, we show significant performance improvements over existing techniques on both real and simulated data in a range of acoustic environments. By Kevin William Wilson, Ph.D.
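The generalized cross-correlation framework the thesis builds on weights the cross-power spectrum before inverting it to find the time delay. The sketch below uses the standard PHAT weighting as the frequency weighting, not the thesis's learned spectrogram-based weighting function, and the delayed synthetic signal is an assumption for illustration.

```python
# Weighted generalized cross-correlation for time delay estimation.
# PHAT weighting shown here is a common baseline choice, not the
# thesis's optimized weighting function.
import numpy as np

rng = np.random.default_rng(2)
delay = 5                                          # true delay in samples
x = rng.standard_normal(4096)
y = np.roll(x, delay) + 0.05 * rng.standard_normal(4096)  # delayed, noisy copy

X = np.fft.rfft(x)
Y = np.fft.rfft(y)
cross = Y * X.conj()                               # cross-power spectrum
weight = 1.0 / np.maximum(np.abs(cross), 1e-12)    # PHAT: keep phase only
cc = np.fft.irfft(cross * weight, n=len(x))        # weighted GCC function

lag = int(np.argmax(cc))
if lag > len(x) // 2:                              # undo circular wrap-around
    lag -= len(x)
print("estimated delay:", lag)                     # → 5
```

The thesis's contribution is, in effect, replacing the fixed `weight` above with a time- and frequency-dependent function predicted from the speech spectrogram, so that unreliable time-frequency regions contribute less to the correlation peak.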

    From sensory perception to spatial cognition

    To interact with the environment, it is crucial to have a clear representation of space. Several findings have shown that the space around our body is split into several portions, which are differentially coded by the brain. Evidence of this subdivision has been reported by studies on people affected by neglect, on space near (peripersonal) and far (extrapersonal) from the body, and on the space around specific portions of the body. Moreover, recent studies have shown that sensory modalities are at the base of important cognitive skills. However, it is still unclear whether each sensory modality plays a different role in the development of cognitive skills in the several portions of space around the body. Recent work has shown that the visual modality is crucial for the development of spatial representation. This idea is supported by studies on blind individuals showing that visual information is fundamental for the development of auditory spatial representation. For example, blind individuals are not able to perform the spatial bisection task, a task that requires building an auditory spatial metric, a skill that sighted children acquire around 6 years of age. Based on this prior research, we hypothesize that if different sensory modalities have a role in the development of different cognitive skills, then we should be able to find a clear correlation between the availability of a sensory modality and the associated cognitive skill. In particular, we hypothesize that visual information is crucial for the development of auditory space representation; if this is true, we should find different spatial skills between the front and back spaces. In this thesis, I provide evidence that the spaces around our body are differently influenced by sensory modalities. Our results suggest that visual input has a pivotal role in the development of auditory spatial representation and that this applies only to the frontal space. Indeed, sighted people are less accurate in spatial tasks only in the space where vision is not present (i.e. the back), while blind people show no differences between the front and back spaces. On the other hand, people tend to report sounds in the back space, suggesting that the role of hearing in alertness could be more important in the back than in the frontal space. Finally, we show that natural training, stressing the integration of audio-motor stimuli, can restore spatial cognition, opening new possibilities for rehabilitation programs. Spatial cognition is a well-studied topic; however, we think our findings fill the gap regarding how the differing availability of sensory information across spaces causes the development of different cognitive skills in those spaces. This work is the starting point for understanding the strategies that the brain adopts to maximize its resources by processing, in the most efficient way, as much information as possible.