
    Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation

    To date, a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency over natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains the formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. First, it is demonstrated that a linear efficient coding transform, Independent Component Analysis (ICA), trained on spectrograms of naturalistic simulated binaural sounds extracts the spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. A representation of auditory space is therefore learned in a purely unsupervised way, by maximizing coding efficiency and without any task-specific constraints. These results imply that efficient coding is a useful strategy for learning structures which allow for making behaviorally vital inferences about the environment.
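    The pipeline the abstract describes can be gestured at in a few lines. Below is a minimal sketch, not the paper's code: scikit-learn's FastICA is trained on synthetic stand-ins for binaural spectrogram patches (a toy interaural level cue injected into random spectrograms), and a linear read-out over a small subset of the learned features stands in for the hierarchical position-decoding stage. All sizes, cues, and parameters are illustrative assumptions.

# Illustrative sketch, not the paper's code: ICA over synthetic binaural
# spectrogram patches, plus a linear position read-out from a small
# feature subset. Data, sizes, and cues are all toy assumptions.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake binaural patches: each sample stacks a left- and a right-ear
# spectrogram slice; a crude level difference encodes source azimuth.
n_samples, n_freq, n_frames = 2000, 32, 8
azimuth = rng.integers(0, 12, size=n_samples)        # 12 direction classes
base = rng.standard_normal((n_samples, n_freq, n_frames))
ild = 0.1 * (azimuth[:, None, None] - 5.5)           # toy interaural level cue
X = np.concatenate([base + ild, base - ild], axis=1).reshape(n_samples, -1)

# Linear efficient-coding transform: ICA on the binaural patches.
ica = FastICA(n_components=64, random_state=0, max_iter=1000)
S = ica.fit_transform(X)                             # per-sample activations

# The hierarchical extension reduced to its simplest form: decode sound
# position from a small subpopulation of the learned features.
subset = np.argsort(S.var(axis=0))[-16:]             # most variable features
clf = LogisticRegression(max_iter=1000).fit(S[:, subset], azimuth)
print("position decoding accuracy:", clf.score(S[:, subset], azimuth))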

    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.
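    A rough sense of the training/localization split can be given with a simplified stand-in: instead of the paper's locally-linear Gaussian regression, the sketch below partitions the binaural feature space with k-means and fits one affine regressor per region, mapping features to direction. The feature model and every parameter are synthetic assumptions, not the released corpus.

# Simplified stand-in, not the paper's method: k-means partitions the
# binaural feature space and one affine regressor per region maps
# features to direction. Features and directions are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Toy binaural features (e.g. per-band level differences) generated from
# 2-D source directions (azimuth, elevation) via a smooth nonlinearity.
n, d_feat = 3000, 16
direction = rng.uniform(-60.0, 60.0, size=(n, 2))
W = rng.standard_normal((2, d_feat))
features = np.tanh(direction @ W / 60.0) + 0.05 * rng.standard_normal((n, d_feat))

# Training stage: locally-linear model = one linear map per feature region.
K = 8
km = KMeans(n_clusters=K, n_init=10, random_state=1).fit(features)
regs = [LinearRegression().fit(features[km.labels_ == k],
                               direction[km.labels_ == k]) for k in range(K)]

def localize(f):
    """Route a feature vector to its region's regressor."""
    k = km.predict(f.reshape(1, -1))[0]
    return regs[k].predict(f.reshape(1, -1))[0]

err = np.mean([np.abs(localize(features[i]) - direction[i]).mean()
               for i in range(200)])
print(f"mean absolute direction error (deg): {err:.2f}")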

    Phase locking below rate threshold in noisy model neurons

    The property of a neuron to phase-lock to an oscillatory stimulus before adapting its spike rate to the stimulus frequency plays an important role for the auditory system. We investigate under which conditions neurons exhibit this phase locking below rate threshold. To this end, we simulate neurons employing the widely used leaky integrate-and-fire (LIF) model. By tuning parameters, we can arrange either an irregular spontaneous or a tonic spiking mode. When the neuron is stimulated in both modes, a significant rise of vector strength prior to a noticeable change of the spike rate can be observed. Combining analytic reasoning with numerical simulations, we trace this observation back to a modulation of interspike intervals, which itself requires spikes to be only loosely coupled. We test the limits of this conception by simulating an LIF model with threshold fatigue, which generates pronounced anticorrelations between subsequent interspike intervals. In addition, we evaluate the LIF response for harmonic stimuli of various frequencies and discuss the extension to more complex stimuli. It seems that phase locking below rate threshold occurs generically for all zero-mean stimuli. Finally, we discuss our findings in the context of stimulus detection.
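    As a rough illustration of the effect, the sketch below simulates a noisy LIF neuron in the irregular spontaneous mode (mean drive below threshold) under a weak sinusoidal stimulus, and measures vector strength alongside the firing rate. All parameters are illustrative choices, not those of the paper.

# Sketch under simplified assumptions: a noisy LIF neuron with
# sub-threshold mean drive is stimulated by a weak sinusoid; vector
# strength rises with stimulus amplitude while the rate barely moves.
import numpy as np

rng = np.random.default_rng(2)

def lif_spikes(amp, freq, T=50.0, dt=1e-4, tau=0.01,
               mu=0.9, sigma=1.0, v_th=1.0, v_reset=0.0):
    """Euler-Maruyama simulation; returns spike times in seconds."""
    n = int(T / dt)
    t = np.arange(n) * dt
    drive = mu + amp * np.sin(2.0 * np.pi * freq * t)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    v, spikes = 0.0, []
    for i in range(n):
        v += dt * (drive[i] - v) / tau + noise[i]
        if v >= v_th:
            spikes.append(t[i])
            v = v_reset
    return np.asarray(spikes)

def vector_strength(spike_times, freq):
    """VS = |mean over spikes of exp(i * 2*pi*freq * t_spike)|."""
    return np.abs(np.mean(np.exp(2j * np.pi * freq * spike_times)))

for amp in (0.0, 0.05, 0.1):
    s = lif_spikes(amp, freq=10.0)
    print(f"amp={amp:.2f}  rate={len(s) / 50.0:5.1f} Hz  "
          f"VS={vector_strength(s, 10.0):.3f}")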

    Functional Sensory Representations of Natural Stimuli: the Case of Spatial Hearing

    In this thesis I attempt to explain mechanisms of neuronal coding in the auditory system as a form of adaptation to the statistics of natural stereo sounds. To this end I analyse recordings of real-world auditory environments and construct novel statistical models of these data. I further compare regularities present in natural stimuli with known, experimentally observed neuronal mechanisms of spatial hearing. In a more general perspective, I use the binaural auditory system as a starting point to consider the notion of the function implemented by sensory neurons. In particular I argue for two closely related tenets: 1. The function of sensory neurons cannot be fully elucidated without understanding the statistics of the natural stimuli they process. 2. The function of sensory representations is determined by redundancies present in the natural sensory environment. I present evidence in support of the first tenet by describing and analysing marginal statistics of natural binaural sounds. Comparing the observed, empirical distributions with knowledge from reductionist experiments shows that the complexity of the spatial hearing task in the natural environment is much higher than analytic, physics-based predictions suggest. I discuss the possibility that early brain stem circuits such as the LSO and MSO do not "compute sound localization" as is often claimed in the experimental literature. I propose that instead they perform a signal transformation, which constitutes the first step of a complex inference process. To support the second tenet I develop a hierarchical statistical model, which learns a joint sparse representation of amplitude and phase information from natural stereo sounds. I demonstrate that the learned higher-order features reproduce properties of auditory cortical neurons when probed with spatial sounds. The reproduced aspects had been hypothesized to be a manifestation of a fine-tuned computation specific to the sound-localization task; here it is demonstrated that they rather reflect redundancies present in the natural stimulus. Taken together, the results presented in this thesis suggest that efficient coding is a strategy useful for discovering structures (redundancies) in the input data. Their meaning has to be determined by the organism via environmental feedback.
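    The idea of a joint sparse code over amplitude and phase cues can be gestured at with standard tools. The sketch below is a heavily simplified stand-in for the thesis model: toy level-difference and phase-difference features per time-frequency patch are stacked and fed to scikit-learn's mini-batch dictionary learning. The data and every parameter are illustrative assumptions.

# Heavily simplified stand-in for the thesis model: stack toy level- and
# phase-difference features per patch and learn a sparse dictionary.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(3)

# Toy binaural cues per time-frequency patch: interaural level
# differences (dB) and wrapped interaural phase differences (rad).
n, n_bands = 5000, 24
ild = rng.standard_normal((n, n_bands))
ipd = np.angle(np.exp(1j * rng.standard_normal((n, n_bands))))
X = np.hstack([ild, ipd])                      # joint amplitude/phase input

dico = MiniBatchDictionaryLearning(n_components=48, alpha=0.8,
                                   batch_size=256, random_state=3)
codes = dico.fit_transform(X)                  # sparse activations
print("fraction of zero coefficients:", round(float(np.mean(codes == 0.0)), 3))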

    Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses

    Spatial impulse response analysis techniques are commonly used in the field of acoustics, as they help to characterise the interaction of sound with an enclosed environment. This paper presents a novel approach to the spatial analysis of binaural impulse responses, using a neural network fronted by a binaural model. The proposed method uses binaural cues utilised by the human auditory system, which are mapped by the neural network to azimuth direction-of-arrival classes. A cascade-correlation neural network was trained using a multi-conditional training dataset of head-related impulse responses with added noise. The neural network is tested using a set of binaural impulse responses captured using two dummy head microphones in an anechoic chamber, with a reflective boundary positioned to produce a reflection with a known direction of arrival. Results showed that the neural network generalised to the direct sound of the binaural room impulse responses for both dummy head microphones. However, it was found to be less accurate at predicting the direction of arrival of the reflections. The work indicates the potential of using such an algorithm for the spatial analysis of binaural impulse responses, while also indicating where the method needs to be made more robust for more general application.
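    The overall mapping, binaural cues to azimuth class under multi-conditional noise, can be sketched with off-the-shelf pieces. scikit-learn has no cascade-correlation network, so a plain MLP stands in for it below, and the cue model (a Woodworth-like ITD plus a frequency-weighted ILD) is an illustrative assumption.

# Sketch of the pipeline shape only: an MLP stands in for the paper's
# cascade-correlation network, and the binaural cue model is assumed.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)

# Toy cues per band for 18 frontal azimuth classes (10-degree steps),
# with per-sample noise levels drawn over a range (multi-conditional).
n_per_class, n_bands, n_classes = 100, 16, 18
az = np.repeat(np.arange(n_classes), n_per_class)
theta = np.deg2rad(az * 10.0 - 90.0)                 # -90 .. +80 degrees
itd = 0.66 * np.sin(theta)[:, None] * np.ones((1, n_bands))
ild = 6.0 * np.sin(theta)[:, None] * np.linspace(0.2, 1.0, n_bands)
X = np.hstack([itd, ild])
X += rng.standard_normal(X.shape) * rng.uniform(0.05, 0.4, (X.shape[0], 1))

# Map binaural cues to azimuth direction-of-arrival classes.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=4)
net.fit(X, az)
print("training accuracy:", round(net.score(X, az), 3))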

    Video-aided model-based source separation in real reverberant rooms

    Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, the interaural phase difference and the interaural level difference, as well as the mixing vectors, are probabilistically modeled. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that by utilizing the visual modality the proposed algorithm can produce better time-frequency masks, thereby giving improved source estimates. We provide experimental results to test the proposed algorithm in different scenarios, provide comparisons with other audio-only and audio-visual algorithms, and achieve improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm in order to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output of the algorithm. This advantage makes our algorithm a suitable candidate for use in under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
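    The core EM masking step can be reduced to a small example. The sketch below fits a two-source mixture over interaural phase differences at time-frequency points and turns the posteriors into soft masks; it omits the video localization, the mixing-vector model, and dereverberation, and all data are synthetic.

# Minimal sketch of the masking core only: EM fits a two-source mixture
# over interaural phase differences (IPDs) at time-frequency points,
# and the posteriors serve as soft separation masks.
import numpy as np

rng = np.random.default_rng(5)

# Toy IPD observations at T-F points from two sources with distinct
# direction-dependent phase offsets plus sensor noise.
n = 4000
truth = rng.integers(0, 2, n)
mu_true = np.array([-0.8, 1.1])                 # per-source mean IPD (rad)
ipd = mu_true[truth] + 0.3 * rng.standard_normal(n)

# EM for a two-component Gaussian mixture (means, shared variance, priors).
mu, var, pi = np.array([-1.0, 1.0]), 0.5, np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior p(source k | IPD) at every T-F point.
    ll = -0.5 * (ipd[:, None] - mu) ** 2 / var + np.log(pi)
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft assignments.
    nk = post.sum(axis=0)
    mu = (post * ipd[:, None]).sum(axis=0) / nk
    var = (post * (ipd[:, None] - mu) ** 2).sum() / n
    pi = nk / n

mask = post[:, 0]                               # soft mask for source 0
agree = max(np.mean((mask > 0.5) == (truth == 0)),
            np.mean((mask > 0.5) == (truth == 1)))
print("mask/source agreement:", round(float(agree), 3), "means:", mu.round(2))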

    Target-depth estimation in active sonar

    In active sonar, the objectives are to detect, localize and classify an underwater target. Azimuth and range are often used in anti-submarine warfare to localize targets. Depth may also serve as key tactical information for strategic purposes, or as a good feature for target classification or discrimination. Two-dimensional arrays, such as flank arrays, cylindrical arrays, and hull-mounted arrays, have access to elevation angles. Even linear towed arrays can give some information about elevation, using the different conical bearings measured when multipath propagation arises. In the context of long ranges and a summer Mediterranean sound-speed profile (SSP), this paper presents a new target-depth estimation method which uses elevation and arrival-time measurements from one sonar ping in a multipath environment. The method is based on ray back-propagation with a probabilistic approach. The localization algorithm minimizes the mean-squared error, between a model and the measurements, of the elevation angles at the receiver and of the arrival times. The method is tested through Monte Carlo simulations of classic active sonar scenarios and on experimental data from a reduced-scale tank. In active sonar, acoustic waves can return along the same path or along another path, so combinations of ray paths can occur. We also discuss this ray identification problem, i.e., how such combined acoustic paths are handled by our localization method.
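    The least-squares back-propagation idea admits a toy version: assuming straight rays in isovelocity water and only a direct plus a surface-reflected path (the paper uses full ray tracing with a summer Mediterranean SSP), a grid search over candidate range and depth minimizes the squared mismatch in elevation angles and arrival times. All geometry, noise levels, and weights below are illustrative assumptions.

# Toy version of the estimation principle: straight rays in isovelocity
# water, direct + surface-reflected paths only, grid search over range
# and depth minimizing squared angle and arrival-time mismatch.
import numpy as np

C = 1500.0                 # sound speed (m/s), constant for simplicity
ZR = 50.0                  # receiver depth (m), assumed known

def predict(r, z):
    """Elevation angles (rad) and travel times (s) for the two paths."""
    ang_d, tau_d = np.arctan2(z - ZR, r), np.hypot(r, z - ZR) / C
    # Surface bounce modeled with an image source at depth -z.
    ang_s, tau_s = np.arctan2(-z - ZR, r), np.hypot(r, z + ZR) / C
    return np.array([ang_d, ang_s]), np.array([tau_d, tau_s])

# Synthetic one-ping measurements: target at 6 km range, 300 m depth,
# with 0.2-degree noise on the measured elevation angles.
ang_meas, tau_meas = predict(6000.0, 300.0)
ang_meas = ang_meas + np.deg2rad(0.2) * np.random.default_rng(6).standard_normal(2)

best, best_err = None, np.inf
for r in np.arange(4000.0, 8000.0, 50.0):
    for z in np.arange(50.0, 600.0, 10.0):
        ang, tau = predict(r, z)
        err = np.sum((ang - ang_meas) ** 2) + 1e4 * np.sum((tau - tau_meas) ** 2)
        if err < best_err:
            best, best_err = (r, z), err
print("estimated (range m, depth m):", best)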