
    Learning auditory space: generalization and long-term effects

    Background: Previous findings have shown that humans can learn to localize with altered auditory space cues. Here we analyze such learning processes and their effects, up to one month later, on both localization accuracy and sound externalization. Subjects were trained and retested, focusing on the effects of stimulus type in learning, stimulus type in localization, stimulus position, previous experience, externalization levels, and time. Method: We trained listeners in azimuth and elevation discrimination in two experiments. Half participated in the azimuth experiment first and half in the elevation experiment first. In each experiment, half were trained with speech sounds and half with white noise. Retests were performed at several time intervals: just after training, and one hour, one day, one week, and one month later. In a control condition, we tested the effect of systematic retesting over time with post-tests only after training and either one day, one week, or one month later. Results: With training, all participants lowered their localization errors, and this benefit was still present one month after training. Participants were more accurate in the second training phase, revealing an effect of previous experience on a different task. Training with white noise led to better results than training with speech sounds. Moreover, the training benefit generalized to untrained stimulus-position pairs. Externalization levels increased throughout the post-tests. In the control condition, the long-term improvement in localization was not reduced by the lack of additional contact with the trained sounds, but externalization levels were lower. Conclusion: Our findings suggest that humans adapt easily to altered auditory space cues and that such adaptation spreads to untrained positions and sound types. We propose that such learning depends on all available cues, but each cue type might be learned and retrieved differently. The process of localization learning is global, not limited to stimulus-position pairs, and it differs from externalization processes.
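    As a minimal sketch of the kind of error measure such a training study relies on, the snippet below computes the mean absolute angular error between reported and true source azimuths, aggregated per retest session. The function name and the toy session data are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a localization-error measure: compare reported and
# true azimuths per trial and average the absolute angular error per session.
# All names and the example data are illustrative, not the study's data.
import numpy as np

def mean_abs_error(responses_deg, targets_deg):
    """Mean absolute angular error, wrapping differences into [-180, 180)."""
    diff = (np.asarray(responses_deg) - np.asarray(targets_deg) + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))

# Illustrative retest sessions (azimuth judgments in degrees).
sessions = {
    "post-training": ([-28, 12, 61], [-30, 10, 60]),
    "one month":     ([-25, 15, 55], [-30, 10, 60]),
}
for name, (responses, targets) in sessions.items():
    print(f"{name}: {mean_abs_error(responses, targets):.1f} deg mean absolute error")
```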

    Spike-Timing-Based Computation in Sound Localization

    Spike timing is precise in the auditory system, and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear, and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and a spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model which exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing for extracting spatial information about sources independently of the source signal.
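    A minimal toy sketch of the coincidence-detection principle this abstract builds on: a location-specific interaural delay is recovered by a bank of delay-tuned detectors that count near-synchronous spikes. This is an illustration under a pure interaural-time-difference assumption, not the authors' model (no HRTF filtering, no learned assemblies); all names and parameters are assumed.

```python
# Toy coincidence-detection sketch: spatialized sound imposes a location-
# specific timing relation between the two ears, and the delay-tuned detector
# whose internal delay cancels that interaural delay fires (coincides) most.
import numpy as np

rng = np.random.default_rng(0)
fs = 44_100                                   # sample rate (Hz)
signal = rng.standard_normal(fs // 10)        # 100 ms of noise as the unknown source

true_itd_samples = 12                         # interaural delay imposed by the "location"
left = signal
right = np.concatenate([np.zeros(true_itd_samples), signal])[: len(signal)]

def spike_times(x, threshold=1.5):
    """Very crude spiking nonlinearity: indices of upward threshold crossings."""
    return np.flatnonzero((x[1:] >= threshold) & (x[:-1] < threshold))

sp_left, sp_right = spike_times(left), spike_times(right)

def coincidences(a, b, window=1):
    """Count spikes in `a` that have a partner in `b` within `window` samples."""
    return sum(np.any(np.abs(b - t) <= window) for t in a)

# Bank of detectors, each tuned to a different internal delay of the left input.
candidate_delays = np.arange(0, 25)
counts = [coincidences(sp_left + d, sp_right) for d in candidate_delays]
print("estimated ITD (samples):", candidate_delays[int(np.argmax(counts))])
```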

    Low is large: spatial location and pitch interact in voice-based body size estimation

    The binding of incongruent cues poses a challenge for multimodal perception. Indeed, although taller objects emit sounds from higher elevations, low-pitched sounds are perceptually mapped both to large size and to low elevation. In the present study, we examined how these incongruent vertical spatial cues (up is more) and pitch cues (low is large) to size interact, and whether similar biases influence size perception along the horizontal axis. In Experiment 1, we measured listeners’ voice-based judgments of human body size using pitch-manipulated voices projected from a high versus a low, and a right versus a left, spatial location. Listeners associated low spatial locations with largeness for lowered-pitch but not for raised-pitch voices, demonstrating that pitch overrode vertical-elevation cues. Listeners associated rightward spatial locations with largeness, regardless of voice pitch. In Experiment 2, listeners performed the task while sitting or standing, allowing us to examine self-referential cues to elevation in size estimation. Listeners associated vertically low and rightward spatial cues with largeness more for lowered- than for raised-pitch voices. These correspondences were robust to sex (of both the voice and the listener) and head elevation (standing or sitting); however, horizontal correspondences were amplified when participants stood. Moreover, when participants were standing, their judgments of how much larger men’s voices sounded than women’s increased when the voices were projected from the low speaker. Our results provide novel evidence for a multidimensional spatial mapping of pitch that is generalizable to human voices and that affects performance in an indirect, ecologically relevant spatial task (body size estimation). These findings suggest that crossmodal pitch correspondences evoke both low-level and higher-level cognitive processes.
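    A hypothetical sketch of how the reported pitch-by-location interaction could be summarised: mean size ratings per (pitch shift, speaker elevation) cell, where an interaction appears as a location effect for lowered- but not raised-pitch voices. The column names and ratings below are illustrative, not the study's data.

```python
# Illustrative summary of a pitch-by-elevation interaction in size ratings.
# Toy data only; an interaction shows as a low-vs-high difference for
# lowered-pitch voices but not for raised-pitch voices.
import pandas as pd

trials = pd.DataFrame({
    "pitch":       ["lowered", "lowered", "raised", "raised"] * 2,
    "elevation":   ["low", "high"] * 4,
    "size_rating": [6.1, 5.4, 3.9, 4.0, 6.3, 5.2, 4.1, 3.8],   # e.g. 1-7 scale
})

print(trials.groupby(["pitch", "elevation"])["size_rating"].mean().unstack())
```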

    A Sparsity-Based Approach to 3D Binaural Sound Synthesis Using Time-Frequency Array Processing

    Localization of sounds in physical space plays a very important role in multiple audio-related disciplines, such as music, telecommunications, and audiovisual productions. Binaural recording is the most commonly used method to provide an immersive sound experience by means of headphone reproduction. However, it requires a very specific recording setup using high-fidelity microphones mounted in a dummy head. In this paper, we present a novel processing framework for binaural sound recording and reproduction that avoids the use of dummy heads, which is especially suitable for immersive teleconferencing applications. The method is based on a time-frequency analysis of the spatial properties of the sound picked up by a simple tetrahedral microphone array, assuming source sparseness. The experiments carried out using simulations and a real-time prototype confirm the validity of the proposed approach.
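    A rough sketch of the per-bin idea described above, under the stated sparseness assumption: each time-frequency bin is attributed to one dominant source, whose direction is estimated from inter-microphone phase differences of a small tetrahedral array; the bin could then be rendered binaurally with the HRTF of that direction. This is an illustrative reconstruction, not the authors' exact algorithm; the array geometry, spacing, and all names are assumptions.

```python
# Illustrative per-bin direction estimation (plane-wave model, sparseness
# assumption). Not the paper's algorithm; geometry and names are assumed.
import numpy as np
from scipy.signal import stft

C = 343.0   # speed of sound (m/s)
R = 0.02    # assumed microphone distance from the array centre (m)

# Assumed tetrahedral geometry: four microphones at tetrahedron vertices.
MIC_POS = R * np.array([[ 1,  1,  1],
                        [ 1, -1, -1],
                        [-1,  1, -1],
                        [-1, -1,  1]]) / np.sqrt(3.0)

def doa_per_bin(x, fs, nperseg=1024):
    """Estimate a unit propagation direction for every time-frequency bin.

    x: (4, n_samples) array, one row per microphone.
    Returns freqs, times, and directions of shape (n_freq, n_time, 3).
    Only valid below the spatial-aliasing frequency of the array; phase
    unwrapping is ignored in this sketch.
    """
    f, t, X = stft(x, fs=fs, nperseg=nperseg)          # X: (4, n_freq, n_time)
    phase = np.angle(X[1:] * np.conj(X[0]))            # phase relative to mic 0
    # Plane wave travelling along unit vector u (source-to-array):
    #   phase_i = -2*pi*f/c * (p_i - p_0) . u   =>   D u = -phase / k
    D = MIC_POS[1:] - MIC_POS[0]                       # (3, 3) baseline matrix
    k = np.where(f > 0, 2.0 * np.pi * f / C, np.inf)   # avoid divide-by-zero at DC
    delays = -phase / k[None, :, None]                 # path-length differences (m)
    u = np.einsum("ij,jft->fti", np.linalg.pinv(D), delays)
    norms = np.linalg.norm(u, axis=-1, keepdims=True)
    return f, t, np.divide(u, norms, out=np.zeros_like(u), where=norms > 0)

# Usage with some 4-channel recording `array_signals` sampled at 48 kHz:
#   freqs, times, dirs = doa_per_bin(array_signals, fs=48_000)
# Each dirs[f, t] could then index the nearest measured HRTF for rendering.
```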