
    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods. (15 pages, 8 figures.)
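The core idea of locally-linear regression, fitting a separate linear map from auditory features to direction in each local region of the feature space, can be sketched as follows. This is a minimal illustrative stand-in (one scalar feature, hard bin assignments instead of the paper's Gaussian mixture, function names of our own choosing), not the authors' implementation:

```python
def fit_local_linear(x, y, n_bins=4):
    """Fit one least-squares line per bin of the feature range.
    A crude stand-in for locally-linear regression: each bin gets its
    own (slope, intercept) mapping feature value -> direction."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    models = []
    for b in range(n_bins):
        # points whose feature value falls in bin b (last bin is closed)
        pts = [(xi, yi) for xi, yi in zip(x, y)
               if int(min((xi - lo) / width, n_bins - 1)) == b]
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        var = sum((p[0] - mx) ** 2 for p in pts)
        cov = sum((p[0] - mx) * (p[1] - my) for p in pts)
        slope = cov / var if var else 0.0
        models.append((slope, my - slope * mx))
    return lo, width, models

def predict(model, xq):
    """Look up the local linear model for the query and apply it."""
    lo, width, models = model
    slope, intercept = models[min(int((xq - lo) / width), len(models) - 1)]
    return slope * xq + intercept
```

Trained on a smooth nonlinear feature-to-angle relationship, the piecewise-linear model tracks it closely within each bin, which is the property the paper exploits.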

    The neurophysiological basis of short- and long-term ventriloquism aftereffects

    Park H, Kayser C. The neurophysiological basis of short- and long-term ventriloquism aftereffects. bioRxiv. 2020. Our senses often receive conflicting multisensory information, which our brain reconciles by adaptive recalibration. A classic example is the ventriloquist aftereffect, which emerges following both long-term and trial-wise exposure to spatially discrepant multisensory stimuli. Still, it remains debated whether the behavioral biases observed following short- and long-term exposure arise from largely the same or rather distinct neural origins, and hence reflect the same or distinct mechanisms. We address this question by probing EEG recordings for physiological processes predictive of the single-trial ventriloquism biases following the exposure to spatially offset audio-visual stimuli. Our results support the hypothesis that both short- and long-term aftereffects are mediated by common neurophysiological correlates, which likely arise from sensory and parietal regions involved in multisensory inference and memory, while prolonged exposure to consistent discrepancies additionally recruits prefrontal regions. These results posit a central role of parietal regions in mediating multisensory spatial recalibration and suggest that frontal regions contribute to increasing the behavioral bias when the perceived sensory discrepancy is consistent and persistent over time.

    The Speed, Precision and Accuracy of Human Multisensory Perception following Changes to the Visual Sense

    Human adults can combine information from multiple senses to improve their perceptual judgments. Visual and multisensory experience plays an important role in the development of multisensory integration; however, it is unclear to what extent changes in vision impact multisensory processing later in life. In particular, it is not known whether adults account for changes to the relative reliability of their senses following sensory loss, treatment, or training. Using psychophysical methods, this thesis studied the multisensory processing of individuals experiencing changes to the visual sense. Chapters 2 and 3 assessed whether patients implanted with a retinal prosthesis (having been blinded by a retinal degenerative disease) could use this new visual signal together with non-visual information to improve their speed or precision on multisensory tasks. Due to large differences between the reliabilities of the visual and non-visual cues, patients were not always able to benefit from the new visual signal. Chapter 4 assessed whether patients with degenerative visual loss adjust the weight given to visual and non-visual cues during audio-visual localization as their relative reliabilities change. Although some patients adjusted their reliance on vision across the visual field in line with predictions based on relative cue reliability, others (patients with visual loss limited to their central visual field) did not. Chapter 5 assessed whether training with either more reliable or less reliable visual feedback could enable normally sighted adults to overcome an auditory localization bias. Findings suggest that visual information, irrespective of reliability, can be used to overcome at least some non-visual biases. In summary, this thesis documents multisensory changes following changes to the visual sense. The results improve our understanding of adult multisensory plasticity and have implications for successful treatment and rehabilitation following sensory loss.

    Ventriloquism effect with sound stimuli varying in both azimuth and elevation

    Copyright 2015 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America. The article appeared as Etienne Hendrickx, Mathieu Paquier, Vincent Koehl and Julian Palacino, "Ventriloquism effect with sound stimuli varying in both azimuth and elevation," The Journal of the Acoustical Society of America, 2015, vol. 138, no. 6, pp. 3686–3697, and may be found at http://link.aip.org/link/?JAS/138/3686. When presented with a spatially discordant auditory-visual stimulus, subjects sometimes perceive the sound and the visual stimuli as coming from the same location. Such a phenomenon is often referred to as perceptual fusion or ventriloquism, as it evokes the illusion created by a ventriloquist when his voice seems to emanate from his puppet rather than from his mouth. While this effect has been extensively examined in the horizontal plane and to a lesser extent in distance, few psychoacoustic studies have focused on elevation. In the present experiment, sequences of a man talking were presented to subjects. His voice could be reproduced on different loudspeakers, which created disparities in both azimuth and elevation between the sound and the visual stimuli. For each presentation, subjects had to indicate whether the voice seemed to emanate from the mouth of the actor or not. Results showed that ventriloquism could be observed with larger audiovisual disparities in elevation than in azimuth.

    Spatial sound for computer games and virtual reality

    In this chapter, we discuss spatial sound within the context of Virtual Reality and other synthetic environments such as computer games. We review current audio technologies, sound constraints within immersive multi-modal spaces, and future trends. The review process takes into consideration the widely varying levels of audio sophistication in the gaming and VR industries, ranging from standard stereo output to Head Related Transfer Function implementation. The level of sophistication is determined mostly by hardware/system constraints (such as mobile devices or network limitations); however, audio practitioners are developing novel and diverse methods to overcome many of these challenges. No matter what approach is employed, the primary objectives are very similar: the enhancement of the virtual scene and the enrichment of the user experience. We discuss how successful various audio technologies are in achieving these objectives, how they fall short, and how they are aligned to overcome these shortfalls in future implementations.
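At the simplest end of the sophistication range the chapter describes, standard stereo output, sources are typically positioned with a panning law. A common choice is constant-power panning, sketched below; this is a generic illustration, not code from the chapter:

```python
import math

def constant_power_pan(pan):
    """pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.
    Returns (left_gain, right_gain) with left**2 + right**2 == 1 at
    every pan position, so total acoustic power (and hence perceived
    loudness) stays roughly constant as a source moves across the image."""
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return math.cos(theta), math.sin(theta)
```

A center-panned source gets a gain of about 0.707 per channel rather than 0.5, which is precisely what avoids the loudness dip of naive linear panning. HRTF-based rendering replaces these two scalar gains with direction-dependent filters per ear.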

    Causal inference in multisensory perception and the brain

    To build coherent and veridical multisensory representations of the environment, human observers consider the causal structure of multisensory signals: if they infer a common source of the signals, observers integrate them weighted by their reliability. Otherwise, they segregate the signals. Generally, observers infer a common source if the signals correspond structurally and spatiotemporally. In six projects, the current PhD thesis investigated this causal inference model with the help of audiovisual spatial signals presented to human observers in a ventriloquist paradigm. A first psychophysical study showed that sensory reliability determines causal inference via two mechanisms: sensory reliability modulates how observers infer the causal structure from spatial signal disparity, and sensory reliability determines the weight of audiovisual signals if observers integrate the signals under the assumption of a common source. Using multivariate decoding of fMRI signals, three PhD projects revealed that auditory and visual cortical hierarchies jointly implement causal inference. Specific regions of the hierarchies represented constituent spatial estimates of the causal inference model. In line with this model, anterior regions of the intraparietal sulcus (IPS) represent audiovisual signals dependent on visual reliability, task-relevance, and spatial disparity of the signals. However, even in the case of small signal discrepancies suggesting a common source, reliability-weighting in IPS was suboptimal compared to a maximum likelihood estimation model. By temporally manipulating visual reliability, the fifth PhD project demonstrated that human observers learn sensory reliability from current and past signals in order to weight audiovisual signals, consistent with a Bayesian learner.
Finally, the sixth project showed that when visual flashes were suppressed from awareness by continuous flash suppression, the visual bias of the perceived auditory location was strongly reduced but still significant. The reduced ventriloquist effect was presumably mediated by the drop in visual reliability accompanying perceptual unawareness. In conclusion, the PhD thesis suggests that human observers integrate multisensory signals according to their causal structure and temporal regularity: they integrate the signals if a common source is likely, weighting them in proportion to the reliability they learnt from the signals' history. Crucially, specific regions of cortical hierarchies jointly implement these multisensory processes.

    Using cognitive psychology and neuroscience to better inform sound system design at large musical events

    Large musical events have become increasingly popular in the last fifty years. It is now not uncommon to have indoor shows in excess of 10,000 people, and open-air events of 30,000 people or more. These events, nevertheless, present technical challenges that have only begun to be solved in the last hundred years, with the introduction of sound reinforcement systems, electric lighting and now video/display technologies. However, these technologies present an artificial link to the performance that requires an understanding of both the audience's expectations and the technologies' abilities and limitations. Although many of these abilities and limitations are well documented, the audience's responses to them are less so. This paper introduces research into audience auditory responses, primarily at a subconscious level. By investigating these responses, it is hoped that a commonality will be found amongst audiences, from which better-informed metrics can be derived.

    Engineering data compendium. Human perception and performance. User's guide

    The concept underlying the Engineering Data Compendium was the product of a research and development program (the Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design of military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability for system designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is the first volume, the User's Guide, containing a description of the program and instructions for its use.