
    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, and is therefore consistent with the constraints of cognitive development; instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting; in a speaker-independent setting, however, the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
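    As a rough illustration of the cross-modal self-supervision idea described above (not the authors' implementation), the sketch below trains a small visual classifier on face crops using pseudo-labels derived from a synchronized audio voice-activity signal; the network shape, energy threshold and tensor sizes are assumptions for illustration.

# Minimal sketch: audio-derived pseudo-labels supervise a visual
# active-speaker classifier. All sizes and thresholds are illustrative.
import torch
import torch.nn as nn

class FaceSpeakerNet(nn.Module):
    """Tiny CNN mapping a 64x64 grayscale face crop to a speaking logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.head(h)  # raw logits

def audio_pseudo_labels(frame_energy, threshold=0.5):
    """Label a face crop 'speaking' when the synchronized audio energy is high.
    In a real system this would come from a per-speaker voice-activity signal."""
    return (frame_energy > threshold).float()

# Toy training step on random tensors standing in for synchronized face/audio frames.
model = FaceSpeakerNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

faces = torch.rand(8, 1, 64, 64)   # batch of face crops
energy = torch.rand(8)             # per-frame audio energy (placeholder)
labels = audio_pseudo_labels(energy)

logits = model(faces).squeeze(1)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()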

    Auditory-visual interaction in computer graphics

    Generating high-fidelity images in real time at reasonable frame rates remains one of the main challenges in computer graphics. Furthermore, visuals are only one of the multiple sensory cues that must be delivered simultaneously in a multi-sensory virtual environment. Besides vision, the most frequently used sense in virtual environments and entertainment is audio. While the rendering community focuses on solving the rendering equation more quickly through various algorithmic and hardware improvements, the exploitation of human limitations to assist in this process remains largely unexplored. Many findings in the research literature demonstrate physical and psychological limitations of humans, including attentional and perceptual limitations of the Human Sensory System (HSS). Knowledge of the Human Visual System (HVS) may be exploited in computer graphics to significantly reduce rendering times without the viewer being aware of any resultant difference in image quality. Furthermore, cross-modal effects, that is, the influence of one sensory input on another, for example sound and visuals, have also recently been shown to have a substantial impact on viewer perception of a virtual environment. In this thesis, auditory-visual cross-modal interaction research findings have been investigated and adapted for graphics rendering purposes. The results from five psychophysical experiments, involving 233 participants, showed that, even in the realm of computer graphics, there is a strong relationship between vision and audition in both the spatial and temporal domains. The first experiment, investigating auditory-visual cross-modal interaction in the spatial domain, showed that unrelated sound effects reduce the perceived rendering quality threshold. In the following experiments, the effect of audio on temporal visual perception was investigated. The results obtained indicate that audio with certain beat rates can be used to reduce the amount of rendering required to achieve perceptually high quality. Furthermore, adding the sound effect of footsteps to walking animations increased the perceived visual smoothness. These results suggest that under certain conditions the number of frames that need to be rendered each second can be reduced, saving valuable computation time, without the viewer being aware of this reduction. This is another step towards a comprehensive understanding of auditory-visual cross-modal interaction and its use in high-fidelity interactive multi-sensory virtual environments.
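    A minimal sketch of the selective frame-rendering idea suggested by these results, assuming a simple scheduler that drops the rendered frame rate while a masking sound effect (for example, footsteps synchronized to a walking animation) is playing; the specific rates are illustrative assumptions, not values from the experiments.

# Illustrative frame scheduler: render fewer frames per second while an
# audio masking cue is active, re-using the previous frame in between.
from dataclasses import dataclass

@dataclass
class FrameScheduler:
    full_fps: int = 60      # rendering rate without any audio masking
    reduced_fps: int = 20   # rendering rate while a masking sound plays

    def frames_to_render(self, duration_s: float, audio_masking: bool) -> int:
        fps = self.reduced_fps if audio_masking else self.full_fps
        return int(round(duration_s * fps))

scheduler = FrameScheduler()
# One second of animation: render fewer frames while the footstep sound plays.
print(scheduler.frames_to_render(1.0, audio_masking=True))   # 20 rendered frames
print(scheduler.frames_to_render(1.0, audio_masking=False))  # 60 rendered frames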

    The Sound Design Toolkit

    The Sound Design Toolkit is a collection of physically informed sound synthesis models, specifically designed for practice and research in Sonic Interaction Design. The collection is based on a hierarchical, perceptually founded taxonomy of everyday sound events, and is implemented by procedural audio algorithms which emphasize the role of sound as a process rather than a product. The models are intuitive to control, and the resulting sounds are easy to predict, as they rely on basic everyday listening experience. Physical descriptions of sound events are intentionally simplified to emphasize the most perceptually relevant timbral features, as well as to reduce computational requirements.
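    A minimal sketch of the kind of physically informed, procedural model such a toolkit is built around: an impact sound synthesized as a sum of exponentially decaying resonant modes. The mode frequencies, decay rates and amplitudes below are arbitrary illustrative values, not parameters taken from the Sound Design Toolkit itself.

# Procedural impact sound via simple modal synthesis (illustrative values only).
import numpy as np

def impact_sound(modes, sample_rate=44100, duration=0.5):
    """modes: list of (frequency_hz, decay_per_second, amplitude) tuples."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    signal = np.zeros_like(t)
    for freq, decay, amp in modes:
        # Each mode is an exponentially damped sinusoid.
        signal += amp * np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

# A plausible "struck wooden object": a few damped modes.
wood_hit = impact_sound([(420.0, 18.0, 1.0), (980.0, 30.0, 0.5), (2100.0, 60.0, 0.25)])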

    Influence of Auditory Cues on the visually-induced Self-Motion Illusion (Circular Vection) in Virtual Reality

    This study investigated whether the visually induced self-motion illusion (“circular vection”) can be enhanced by adding a matching auditory cue (the sound of a fountain that is also visible in the visual stimulus). Twenty observers viewed rotating photorealistic pictures of a market place projected onto a curved projection screen (FOV: 54° × 45°). Three conditions were randomized in a repeated-measures within-subject design: no sound, mono sound, and spatialized sound using a generic head-related transfer function (HRTF). Adding mono sound increased convincingness ratings marginally, but did not affect any of the other measures of vection or presence. Spatializing the fountain sound, however, improved vection (convincingness and vection build-up time) and presence ratings significantly. Note that this facilitation was found even though the visual stimulus was of high quality and realism, and is known to be a powerful vection-inducing stimulus. Thus, HRTF-based auralization using headphones can be employed to improve visual VR simulations both in terms of self-motion perception and overall presence.
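    A sketch of the headphone auralization step assumed here: a mono source (the fountain) is convolved with a left/right head-related impulse response (HRIR) pair for the desired direction. The random "HRIRs" below are only stand-ins; a real system would load measured data from a generic HRTF set.

# Binaural spatialization by convolving a mono source with an HRIR pair.
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, hrir_left, hrir_right):
    """Return a stereo (N, 2) signal binaurally rendered from a mono source."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out

fs = 44100
fountain = np.random.randn(fs)          # 1 s of noise standing in for the fountain sound
hrir_l = np.random.randn(256) * 0.01    # placeholder impulse responses
hrir_r = np.random.randn(256) * 0.01
stereo = spatialize(fountain, hrir_l, hrir_r)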

    Perceptually Driven Interactive Sound Propagation for Virtual Environments

    Sound simulation and rendering can significantly augment a user's sense of presence in virtual environments. Many techniques for sound propagation have been proposed that predict the behavior of sound as it interacts with the environment and is received by the user. At a broad level, propagation algorithms can be classified into reverberation filters, geometric methods, and wave-based methods. In practice, heuristic methods based on reverberation filters are simple to implement and have a low computational overhead, while wave-based algorithms are limited to static scenes and involve extensive precomputation. However, relatively little work has been done on the psychoacoustic characterization of different propagation algorithms, or on evaluating the relationship between scientific accuracy and perceptual benefit. In this dissertation, we present perceptual evaluations of sound propagation methods and their ability to model complex acoustic effects for virtual environments. Our results indicate that scientifically accurate methods for reverberation and diffraction do result in increased perceptual differentiation. Based on these evaluations, we present two novel hybrid sound propagation methods that combine the accuracy of wave-based methods with the speed of geometric methods for interactive sound propagation in dynamic scenes. Our first algorithm couples modal sound synthesis with geometric sound propagation using wave-based sound radiation to perform mode-aware sound propagation. We introduce diffraction kernels of rigid objects, which encapsulate the sound diffraction behaviors of individual objects in free space and are then used to simulate plausible diffraction effects using an interactive path tracing algorithm. Finally, we present a novel perceptually driven metric that can be used to accelerate the computation of late reverberation, enabling plausible simulation of reverberation with a low runtime overhead. We highlight the benefits of our novel propagation algorithms in different scenarios.
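    For context, a minimal sketch of the "reverberation filter" end of the spectrum discussed above: a Schroeder-style reverberator built from parallel feedback comb filters followed by an allpass. The delay lengths and gains are generic textbook-style choices, not values from the dissertation.

# Schroeder-style artificial reverberation: parallel comb filters + one allpass.
import numpy as np

def feedback_comb(x, delay, gain):
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (gain * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, gain):
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + xd + gain * yd
    return y

def simple_reverb(dry, fs=44100):
    # Mutually detuned comb delays avoid obvious metallic coloration.
    comb_delays = [int(fs * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = sum(feedback_comb(dry, d, 0.77) for d in comb_delays) / 4.0
    wet = allpass(wet, int(fs * 0.005), 0.7)
    return 0.7 * dry + 0.3 * wet

impulse = np.zeros(44100)
impulse[0] = 1.0
room_ir = simple_reverb(impulse)   # approximate room impulse response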

    The influence of olfaction on the perception of high-fidelity computer graphics

    The computer graphics industry is constantly demanding more realistic images and animations. However, producing such high-quality scenes can take a long time, even days, when rendering on a single PC. One approach that can be used to speed up rendering times is visual perception, which exploits the limitations of the Human Visual System, since the viewers of the results will be humans. Although there is an increasing body of research into how haptics and sound may affect a viewer's perception in a virtual environment, the influence of smell has been largely ignored. The aim of this thesis is to address this gap and make smell an integral part of multi-modal virtual environments. In this work, we performed four major experiments with a total of 840 participants. In the experiments we used still images and animations, related and unrelated smells, and finally a multi-modal environment combining smell, sound and temperature. Besides this, we also investigated how long it takes for an average person to adapt to a smell and what effect a smell may have on performing a task in its presence. The results of this thesis clearly show that a smell present in the environment firstly affects the perception of object quality within a rendered image, and secondly enables parts of the scene, or the whole animation, to be selectively rendered in high quality while the rest is rendered in lower quality without the viewer noticing the drop in quality. Such selective rendering in the presence of smell results in significant computational performance gains without any perceived loss in the quality of the images or animations.
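    A hypothetical sketch of how such selective rendering could be expressed in a renderer: a samples-per-pixel budget map that keeps a region of interest at full quality and lowers quality elsewhere when a cross-modal distractor such as an ambient smell is present. The function, region and sample counts are illustrative assumptions, not part of the thesis.

# Per-pixel sample budget for selective rendering under a cross-modal distractor.
import numpy as np

def sample_budget(width, height, roi, distractor_present, high_spp=64, low_spp=8):
    """Return a (height, width) map of samples per pixel.
    roi = (x0, y0, x1, y1) rectangle kept at full quality."""
    if not distractor_present:
        # No distractor: render everything at full quality.
        return np.full((height, width), high_spp, dtype=int)
    spp = np.full((height, width), low_spp, dtype=int)
    x0, y0, x1, y1 = roi
    spp[y0:y1, x0:x1] = high_spp
    return spp

budget = sample_budget(1920, 1080, roi=(800, 400, 1120, 680), distractor_present=True)
print(budget.mean())   # average samples per pixel actually rendered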