
    Smoothness perception: investigation of beat rate effect on frame rate perception

    Despite the complexity of the Human Visual System (HVS), research over the last few decades has highlighted a number of its limitations. These limitations can be exploited in computer graphics to significantly reduce computational cost and thus required rendering time, without a viewer perceiving any difference in resultant image quality. Furthermore, cross-modal interaction between different modalities, such as the influence of audio on visual perception, has also been shown to be significant in both psychology and computer graphics. In this paper we investigate the effect of beat rate on temporal visual perception, i.e. frame rate perception. For the visual quality and perception evaluation, a series of psychophysical experiments was conducted and the data analysed. The results indicate that beat rates in some cases do affect temporal visual perception and that certain beat rates can be used to reduce the amount of rendering required to achieve a perceptually high quality. This is another step towards a comprehensive understanding of auditory-visual cross-modal interaction and could potentially be used in high-fidelity interactive multi-sensory virtual environments.
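
    The abstract above reports that certain beat rates allow rendering to be reduced without a perceived loss of quality, but does not give a selection mechanism. A minimal Python sketch of the idea follows; the choose_frame_rate helper, the "masking" beat-rate range and the halved frame rate are illustrative assumptions, not values from the paper.

        # Hypothetical sketch (not the paper's method): lower the rendering
        # frame rate when the soundtrack's beat rate falls in a range assumed
        # to mask the reduced temporal resolution.
        def choose_frame_rate(beat_rate_bpm, baseline_fps=60):
            if beat_rate_bpm is None:          # no audio: render at full rate
                return baseline_fps
            if 100 <= beat_rate_bpm <= 140:    # assumed masking range (illustrative)
                return baseline_fps // 2       # e.g. 30 fps instead of 60
            return baseline_fps

        print(choose_frame_rate(120))          # -> 30 under these assumptions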

    Introduction: The Fourth International Workshop on Epigenetic Robotics

    As in previous editions, this workshop aims to be a forum for multi-disciplinary research ranging from developmental psychology to the neural sciences (in the widest sense) and robotics, including computational studies. The aim is two-fold: on the one hand, understanding the brain through engineering embodied systems and, on the other hand, building artificial epigenetic systems. Epigenetic carries in its meaning the idea that we are interested in studying development through interaction with the environment. This idea entails the embodiment of the system, its situatedness in the environment, and of course a prolonged period of postnatal development during which this interaction can actually take place. This is still a relatively new endeavor, although the seeds of the developmental robotics community have been in the air since the nineties (Berthouze and Kuniyoshi, 1998; Metta et al., 1999; Brooks et al., 1999; Breazeal, 2000; Kozima and Zlatev, 2000). A few had the intuition (see Lungarella et al., 2003, for a comprehensive review) that intelligence could not possibly be engineered simply by copying systems that are “ready made”, but rather that the development of the system plays a major role. This integration of disciplines raises the important issue of learning on the multiple scales of developmental time, that is, how to build systems that can eventually learn in any environment rather than programming them for a specific environment. On the other hand, the hope is that robotics might become a new tool for brain science, similarly to what simulation and modeling have become for the study of the motor system. Our community is still very much evolving and “under construction”, and for this reason we tried to encourage submissions from the psychology community. Additionally, we invited four neuroscientists and no roboticists for the keynote lectures. We received a record number of submissions (more than 50), and given the overall size and duration of the workshop, together with our desire to maintain a single-track format, we had to be more selective than ever in the review process (a 20% acceptance rate on full papers). This is, if not an index of quality, at least an index of the interest that gravitates around this still new discipline.

    Changes in the McGurk Effect Across Phonetic Contexts

    To investigate the process underlying audiovisual speech perception, the McGurk illusion was examined across a range of phonetic contexts. Two major changes were found. First, the frequency of illusory /g/ fusion percepts increased relative to the frequency of illusory /d/ fusion percepts as vowel context was shifted from /i/ to /a/ to /u/. This trend could not be explained by biases present in perception of the unimodal visual stimuli. However, the change found in the McGurk fusion effect across vowel environments did correspond systematically with changes in second formant frequency patterns across contexts. Second, the order of consonants in illusory combination percepts was found to depend on syllable type. This may be due to differences occurring across syllable contexts in the time courses of inputs from the two modalities, as delaying the auditory track of a vowel-consonant stimulus resulted in a change in the order of consonants perceived. Taken together, these results suggest that the speech perception system either fuses audiovisual inputs into a visually compatible percept with a second formant pattern similar to that of the acoustic stimulus, or interleaves the information from different modalities, at a phonemic or subphonemic level, based on their relative arrival times. National Institutes of Health (R01 DC02852).

    Studies in modal density – its effect at low frequencies

    The ability to objectively measure the reproduction quality of a small room at low frequencies has long been desired. Over many years, there have been attempts to produce recommendations, metrics, and criteria by which to characterise a particular room. These have often concentrated on some aspect of the modal distribution, such as spacing or density. Other attempts have focused upon the deviation from a desired frequency response. Whilst the subjective validity of objective measures such as these has often been questioned, the notion that a transitional region between modal and diffuse sound fields exists, dependent on the room volume and reverberation time, continues to permeate much thinking. The calculation of this transitional frequency relies on the calculation of a desired modal density. In the case of the most well-known definition, the Schroeder frequency [1], the transitional frequency is the point where the density becomes sufficient that three modes lie within one bandwidth. Although this idea may well be useful in some instances, such as defining points for the use of statistical sound field analysis, recent thought has cast some doubt over its relevance as a subjective frequency above which we may ignore modal issues [2]. This paper highlights a number of studies, along with a new listening test, which help us to better understand the role of modal density in the subjective perception of modal sound fields.
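
    For reference, the Schroeder frequency cited above is conventionally computed from the reverberation time T60 (seconds) and room volume V (cubic metres) as f_S = 2000 * sqrt(T60 / V), and the leading-order modal density of a room is dN/df ≈ 4πV f² / c³. The snippet below evaluates both for an illustrative small room; the example room values are ours, not from the paper.

        import math

        def schroeder_frequency(t60_s, volume_m3):
            """Schroeder frequency in Hz: f_S = 2000 * sqrt(T60 / V)."""
            return 2000.0 * math.sqrt(t60_s / volume_m3)

        def modal_density(f_hz, volume_m3, c=343.0):
            """Leading-order modal density dN/df ~ 4*pi*V*f^2 / c^3 (modes per Hz)."""
            return 4.0 * math.pi * volume_m3 * f_hz**2 / c**3

        # Illustrative small listening room: T60 = 0.4 s, V = 60 m^3.
        f_s = schroeder_frequency(0.4, 60.0)
        print(f"Schroeder frequency: {f_s:.0f} Hz")        # ~163 Hz
        print(f"modal density there: {modal_density(f_s, 60.0):.2f} modes/Hz")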

    Quality-controlled audio-visual depth in stereoscopic 3D media

    BACKGROUND: The literature proposes several algorithms that produce “quality-controlled” stereoscopic depth in 3D films by limiting the stereoscopic depth to a defined depth budget. Like stereoscopic displays, spatial sound systems provide the listener with enhanced (auditory) depth cues, and are now commercially available in multiple forms. AIM: We investigate the implications of introducing auditory depth cues to quality-controlled 3D media, by asking: “Is it important to quality-control audio-visual depth by considering audio-visual interactions when integrating stereoscopic display and spatial sound systems?” MOTIVATION: There are several reports in the literature of such “audio-visual interactions”, in which visual and auditory perception influence each other. We seek to answer our research question by investigating whether these audio-visual interactions could extend the depth budget used in quality-controlled 3D media. METHOD/CONCLUSIONS: The related literature is reviewed before presenting four novel experiments that build upon each other’s conclusions. In the first experiment, we show that content created with a stereoscopic depth budget creates measurable positive changes in audiences’ attitudes towards 3D films. These changes are repeatable across different locations, displays and content. In the second experiment we calibrate an audio-visual display system and use it to measure the minimum audible depth difference. Our data are used to formulate recommendations for content designers and systems engineers. These recommendations include the design of an auditory depth perception screening test. We then show that an auditory-visual stimulus with a nearer auditory depth is perceived as nearer. We measure the impact of this effect upon a relative depth judgement, and investigate how the impact varies with audio-visual depth separation. Finally, the size of the cross-modal bias in depth is measured, from which we conclude that sound does have the potential to extend the depth budget by a small, but perceivable, amount.
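
    The abstract does not state how the minimum audible depth difference was measured; a standard tool for estimating such perceptual thresholds is an adaptive staircase, sketched below under that assumption. The simulated listener, step size and starting value are illustrative stand-ins, not the authors’ procedure.

        import random

        def simulated_listener(depth_difference_m):
            """Stand-in for a real trial: correct more often when the
            audio-visual depth difference is larger."""
            return random.random() < min(0.99, 0.55 + depth_difference_m)

        def staircase_threshold(start_m=0.5, step_m=0.05, n_reversals=8):
            """2-down/1-up staircase; converges near 70.7% correct (Levitt, 1971)."""
            diff, streak, direction, reversals = start_m, 0, 0, []
            while len(reversals) < n_reversals:
                if simulated_listener(diff):
                    streak += 1
                    if streak == 2:              # two correct: make it harder
                        streak = 0
                        if direction == +1:
                            reversals.append(diff)
                        direction = -1
                        diff = max(0.01, diff - step_m)
                else:                            # one wrong: make it easier
                    streak = 0
                    if direction == -1:
                        reversals.append(diff)
                    direction = +1
                    diff += step_m
            return sum(reversals) / len(reversals)

        print(f"estimated threshold: {staircase_threshold():.2f} m")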

    Auditory-visual interaction in computer graphics

    Generating high-fidelity images in real time at reasonable frame rates still remains one of the main challenges in computer graphics. Furthermore, visuals are only one of the multiple sensory cues that must be delivered simultaneously in a multi-sensory virtual environment. The most frequently used sense besides vision, in virtual environments and entertainment, is audio. While the rendering community focuses on solving the rendering equation more quickly using various algorithmic and hardware improvements, the exploitation of human limitations to assist in this process remains largely unexplored. Many findings in the research literature demonstrate physical and psychological limitations of humans, including attentional and perceptual limitations of the Human Sensory System (HSS). Knowledge of the Human Visual System (HVS) may be exploited in computer graphics to significantly reduce rendering times without the viewer being aware of any resultant difference in image quality. Furthermore, cross-modal effects, that is, the influence of one sensory input on another, for example sound and visuals, have also recently been shown to have a substantial impact on viewer perception of a virtual environment. In this thesis, auditory-visual cross-modal interaction research findings have been investigated and adapted for graphics rendering purposes. The results from five psychophysical experiments, involving 233 participants, showed that, even in the realm of computer graphics, there is a strong relationship between vision and audition in both the spatial and temporal domains. The first experiment, investigating auditory-visual cross-modal interaction in the spatial domain, showed that unrelated sound effects reduce the perceived rendering quality threshold. In the following experiments, the effect of audio on temporal visual perception was investigated. The results obtained indicate that audio with certain beat rates can be used to reduce the amount of rendering required to achieve a perceptually high quality. Furthermore, introducing the sound effect of footsteps to walking animations increased perceived visual smoothness. These results suggest that under certain conditions the number of frames that need to be rendered each second can be reduced, saving valuable computation time, without the viewer being aware of this reduction. This is another step towards a comprehensive understanding of auditory-visual cross-modal interaction and its use in high-fidelity interactive multi-sensory virtual environments.

    Dissociating task difficulty from incongruence in face-voice emotion integration

    In the everyday environment, affective information is conveyed by both the face and the voice. Studies have demonstrated that a concurrently presented voice can alter the way that an emotional face expression is perceived, and vice versa, leading to emotional conflict if the information in the two modalities is mismatched. Additionally, evidence suggests that incongruence of emotional valence activates cerebral networks involved in conflict monitoring and resolution. However, it is currently unclear whether this is due to task difficulty (incongruent stimuli are harder to categorize) or simply to the detection of mismatching information in the two modalities. The aim of the present fMRI study was to examine the neurophysiological correlates of processing incongruent emotional information, independent of task difficulty. Subjects were scanned while judging the emotion of face-voice affective stimuli. Both the face and the voice were parametrically morphed between anger and happiness and then paired in all audiovisual combinations, resulting in stimuli each defined by two separate values: the degree of incongruence between the face and voice, and the degree of clarity of the combined face-voice information. Given the specific morphing procedure used, we hypothesized that the clarity value, rather than the incongruence value, would better reflect task difficulty. Behavioral data revealed that participants integrated face and voice affective information, and that the clarity value, as opposed to the incongruence value, correlated with categorization difficulty. Cerebrally, incongruence was more associated with activity in the superior temporal region, an effect which emerged after task difficulty had been accounted for. Overall, our results suggest that activation in the superior temporal region in response to incongruent information cannot be explained simply by task difficulty, and may rather be due to the detection of mismatching information between the two modalities.
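
    The exact formulas behind the two stimulus values are not given in the abstract. One natural operationalisation, sketched below purely as an assumption, treats incongruence as the distance between the face and voice morph levels, and clarity as the distance of their combined (average) level from the ambiguous midpoint.

        # Hypothetical reading of the stimulus design (not stated in the paper):
        # morph levels run from 0.0 (pure anger) to 1.0 (pure happiness).
        def incongruence(face, voice):
            """How much the two modalities disagree in valence."""
            return abs(face - voice)

        def clarity(face, voice):
            """How far the combined signal sits from the ambiguous midpoint."""
            return abs((face + voice) / 2.0 - 0.5)

        # Congruent but ambiguous pair: no conflict, yet hard to categorize.
        print(incongruence(0.5, 0.5), clarity(0.5, 0.5))   # 0.0 0.0
        # Maximally mismatched pair: strong conflict, zero net clarity.
        print(incongruence(1.0, 0.0), clarity(1.0, 0.0))   # 1.0 0.0

    Under this reading the two values dissociate by construction: a congruent midpoint pair has zero incongruence yet remains maximally ambiguous, which is consistent with the finding that clarity, not incongruence, tracked categorization difficulty.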

    Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role for auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.
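
    The entrainment measure is not specified in the abstract; a common proxy in this literature, used below as an assumption, is spectral coherence between the speech amplitude envelope and the neural time series. The toy data and band limits are illustrative only.

        import numpy as np
        from scipy.signal import coherence, hilbert

        def speech_brain_coherence(speech, neural, fs):
            """Coherence between the speech amplitude envelope and one
            neural channel; a common proxy for 'entrainment'."""
            envelope = np.abs(hilbert(speech))   # amplitude envelope of speech
            return coherence(envelope, neural, fs=fs, nperseg=int(2 * fs))

        # Toy example: 60 s of surrogate signals sampled at 200 Hz.
        fs = 200.0
        n = int(60 * fs)
        rng = np.random.default_rng(0)
        freqs, coh = speech_brain_coherence(rng.standard_normal(n),
                                            rng.standard_normal(n), fs)
        band = coh[(freqs >= 1) & (freqs <= 8)].mean()   # syllable-rate band
        print(f"mean 1-8 Hz coherence: {band:.3f}")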

    The effect of visual stimuli on the horribleness of awful sounds.

    A mass web-based experiment has been carried out to explore people’s perception of horrible sounds. The advantage of a web-based methodology is that it enables hundreds of thousands of judgements to be obtained across a diverse population. As part of the project, we examined how what people saw on the screen affected how they rated the sounds. The sounds were auditioned with images that were either associated or unassociated with the sounds. It was found that images often affected how horrible a sound was perceived to be. For example, the image of fingernails on a blackboard made the associated sound more awful. However, in the case of disgusting sounds, such as the sound of someone eating, the images used had no significant effect on voting behaviour. The colour of the website was also varied. The hue of the website was found to be a significant factor, with a red website making the sounds less horrible than a blue/green website. The brightness and saturation of the website also altered people’s perceptions, with the brighter, more saturated website making the most awful sounds, such as the sound of someone vomiting, less horrible.

    Electronic Dance Music in Narrative Film

    As a growing number of filmmakers move away from the traditional model of orchestral underscoring in favor of a more contemporary approach to film sound, electronic dance music (EDM) is playing an increasingly important role in current soundtrack practice. With a focus on two specific examples, Tom Tykwer’s Run Lola Run (1998) and Darren Aronofsky’s Pi (1998), this essay discusses the possibilities that such a distinctive aesthetic brings to filmmaking, especially with regard to audiovisual rhythm and sonic integration.