
    Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

    Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels at common sites of convergence, known as multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered video of spoken Japanese sentences, presented with background multispeaker audio noise, to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information, while controlling for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved in improving speech intelligibility.
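
    As an informal illustration of the stimulus-construction idea described above (not the study's actual pipeline), the sketch below shows how a single grayscale video frame can be spatially bandpass filtered with a 2-D wavelet decomposition using PyWavelets. The wavelet family (db4), the number of decomposition levels, the function name wavelet_bandpass, and the level ranges retained for the hypothetical LF and MF conditions are assumptions chosen for demonstration only.

    import numpy as np
    import pywt

    def wavelet_bandpass(frame, keep_levels, wavelet="db4", max_level=5, keep_approx=False):
        """Reconstruct a grayscale frame from a chosen band of spatial scales.

        frame       : 2-D numpy array (one grayscale video frame)
        keep_levels : set of decomposition levels to retain (1 = finest detail)
        """
        coeffs = pywt.wavedec2(frame, wavelet, level=max_level)
        # coeffs[0] is the coarsest approximation; coeffs[i] (i >= 1) holds the detail
        # sub-bands (horizontal, vertical, diagonal) at level max_level - i + 1.
        out = [coeffs[0] if keep_approx else np.zeros_like(coeffs[0])]
        for i, detail in enumerate(coeffs[1:], start=1):
            level = max_level - i + 1
            if level in keep_levels:
                out.append(detail)
            else:
                out.append(tuple(np.zeros_like(band) for band in detail))
        return pywt.waverec2(out, wavelet)

    # Hypothetical condition definitions on a random stand-in frame: coarse scales
    # only (gross motion) versus mid-range scales (finer articulatory detail).
    frame = np.random.rand(256, 256)
    lf_frame = wavelet_bandpass(frame, keep_levels={4, 5}, keep_approx=True)
    mf_frame = wavelet_bandpass(frame, keep_levels={2, 3})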

    A Robust Speaking Face Modelling Approach Based on Multilevel Fusion


    Spontaneous Voice Gender Imitation Abilities in Adult Speakers

    Background: The frequency components of the human voice play a major role in signalling the gender of the speaker. A voice imitation study was conducted to investigate individuals' ability to make behavioural adjustments to fundamental frequency (F0) and formants (Fi) in order to manipulate their expression of voice gender. Methodology/Principal Findings: Thirty-two native British-English adult speakers were asked to read out loud different types of text (words, sentence, passage) using their normal voice and then while sounding as ‘masculine’ and ‘feminine’ as possible. Overall, the results show that both men and women raised their F0 and Fi when feminising their voice, and lowered their F0 and Fi when masculinising their voice. Conclusions/Significance: These observations suggest that adult speakers are capable of spontaneous glottal and vocal tract length adjustments to express masculinity and femininity in their voice. These results point to a “gender code”, where speakers make a conventionalized use of the existing sex dimorphism to vary the expression of their gender and gender-related attributes.
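
    For readers who want to reproduce this kind of measurement informally, the sketch below (not the authors' analysis pipeline) estimates mean F0 and the first four formants of a recording with Parselmouth, the Python interface to Praat. The file name and the reliance on default pitch and formant settings are assumptions for illustration.

    import numpy as np
    import parselmouth

    # Hypothetical input file; any mono speech recording would do.
    snd = parselmouth.Sound("speaker01_feminised.wav")

    # Glottal cue: mean fundamental frequency (F0) over voiced frames.
    pitch = snd.to_pitch()
    f0 = pitch.selected_array["frequency"]
    voiced = f0 > 0                      # unvoiced frames are reported as 0 Hz
    mean_f0 = f0[voiced].mean()

    # Vocal-tract cue: first four formants (Burg method), sampled at voiced frame times.
    formants = snd.to_formant_burg()
    times = pitch.xs()[voiced]
    mean_formants = [
        np.nanmean([formants.get_value_at_time(i, t) for t in times])
        for i in (1, 2, 3, 4)            # F1..F4 in Hz
    ]

    print(f"mean F0 = {mean_f0:.1f} Hz; mean F1-F4 = {np.round(mean_formants, 1)} Hz")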

    Expression of gender in the human voice: investigating the “gender code”

    We can easily and reliably identify the gender of an unfamiliar interlocutor over the telephone. This is because our voice is “sexually dimorphic”: men typically speak with a lower fundamental frequency (F0 - lower pitch) and lower vocal tract resonances (ΔF – “deeper” timbre) than women. While the biological bases of these differences are well understood, and largely come down to size differences between men and women, very little is known about the extent to which we can play with these differences to accentuate or de-emphasise our perceived gender, masculinity and femininity in a range of social roles and contexts. The general aim of this thesis is to investigate the behavioural basis of gender expression in the human voice in both children and adults. More specifically, I hypothesise that, on top of the biologically determined sexual dimorphism, humans use a “gender code” consisting of vocal gestures (global F0 and ΔF adjustments) aimed at altering the gender attributes conveyed by their voice. In order to test this hypothesis, I first explore how acoustic variation of sexually dimorphic acoustic cues (F0 and ΔF) relates to physiological differences in pre-pubertal speakers (vocal tract length) and adult speakers (body height and salivary testosterone levels), and show that voice gender variation cannot be solely explained by static, biologically determined differences in vocal apparatus and body size of speakers. Subsequently, I show that both children and adult speakers can spontaneously modify their voice gender by lowering (raising) F0 and ΔF to masculinise (feminise) their voice, a key ability for the hypothesised control of voice gender. Finally, I investigate the interplay between voice gender expression and social context in relation to cultural stereotypes. I report that listeners spontaneously integrate stereotypical information in the auditory and visual domains to make stereotypical judgments about children’s gender and that adult actors manipulate their gender expression in line with stereotypical gendered notions of homosexuality. Overall, this corpus of data supports the existence of a “gender code” in human nonverbal vocal communication. This “gender code” provides not only a methodological framework with which to empirically investigate variation in voice gender and its role in expressing gender identity, but also a unifying theoretical structure to understand the origins of such variation from both evolutionary and social perspectives.
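
    The relation between ΔF and apparent vocal tract length that underlies this “deeper timbre” cue can be made concrete with a small worked example. The sketch below assumes the standard uniform-tube model, Fi = (2i - 1) * ΔF / 2, and uses hypothetical formant values rather than data from the thesis.

    import numpy as np

    # Hypothetical measured formants F1..F4 in Hz (not data from the thesis).
    formants_hz = np.array([560.0, 1650.0, 2750.0, 3850.0])
    x = (2 * np.arange(1, len(formants_hz) + 1) - 1) / 2    # 0.5, 1.5, 2.5, 3.5

    # Least-squares estimate of the formant spacing ΔF (regression through the origin).
    delta_f = float(formants_hz @ x / (x @ x))

    # Apparent vocal tract length implied by ΔF, with c = speed of sound in warm air.
    c = 35000.0                                              # cm/s
    vtl_cm = c / (2 * delta_f)

    print(f"ΔF ≈ {delta_f:.0f} Hz, apparent vocal tract length ≈ {vtl_cm:.1f} cm")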

    A system for video-based analysis of face motion during speech

    During face-to-face interaction, facial motion conveys information at various levels. These include a person's emotional condition, position in a discourse, and, while speaking, phonetic details about the speech sounds being produced. Trivially, the measurement of face motion is a prerequisite for any further analysis of its functional characteristics or information content. It is possible to make precise measures of locations on the face using systems that track the motion by means of active or passive markers placed directly on the face. Such systems, however, have the disadvantages of requiring specialised equipment, thus restricting their use outside the lab, and of being invasive in the sense that the markers have to be attached to the subject's face. To overcome these limitations we developed a video-based system to measure face motion from standard video recordings by deforming the surface of an ellipsoidal mesh fitted to the face. The mesh is initialised manually for a reference frame and then projected onto subsequent video frames. Location changes (between successive frames) for each mesh node are determined adaptively within a well-defined area around each mesh node, using a two-dimensional cross-correlation analysis on a two-dimensional wavelet transform of the frames. Position parameters are propagated in three steps from a coarser mesh and a correspondingly higher scale of the wavelet transform to the final fine mesh and lower scale of the wavelet transform. The sequential changes in position of the mesh nodes represent the facial motion. The method takes advantage of inherent constraints of facial surfaces, which distinguishes it from more general image motion estimation methods, and, in contrast to feature-based methods, it returns measurement points distributed globally over the facial surface.
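
    To make the core tracking step more concrete, here is a much-simplified sketch of displacement estimation for a single mesh node by normalised 2-D cross-correlation of a patch within a local search window. It omits the wavelet transform and the coarse-to-fine propagation described above, and the patch/search sizes, the function name track_node, and the OpenCV-based implementation are illustrative assumptions rather than the authors' code.

    import cv2
    import numpy as np

    def track_node(prev_frame, next_frame, node_xy, patch=15, search=31):
        """Return the updated (x, y) position of one mesh node in next_frame."""
        x, y = node_xy
        hp, hs = patch // 2, search // 2
        template = prev_frame[y - hp:y + hp + 1, x - hp:x + hp + 1].astype(np.float32)
        window = next_frame[y - hs:y + hs + 1, x - hs:x + hs + 1].astype(np.float32)
        # Normalised cross-correlation surface; its peak gives the most likely shift.
        score = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (px, py) = cv2.minMaxLoc(score)
        dx = px - (hs - hp)              # displacement of the best match from the centre
        dy = py - (hs - hp)
        return x + dx, y + dy

    # Hypothetical usage on two consecutive grayscale frames.
    frame_t = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    frame_t1 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    new_x, new_y = track_node(frame_t, frame_t1, node_xy=(320, 240))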

    Video-based face motion measurement

    In this paper, we describe and evaluate a noninvasive method of measuring face motion during speech production. Reliable measures are extracted from standard video sequences using an image analysis process that takes advantage of important constraints …