
    Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

    We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that uses acoustic features as input, and one that uses a phonetic transcription. Both synthesizers are trained on the same data, and performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than for sequences generated by our synthesizers. This observation motivates further consideration of an often-ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer-perceived quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
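    The abstract does not spell out the warp computation, but the proposed measure is the accumulated cost of a standard dynamic time warp between a synthesized and a ground-truth parameter trajectory. A minimal sketch (assuming Euclidean frame-to-frame distance and the usual three-step recurrence; `dtw_cost` and the toy trajectories are illustrative, not from the paper):

    ```python
    import numpy as np

    def dtw_cost(synth, truth):
        """Accumulated cost of a dynamic time warp aligning two
        parameter trajectories (frames x dims), using Euclidean
        frame-to-frame distance."""
        n, m = len(synth), len(truth)
        # pairwise distances between every synth frame and truth frame
        dist = np.linalg.norm(synth[:, None, :] - truth[None, :, :], axis=2)
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = dist[i - 1, j - 1] + min(
                    acc[i - 1, j],      # insertion
                    acc[i, j - 1],      # deletion
                    acc[i - 1, j - 1],  # match
                )
        return acc[n, m]

    # toy trajectories: the second is a time-stretched copy of the first,
    # so the warp should absorb the stretch and report a near-zero cost
    t = np.linspace(0, 1, 50)
    truth = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
    synth = truth[np.clip((np.arange(60) * 50) // 60, 0, 49)]
    print(dtw_cost(synth, truth))
    ```

    Unlike a frame-by-frame error, the warp cost is insensitive to small timing offsets, which is one plausible reason it tracks subjective quality better.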

    Jaw Rotation in Dysarthria Measured With a Single Electromagnetic Articulography Sensor

    Purpose This study evaluated a novel method for characterizing jaw rotation using orientation data from a single electromagnetic articulography sensor. This method was optimized for clinical application, and a preliminary examination of clinical feasibility and value was undertaken. Method The computational adequacy of the single-sensor orientation method was evaluated through comparisons with jaw-rotation histories calculated from dual-sensor positional data for 16 typical talkers. The clinical feasibility and potential value of single-sensor jaw rotation were assessed through comparisons of 7 talkers with dysarthria and 19 typical talkers in connected speech. Results The single-sensor orientation method allowed faster and safer participant preparation, required lower data-acquisition costs, and generated less high-frequency artifact than the dual-sensor positional approach. All talkers with dysarthria, regardless of severity, demonstrated jaw-rotation histories with more numerous changes in movement direction and reduced smoothness compared with typical talkers. Conclusions Results suggest that the single-sensor orientation method for calculating jaw rotation during speech is clinically feasible. Given the preliminary nature of this study and the small participant pool, the clinical value of such measures remains an open question. Further work must address the potential confound of reduced speaking rate on movement smoothness.
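    The abstract does not give the computation, but a single sensor's orientation (commonly reported as a unit quaternion) suffices to express jaw rotation as an angle relative to a rest posture. A hedged sketch under that assumption (`rotation_angle_deg` and the rest-orientation convention are illustrative, not the study's method):

    ```python
    import numpy as np

    def rotation_angle_deg(q, q_ref):
        """Angle in degrees between two unit quaternions (w, x, y, z),
        e.g. a sensor's current orientation and its orientation at rest.
        Evaluating this over time yields a jaw-rotation history."""
        # |dot(q, q_ref)| equals cos(theta/2) of the relative rotation
        w = abs(q[0]*q_ref[0] + q[1]*q_ref[1] + q[2]*q_ref[2] + q[3]*q_ref[3])
        return np.degrees(2.0 * np.arccos(np.clip(w, -1.0, 1.0)))

    # a 30-degree rotation about the x-axis relative to the identity pose
    q_rest = np.array([1.0, 0.0, 0.0, 0.0])
    q_open = np.array([np.cos(np.radians(15)), np.sin(np.radians(15)), 0.0, 0.0])
    print(rotation_angle_deg(q_open, q_rest))  # 30.0
    ```

    Counting sign changes in the first difference of such an angle history would give the "changes in movement direction" measure the abstract mentions.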

    Articulatory features for speech-driven head motion synthesis

    This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to the prosodic features such as F0 and energy that have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Head and articulatory movements, acquired by electromagnetic articulography (EMA), were recorded synchronously with speech. Measured articulatory data were compared with those predicted from speech using an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech from 12 people shows that the articulatory features are more correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that head motion synthesised using articulatory features gave higher correlations with the original head motion than when only prosodic features are used. Index Terms: head motion synthesis, articulatory features, canonical correlation analysis, acoustic-to-articulatory mapping.
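    Canonical correlation analysis finds maximally correlated linear projections of two multivariate signals, here articulatory (or prosodic) features versus head rotation. A minimal numpy-only sketch of the standard formulation via whitened cross-covariance (the `cca_correlations` helper and toy data are illustrative, not the paper's pipeline):

    ```python
    import numpy as np

    def cca_correlations(X, Y, reg=1e-6):
        """Canonical correlations between two multivariate time series
        X (frames x p) and Y (frames x q): whiten each block, then take
        the singular values of the cross-covariance. reg is a small
        ridge term for numerical stability."""
        X = X - X.mean(0)
        Y = Y - Y.mean(0)
        n = len(X)
        Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
        Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n

        def inv_sqrt(C):
            # inverse matrix square root via eigendecomposition
            w, V = np.linalg.eigh(C)
            return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

        M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
        s = np.linalg.svd(M, compute_uv=False)
        return np.clip(s, 0.0, 1.0)  # descending canonical correlations

    # toy data: Y shares one latent dimension with X, so the first
    # canonical correlation should be high and the second near zero
    rng = np.random.default_rng(0)
    latent = rng.standard_normal(500)
    X = np.stack([latent, rng.standard_normal(500)], axis=1)
    Y = np.stack([0.9 * latent + 0.1 * rng.standard_normal(500),
                  rng.standard_normal(500)], axis=1)
    print(cca_correlations(X, Y))
    ```

    Comparing the leading canonical correlation obtained with articulatory features against that obtained with prosodic/cepstral features is the kind of contrast the study reports.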

    Eye movements track prioritized auditory features in selective attention to natural speech

    Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention - one of the key principles of adaptive behavior. A strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech in simultaneously recorded eye-tracking and magnetoencephalographic (MEG) data. Using temporal response functions (TRFs), we show that eye gaze tracks acoustic features (envelope and acoustic onsets) of attended speech, a phenomenon we termed ocular speech tracking. Ocular speech envelope tracking even differentiates a target from a distractor in a multi-speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the necessity to consider oculomotor activity in future research and in the interpretation of neural differences in auditory cognition. Our results extend previous findings of a joint network of attention and eye movement control as well as motor theories of speech. They provide valuable new directions for research into the neurobiological mechanisms of the phenomenon, its dependence on learning and plasticity, and its functional implications in social communication.
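    A temporal response function is, in its simplest form, a regularized linear regression from time-lagged copies of a stimulus feature (here the speech envelope) to a response channel (here gaze position or an MEG sensor). A minimal sketch of that lagged ridge regression (the `fit_trf` helper, lag count, and toy delay are illustrative assumptions, not the study's parameters):

    ```python
    import numpy as np

    def fit_trf(stimulus, response, n_lags, alpha=1.0):
        """Estimate a temporal response function by ridge regression
        from n_lags delayed copies of the stimulus to the response."""
        n = len(stimulus)
        # design matrix: column k holds the stimulus delayed by k samples
        X = np.zeros((n, n_lags))
        for k in range(n_lags):
            X[k:, k] = stimulus[:n - k]
        # closed-form ridge solution; returns one weight per lag
        return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags),
                               X.T @ response)

    # toy data: the response is the stimulus delayed by 5 samples plus
    # noise, so the fitted TRF should peak at lag 5
    rng = np.random.default_rng(1)
    stim = rng.standard_normal(2000)
    resp = np.roll(stim, 5)
    resp[:5] = 0.0
    resp = resp + 0.1 * rng.standard_normal(2000)
    trf = fit_trf(stim, resp, n_lags=10)
    print(np.argmax(np.abs(trf)))  # 5
    ```

    In practice, TRF toolboxes cross-validate the ridge parameter and evaluate tracking as the correlation between predicted and measured responses on held-out data; the attended-versus-distractor contrast then compares those correlations across streams.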

    Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, whereas during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.

    Speech and music therapy in the treatment of CAS: An introduction and a case study

    Purpose Speech-Music Therapy for Aphasia (SMTA), a method that combines speech therapy and music therapy, is introduced as a treatment for childhood apraxia of speech (CAS). SMTA will be evaluated in a proof-of-principle study; the first case study is presented herein. Method SMTA was evaluated in a study with a single-subject experimental design comparing 10 weeks of treatment with two months of no treatment. The research protocol included a pre-test, a baseline phase, a treatment phase, a post-test, a no-treatment phase, and a follow-up test. The participant was a boy aged five years and eight months with CAS. Outcome measures were selected to reflect both intelligibility in daily communication and features of CAS and of speech motor planning and programming. Results Scores on the Intelligibility in Context Scale-Dutch (ICS-Dutch) and the analysis of a spontaneous speech sample suggest generalization of treatment effects. Improvements were found in measures that reflect complex speech motor skills, that is, the production of consonant clusters and consistency. Conclusion This case study showed that the participant's speech production improved after treatment with SMTA. Although intelligibility as measured with the ICS-Dutch improved over the study period, objectifying changes at the level of intelligibility in daily communication proved difficult. Additional measures may be necessary to gain more insight into treatment effects at this level.