
    Prominence Driven Character Animation

    This paper details the development of a fully automated system for character animation implemented in Autodesk Maya. The system uses prioritised speech events to algorithmically generate head, body, arm and leg movements alongside eyeblinks, eyebrow movements and lip-synching. In addition, gaze tracking is generated automatically relative to defined focus objects: contextually important objects in the character's worldview. The plugin uses an animation profile to store the relevant controllers and movements for a specific character, allowing any character to run with the system. Once a profile has been created, an audio file can be loaded and animated with a single button click. The average time to animate is 2-3 minutes per minute of speech, and the plugin can be used either as a first-pass system for high-quality work or as part of a batch animation workflow for larger amounts of content, as exemplified by television and online dissemination channels.
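    The abstract gives no implementation details, so the following is only a minimal sketch of how prioritised speech events might be turned into head-movement keyframes before being written to a character rig. The event times, priorities, and the amplitude rule are illustrative assumptions, not the plugin's actual logic.

```python
# Sketch (not the paper's plugin code): convert prominence-ranked speech
# events into head-nod keyframes whose amplitude scales with priority.
from dataclasses import dataclass

@dataclass
class SpeechEvent:
    time: float      # seconds into the audio file
    priority: int    # 1 = highest prominence

def head_nod_keyframes(events, max_amplitude=8.0, nod_duration=0.3):
    """Return (time, rotation_deg) keyframes for a simple head nod;
    higher-prominence events produce larger nods."""
    keys = []
    for ev in sorted(events, key=lambda e: e.time):
        amp = max_amplitude / ev.priority           # stronger nod for higher prominence
        keys.append((ev.time, 0.0))                 # start from neutral
        keys.append((ev.time + nod_duration / 2, -amp))  # dip down
        keys.append((ev.time + nod_duration, 0.0))  # return to neutral
    return keys

if __name__ == "__main__":
    events = [SpeechEvent(0.8, 1), SpeechEvent(2.1, 3)]
    for t, rot in head_nod_keyframes(events):
        print(f"t={t:.2f}s  rotateX={rot:+.1f} deg")
```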

    Continuous Interaction with a Virtual Human

    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on-the-fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.
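    As a rough illustration of the interrupt-on-listener-response idea described above (and not the project's actual, BML-based implementation), a toy scheduler might look like the sketch below; all class and event names are hypothetical.

```python
# Toy behaviour scheduler: ongoing speech can be interrupted when a
# listener response is detected, and an appropriate reaction is queued.
import queue

class BehaviourScheduler:
    def __init__(self):
        self.current = None            # behaviour currently being realised
        self.pending = queue.Queue()   # behaviours waiting to be played

    def play(self, behaviour):
        self.pending.put(behaviour)

    def on_listener_response(self, response_type):
        """Interrupt ongoing behaviour and queue a reaction to the listener."""
        if self.current is not None:
            print(f"interrupting '{self.current}'")
            self.current = None
        reaction = {"backchannel": "nod", "question": "pause_and_answer"}.get(
            response_type, "acknowledge")
        self.pending.put(reaction)

    def tick(self):
        if self.current is None and not self.pending.empty():
            self.current = self.pending.get()
            print(f"realising '{self.current}'")

sched = BehaviourScheduler()
sched.play("utterance_1")
sched.tick()
sched.on_listener_response("backchannel")
sched.tick()
```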

    Surface electromyographic control of a novel phonemic interface for speech synthesis

    Many individuals with minimal movement capabilities use AAC to communicate. These individuals require both an interface with which to construct a message (e.g., a grid of letters) and an input modality with which to select targets. This study evaluated the interaction of two such systems: (a) an input modality using surface electromyography (sEMG) of spared facial musculature, and (b) an onscreen interface from which users select phonemic targets. These systems were evaluated in two experiments: (a) participants without motor impairments used the systems during a series of eight training sessions, and (b) one individual who uses AAC used the systems for two sessions. Both the phonemic interface and the electromyographic cursor show promise for future AAC applications.
    Funding: F31 DC014872, R01 DC002852, R01 DC007683 (NIDCD NIH HHS); T90 DA032484 (NIDA NIH HHS).
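    The study's actual signal chain is not described in the abstract; the sketch below only shows one common way a surface-EMG cursor can be driven, by mapping windowed RMS amplitude of two facial-muscle channels (above a resting baseline) to 2-D cursor velocity. The channel assignment, baseline, and gain are assumptions.

```python
# Assumed sEMG-to-cursor mapping: windowed RMS amplitude -> cursor velocity.
import numpy as np

def rms(window):
    return float(np.sqrt(np.mean(np.square(window))))

def cursor_velocity(emg_x, emg_y, baseline=0.05, gain=200.0, fs=1000, win_ms=100):
    """emg_x / emg_y: 1-D arrays of the most recent samples from the channels
    assigned to horizontal and vertical movement. Returns (vx, vy) in px/s."""
    n = int(fs * win_ms / 1000)
    ax = max(rms(emg_x[-n:]) - baseline, 0.0)   # activity above resting level
    ay = max(rms(emg_y[-n:]) - baseline, 0.0)
    return gain * ax, gain * ay

# Example with synthetic signals: a muscle burst on the "horizontal" channel only.
rng = np.random.default_rng(0)
quiet = 0.02 * rng.standard_normal(1000)
burst = 0.3 * rng.standard_normal(1000)
print(cursor_velocity(burst, quiet))
```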

    Visual prosody in speech-driven facial animation: elicitation, prediction, and perceptual evaluation

    Facial animations capable of articulating accurate movements in synchrony with a speech track have become a subject of much research during the past decade. Most of these efforts have focused on articulation of lip and tongue movements, since these are the primary sources of information in speech reading. However, a wealth of paralinguistic information is implicitly conveyed through visual prosody (e.g., head and eyebrow movements). In contrast with lip/tongue movements, for which the articulation rules are fairly well known (i.e., viseme-phoneme mappings, coarticulation), little is known about the generation of visual prosody. The objective of this thesis is to explore the perceptual contributions of visual prosody in speech-driven facial avatars. Our main hypothesis is that visual prosody driven by acoustics of the speech signal, as opposed to random or no visual prosody, results in more realistic, coherent and convincing facial animations. To test this hypothesis, we have developed an audio-visual system capable of capturing synchronized speech and facial motion from a speaker using infrared illumination and retro-reflective markers. In order to elicit natural visual prosody, a story-telling experiment was designed in which the actors were shown a short cartoon video and subsequently asked to narrate the episode. From this audio-visual data, four different facial animations were generated, articulating no visual prosody, Perlin noise, speech-driven movements, and ground-truth movements. Speech-driven movements were driven by acoustic features of the speech signal (e.g., fundamental frequency and energy) using rule-based heuristics and autoregressive models. A pairwise perceptual evaluation shows that subjects can clearly discriminate among the four visual prosody animations. It also shows that speech-driven movements and Perlin noise, in that order, approach the performance of veridical motion. The results are quite promising and suggest that speech-driven motion could outperform Perlin noise if more powerful motion prediction models are used. In addition, our results show that exaggeration can bias the viewer to perceive a computer-generated character as more realistic motion-wise.
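    To make the "rule-based heuristics and autoregressive models" idea concrete, the sketch below shows one simple (assumed) mapping from per-frame fundamental frequency and energy to head pitch and yaw, smoothed with a first-order autoregressive filter; it is not the thesis code, and the gains and AR coefficient are placeholders.

```python
# Assumed rule-based, AR-smoothed mapping from F0/energy to head rotation.
import numpy as np

def speech_driven_head_motion(f0_hz, energy, ar_coeff=0.9,
                              pitch_gain=10.0, yaw_gain=6.0):
    """f0_hz, energy: per-frame arrays (F0 is 0 where unvoiced).
    Returns head pitch and yaw angles in degrees, one value per frame."""
    voiced = f0_hz > 0
    f0_norm = np.zeros_like(f0_hz, dtype=float)
    if voiced.any():
        f0_norm[voiced] = (f0_hz[voiced] - f0_hz[voiced].mean()) / (f0_hz[voiced].std() + 1e-8)
    e_norm = (energy - energy.mean()) / (energy.std() + 1e-8)

    pitch = np.zeros(len(f0_hz))
    yaw = np.zeros(len(f0_hz))
    for t in range(1, len(f0_hz)):
        # AR(1) smoothing keeps the head from jittering frame to frame.
        pitch[t] = ar_coeff * pitch[t - 1] + (1 - ar_coeff) * pitch_gain * f0_norm[t]
        yaw[t] = ar_coeff * yaw[t - 1] + (1 - ar_coeff) * yaw_gain * e_norm[t]
    return pitch, yaw

# Synthetic example: intermittently voiced F0 contour and a slow energy curve.
frames = np.arange(100)
f0 = np.where(frames % 10 < 6, 120 + 30 * np.sin(frames / 8.0), 0.0)
en = np.abs(np.sin(frames / 5.0))
pitch, yaw = speech_driven_head_motion(f0, en)
```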

    English speaking proficiency assessment using speech and electroencephalography signals

    In this paper, the English speaking proficiency level of non-native English speakers was automatically estimated as high, medium, or low performance. For this purpose, the speech of 142 non-native English speakers was recorded, and electroencephalography (EEG) signals were recorded from 58 of them while speaking in English. Two systems were proposed for estimating the English proficiency level of the speaker: one used 72 audio features extracted from speech signals, and the other used 112 features extracted from EEG signals. A multi-class support vector machine (SVM) was used for training and testing both systems with a cross-validation strategy. The speech-based system outperformed the EEG system, with 68% accuracy on 60 test audio recordings compared with 56% accuracy on 30 test EEG recordings.
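    The abstract names a multi-class SVM with cross-validation; the sketch below shows that kind of setup on placeholder data. The feature dimensionality (72 audio features) and speaker count match the abstract, but the kernel, fold count, and random data are assumptions, not the paper's configuration.

```python
# Multi-class SVM + cross-validation sketch on synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((142, 72))   # one 72-dim audio feature vector per speaker
y = rng.integers(0, 3, size=142)     # 0 = low, 1 = medium, 2 = high proficiency

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # SVC handles multi-class one-vs-one
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(f"cross-validated accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```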

    Speech & Multimodal Resources: the Herme Database of Spontaneous Multimodal Human-Robot Dialogues

    This paper presents methodologies and tools for language resource (LR) construction. It describes a database of interactive speech collected over a three-month period at the Science Gallery in Dublin, where visitors could take part in a conversation with a robot. The system collected samples of informal, chatty dialogue – normally difficult to capture under laboratory conditions for human-human dialogue, and particularly so for human-machine interaction. The conversations were based on a script followed by the robot, consisting largely of social chat with some task-based elements. The interactions were audio-visually recorded using several cameras together with microphones. As part of the conversation, the participants were asked to sign a consent form giving permission to use their data for human-machine interaction research. The multimodal corpus will be made available to interested researchers, and the technology developed during the three-month exhibition is being extended for use in education and assisted-living applications.