6,834 research outputs found

    Enhancing Expressiveness of Speech through Animated Avatars for Instant Messaging and Mobile Phones

    This thesis aims to create a chat program that allows users to communicate via an animated avatar with believable lip-synchronization and expressive emotion. Currently, many avatars do not attempt lip-synchronization; those that do are poorly synchronized and have little or no emotional expression. Most avatars with lip-sync use realistic-looking 3D models or stylized renderings of complex models. This work uses images rendered in a cartoon style and lip-synchronization rules based on traditional animation. The cartoon style, as opposed to a more realistic look, makes the mouth motion more believable and the characters more appealing. The cartoon look and image-based animation (as opposed to a graphic model animated by manipulating a skeleton or wireframe) also allow for fewer key frames, resulting in greater speed and more room for expressiveness. When text is entered into the program, the Festival Text-to-Speech engine creates a speech file and extracts phoneme and phoneme-duration data. Believable and fluid lip-synchronization is then achieved by means of a number of phoneme-to-image rules. Alternatively, phoneme and phoneme-duration data can be obtained for speech dictated into a microphone using Microsoft SAPI and the CSLU Toolkit. Once lip synchronization is complete, rules for non-verbal animation are added. Emotions are appended to the animation of speech in two ways: automatically, by recognition of key words and punctuation, or deliberately, by user-defined tags. Additionally, rules are defined for idle-time animation. Preliminary results indicate that the animated avatar program offers an improvement over currently available software. It aids the understandability of speech, combines easily recognizable and expressive emotions with speech, and enhances overall enjoyment of the chat experience. Applications for the program include use in cell phones for the deaf or hearing impaired, instant messaging, video conferencing, instructional software, and speech and animation synthesis.
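
    The abstract describes two rule stages: mapping phoneme/duration output from a text-to-speech engine to mouth images, and tagging emotions from keywords and punctuation. The following is a minimal sketch of that idea; the phoneme-to-image table, keyword list and frame rate are illustrative assumptions, not the thesis's actual rules.

```python
# Hypothetical sketch of phoneme-to-image lip-sync and keyword-based emotion tagging.
# The mapping tables below are assumed for illustration only.

PHONEME_TO_IMAGE = {                       # assumed viseme image files
    "aa": "mouth_open.png", "iy": "mouth_wide.png",
    "m": "mouth_closed.png", "b": "mouth_closed.png", "p": "mouth_closed.png",
    "f": "mouth_teeth_lip.png", "v": "mouth_teeth_lip.png",
    "pau": "mouth_rest.png",
}

EMOTION_KEYWORDS = {"great": "happy", "sorry": "sad", "wow": "surprised"}

def build_frame_schedule(phonemes, fps=15):
    """phonemes: list of (phoneme, duration_seconds) as extracted from a TTS engine."""
    schedule = []
    for phone, duration in phonemes:
        image = PHONEME_TO_IMAGE.get(phone, "mouth_rest.png")
        frames = max(1, round(duration * fps))   # hold the image for its duration
        schedule.extend([image] * frames)
    return schedule

def detect_emotion(text):
    """Crude keyword/punctuation pass; '!' biases toward an excited expression."""
    for word in text.lower().split():
        if word.strip(".,!?") in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[word.strip(".,!?")]
    return "excited" if "!" in text else "neutral"

print(build_frame_schedule([("pau", 0.1), ("aa", 0.2), ("m", 0.15)]))
print(detect_emotion("Wow, that is great!"))
```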

    Lip syncing method for realistic expressive 3D face model

    Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. A high level of realism is demanded by applications such as computer games and cinema, yet authoring lip syncing with complex and subtle expressions remains difficult and fraught with problems of realism. This research proposes a lip-syncing method for a realistic, expressive 3D face model. Animating lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, and a method to produce the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing aligned with an input audio file. It deforms using a Raised Cosine Deformation (RCD) function grafted onto the input facial geometry, and is based on the MPEG-4 Facial Animation (FA) standard. The paper proposes a method to animate the 3D face model over time, creating lip-synced animation from a canonical set of visemes covering all pairwise combinations of a reduced phoneme set called ProPhone. The proposed research integrates emotions, drawing on Ekman's model and Plutchik's wheel, together with emotive eye movements implemented via the Emotional Eye Movements Markup Language (EEMML), to produce a realistic 3D face model. © 2017 Springer Science+Business Media New York
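
    As a rough illustration of a raised-cosine deformation of the kind named above, the sketch below displaces face-mesh vertices near a control point with a raised-cosine falloff. This is one common reading of such a deformer under assumed parameters, not the paper's actual RCD formulation.

```python
# Minimal sketch of a raised-cosine deformation applied to face-mesh vertices.
# Vertices within an influence radius of a control point are displaced with a
# raised-cosine falloff; parameter names and falloff details are assumptions.
import numpy as np

def raised_cosine_deform(vertices, center, direction, amplitude, radius):
    """vertices: (N, 3) array; center: (3,) control point on the lips;
    direction: (3,) unit displacement direction; amplitude: peak offset."""
    d = np.linalg.norm(vertices - center, axis=1)          # distance to control point
    w = np.where(d < radius, 0.5 * (1.0 + np.cos(np.pi * d / radius)), 0.0)
    return vertices + amplitude * w[:, None] * direction

# Example: push lip-region vertices outward for an open-mouth viseme.
verts = np.random.rand(100, 3)
deformed = raised_cosine_deform(verts, center=np.array([0.5, 0.2, 0.5]),
                                direction=np.array([0.0, -1.0, 0.0]),
                                amplitude=0.05, radius=0.3)
```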

    Generation of realistic human behaviour

    As the use of computers and robots in our everyday lives increases, so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expressions such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities. The framework is made up of two networks performing translations in opposite directions and can be successfully applied to diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.
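
    The perceptual loss mentioned above compares features extracted by a frozen, pre-trained network rather than raw waveforms. A minimal PyTorch sketch of that general pattern follows; the stand-in encoder is a placeholder assumption, not the thesis's speech-driven animation model.

```python
# Hedged sketch of a perceptual loss: features from a frozen, pre-trained encoder
# are compared between generated and reference waveforms.
import torch
import torch.nn as nn

class PerceptualLoss(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module):
        super().__init__()
        self.encoder = pretrained_encoder.eval()
        for p in self.encoder.parameters():      # freeze the feature extractor
            p.requires_grad = False

    def forward(self, generated_wav, reference_wav):
        feat_gen = self.encoder(generated_wav)
        feat_ref = self.encoder(reference_wav)
        return nn.functional.l1_loss(feat_gen, feat_ref)

# Usage with a toy stand-in encoder operating on (batch, 1, samples) waveforms.
toy_encoder = nn.Sequential(nn.Conv1d(1, 16, 25, stride=4), nn.ReLU(),
                            nn.Conv1d(16, 32, 25, stride=4))
loss_fn = PerceptualLoss(toy_encoder)
loss = loss_fn(torch.randn(2, 1, 16000), torch.randn(2, 1, 16000))
```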

    About face, computergraphic synthesis and manipulation of facial imagery

    Thesis (M.S.V.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1982. Microfiche copy available in Archives and Rotch. Videodisc in Archives and Rotch Visual Collections. Includes bibliographical references (leaves 87-90). A technique of pictorially synthesizing facial imagery using optical videodiscs under computer control is described. Search, selection and averaging processes are performed on a catalogue of whole faces and facial features to yield a composite, expressive, recognizable face. An immediate application of this technique is the reconstruction of a particular face from memory for police identification; thus the project is called IDENTIDISC. Part I, PACEMAKER, describes the production and implementation of the IDENTIDISC system to produce composite faces. Part II, EXPRESSIONMAKER, describes animation techniques to add expression and motion to composite faces. Expression sequences are manipulated to make 'anyface' make any face. Historical precedents of making facial composites and theories of facial recognition, classification and expression are also discussed. This thesis is accompanied by two copies of PACEMAKER-III, an optical videodisc produced at the Architecture Machine Group in 1982. The disc can be played on an optical videodisc player. The length is approximately 15,000 frames. Frame numbers are indicated in the text by [ ]. By Peggy Weil, M.S.V.S.
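
    For the averaging step described above, the gist in modern terms is a pixel-wise mean of aligned face images. The sketch below is purely illustrative: file names and pre-alignment are assumed, and the original system operated on analog videodisc frames rather than digital arrays.

```python
# Illustrative composite-face sketch: average a catalogue of aligned face images.
import numpy as np
from PIL import Image

def composite_face(image_paths, size=(256, 256)):
    """Average a catalogue of (pre-aligned) face images into one composite."""
    stack = [np.asarray(Image.open(p).convert("L").resize(size), dtype=np.float64)
             for p in image_paths]
    mean = np.mean(stack, axis=0)
    return Image.fromarray(mean.astype(np.uint8))

# Hypothetical usage with assumed file names:
# composite = composite_face(["face_01.png", "face_02.png", "face_03.png"])
# composite.save("composite.png")
```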

    The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study

    Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.

    Improving coarticulation performance of 3D avatar and gaze estimation using RGB webcam

    This thesis explores two applications of computer vision in psychology-related studies: enhancing patient portal messages with a 3D avatar, and gaze estimation using a single RGB camera. The first application aims to help patients, especially those with poor health and low medical literacy, understand messages delivered by patient portal systems by enhancing the messages with a 3D avatar. The avatar is built from real human face images and can deliver both semantic and emotive information, the latter of which is expected to help patients gain a better, gist-level understanding of the portal messages. The second application aims to estimate eye gaze direction with an RGB camera. Preliminary results show the potential of the proposed method, although rigorous quantitative evaluation remains to be done. While the proposed method cannot achieve the resolution and accuracy of commercial eye trackers, it greatly reduces cost, since only one RGB camera is required.
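
    To make the single-camera setting concrete, here is a rough OpenCV baseline for coarse gaze direction: detect the eye region, treat the darkest pixel as a pupil proxy, and report left/center/right. This is an illustrative baseline with assumed thresholds, not the thesis's method.

```python
# Rough single-RGB-camera gaze sketch using standard OpenCV Haar cascades.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def coarse_gaze(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[fy:fy + fh, fx:fx + fw]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            eye = cv2.GaussianBlur(face[ey:ey + eh, ex:ex + ew], (7, 7), 0)
            _, _, min_loc, _ = cv2.minMaxLoc(eye)    # darkest pixel ~ pupil proxy
            rel_x = min_loc[0] / ew                  # horizontal pupil position in [0, 1]
            if rel_x < 0.35:
                return "left"
            if rel_x > 0.65:
                return "right"
            return "center"
    return "no eye detected"

# Hypothetical usage with a webcam:
# cap = cv2.VideoCapture(0); ok, frame = cap.read(); print(coarse_gaze(frame))
```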

    An Actor-Centric Approach to Facial Animation Control by Neural Networks For Non-Player Characters in Video Games

    Game developers increasingly consider the degree to which character animation emulates facial expressions found in cinema. Employing animators and actors to produce cinematic facial animation by mixing motion capture and hand-crafted animation is labor intensive and therefore expensive. Emotion corpora and neural network controllers have shown promise toward developing autonomous animation that does not rely on motion capture. Previous research and practice in the disciplines of Computer Science, Psychology and the Performing Arts have provided frameworks on which to build a workflow toward creating an emotion AI system that can animate the facial mesh of a 3D non-player character by deploying a combination of related theories and methods. However, past investigations and their resulting production methods largely ignore the emotion generation systems that have evolved in the performing arts for more than a century. We find very little research that embraces the intellectual process of trained actors as complex collaborators from which to understand and model the training of a neural network for character animation. This investigation demonstrates a workflow design that integrates knowledge from the performing arts and the affective branches of the social and biological sciences. The workflow proceeds from developing and annotating a fictional scenario with actors, to producing a video emotion corpus, to designing, training and validating a neural network, to analyzing the emotion annotation of the corpus and the network's output, and finally to determining how closely its autonomous animation control of a 3D character facial mesh resembles the behavior developed by a human actor. The resulting workflow includes a method for developing a neural network architecture whose initial efficacy as a facial emotion expression simulator has been tested and validated as substantially resembling the character behavior developed by a human actor.
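
    A controller of the general kind this workflow trains can be pictured as a small network mapping an annotated emotion vector to facial-mesh blendshape weights. The PyTorch sketch below is hedged: the emotion labels, blendshape count and architecture are placeholders, not the study's model.

```python
# Hedged sketch of an emotion-to-face neural controller for an NPC facial mesh.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "joy", "sadness", "surprise", "fear", "disgust"]  # assumed labels
NUM_BLENDSHAPES = 52                                                    # assumed face rig

class EmotionToFaceController(nn.Module):
    def __init__(self, num_emotions=len(EMOTIONS), num_blendshapes=NUM_BLENDSHAPES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_emotions, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_blendshapes), nn.Sigmoid(),  # blendshape weights in [0, 1]
        )

    def forward(self, emotion_vector):
        return self.net(emotion_vector)

# One training step on (emotion annotation, actor-derived blendshape) pairs.
model = EmotionToFaceController()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
annotation = torch.rand(8, len(EMOTIONS))        # batch of annotated emotion intensities
target_weights = torch.rand(8, NUM_BLENDSHAPES)  # stand-in for corpus-derived targets
loss = nn.functional.mse_loss(model(annotation), target_weights)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```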

    Animation of a hierarchical image based facial model and perceptual analysis of visual speech

    In this thesis a hierarchical image-based 2D talking head model is presented, together with robust automatic and semi-automatic animation techniques and a novel perceptual method for evaluating visual speech based on the McGurk effect. The novelty of the hierarchical facial model stems from the fact that sub-facial areas are modelled individually. To produce a facial animation, animations for a set of chosen facial areas are first produced, either by key-framing sub-facial parameter values or by using a continuous input speech signal, and are then combined into a full facial output. Modelling hierarchically has several attractive qualities. It isolates variation in sub-facial regions from the rest of the face, and therefore provides a high degree of control over different facial parts along with meaningful image-based animation parameters. The automatic synthesis of animations may be achieved using speech not originally included in the training set. The model is also able to automatically animate pauses, hesitations and non-verbal (or non-speech related) sounds and actions. To automatically produce visual speech, two novel analysis and synthesis methods are proposed. The first method utilises a Speech-Appearance Model (SAM), and the second uses a Hidden Markov Coarticulation Model (HMCM), based on a Hidden Markov Model (HMM). To evaluate synthesised animations (irrespective of whether they are rendered semi-automatically or using speech), a new perceptual analysis approach based on the McGurk effect is proposed. This measure provides both an unbiased and a quantitative method for evaluating talking head visual speech quality and overall perceptual realism. A combination of this new approach and other objective and perceptual evaluation techniques is employed for a thorough evaluation of hierarchical model animations.
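
    The hierarchical idea of animating sub-facial regions separately and then combining them into a full facial output can be sketched as a simple per-frame merge of regional parameter tracks. The region names and parameters below are assumptions for illustration, not the thesis's actual model.

```python
# Illustrative sketch: merge per-region animation tracks into full-face frames.

def combine_regions(region_tracks):
    """region_tracks: dict mapping region name -> list of per-frame parameter dicts.
    Returns one list of full-face frames; shorter tracks hold their last frame."""
    length = max(len(track) for track in region_tracks.values())
    full_face = []
    for i in range(length):
        frame = {}
        for region, track in region_tracks.items():
            params = track[min(i, len(track) - 1)]        # hold last frame if shorter
            frame.update({f"{region}.{k}": v for k, v in params.items()})
        full_face.append(frame)
    return full_face

tracks = {
    "mouth": [{"open": 0.1}, {"open": 0.8}, {"open": 0.3}],   # e.g. driven by speech
    "eyes":  [{"blink": 0.0}, {"blink": 1.0}],                # independent blink track
}
for frame in combine_regions(tracks):
    print(frame)
```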