
    Enhancing Expressiveness of Speech through Animated Avatars for Instant Messaging and Mobile Phones

    This thesis aims to create a chat program that allows users to communicate via an animated avatar that provides believable lip-synchronization and expressive emotion. Currently, many avatars do not attempt lip-synchronization. Those that do are not well synchronized and have little or no emotional expression. Most avatars with lip-synchronization use realistic-looking 3D models or stylized renderings of complex models. This work utilizes images rendered in a cartoon style and lip-synchronization rules based on traditional animation. The cartoon style, as opposed to a more realistic look, makes the mouth motion more believable and the characters more appealing. The cartoon look and image-based animation (as opposed to a graphic model animated through manipulation of a skeleton or wireframe) also allow for fewer key frames, resulting in faster performance with more room for expressiveness. When text is entered into the program, the Festival Text-to-Speech engine creates a speech file and extracts phoneme and phoneme-duration data. Believable and fluid lip-synchronization is then achieved by means of a number of phoneme-to-image rules. Alternatively, phoneme and phoneme-duration data can be obtained for speech dictated into a microphone using Microsoft SAPI and the CSLU Toolkit. Once lip-synchronization has been completed, rules for non-verbal animation are added. Emotions are appended to the animation of speech in two ways: automatically, by recognition of key words and punctuation, or deliberately, by user-defined tags. Additionally, rules are defined for idle-time animation. Preliminary results indicate that the animated avatar program offers an improvement over currently available software. It aids in the understandability of speech, combines easily recognizable and expressive emotions with speech, and successfully enhances overall enjoyment of the chat experience. Applications for the program include use in cell phones for the deaf or hearing-impaired, instant messaging, video conferencing, instructional software, and speech and animation synthesis.
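
    A minimal sketch of the phoneme-to-image scheduling this abstract describes, assuming the TTS engine has already produced (phoneme, duration) pairs; the rule table, image names, and frame rate below are illustrative assumptions, not the thesis's actual rules.

```python
# Hypothetical phoneme-to-image rules: phoneme classes map to cartoon
# mouth images (filenames are invented for illustration).
VISEME_RULES = {
    "p": "closed.png", "b": "closed.png", "m": "closed.png",
    "f": "lip_bite.png", "v": "lip_bite.png",
    "aa": "open_wide.png", "iy": "smile_narrow.png",
    "uw": "round.png", "ow": "round.png",
    "sil": "rest.png",  # silence falls back to the rest pose
}

FPS = 12  # cartoon-style animation can get away with fewer key frames

def schedule_frames(phonemes):
    """Turn (phoneme, seconds) pairs into a per-frame image list."""
    frames = []
    for phone, duration in phonemes:
        image = VISEME_RULES.get(phone, "rest.png")
        frames.extend([image] * max(1, round(duration * FPS)))
    return frames

# Example: something like "ba-ee" as b + aa + iy
print(schedule_frames([("b", 0.08), ("aa", 0.20), ("iy", 0.15)]))
```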

    Avatar augmented online conversation

    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003. Includes bibliographical references (p. 167-175). One of the most important roles played by technology is connecting people and mediating their communication with one another. Building technology that mediates conversation presents a number of challenging research and design questions. Apart from the fundamental issue of what exactly gets mediated, two of the more crucial questions are how the person being mediated interacts with the mediating layer and how the receiving person experiences the mediation. This thesis is concerned with both of these questions and proposes a theoretical framework of mediated conversation by means of automated avatars. This new approach relies on a model of face-to-face conversation and derives an architecture for implementing its features through automation. First, the thesis describes the process of face-to-face conversation and the nonverbal behaviors that contribute to its success. It then presents a theoretical framework that explains how a text message can be automatically analyzed in terms of its communicative function based on discourse context, and how behaviors, shown to support those same functions in face-to-face conversation, can then be automatically performed by a graphical avatar in synchrony with the message delivery. An architecture, Spark, built on this framework demonstrates the approach in an actual system design that introduces the concept of a message transformation pipeline, abstracting function from behavior, and the concept of an avatar agent, responsible for coordinated delivery and continuous maintenance of the communication channel. A derived application, MapChat, is an online collaboration system where users, represented by avatars in a shared virtual environment, can chat and manipulate an interactive map while their avatars generate face-to-face behaviors. A study evaluating the strength of the approach compares groups collaborating on a route-planning task using MapChat with and without the animated avatars. The results show that while task outcome was equally good for both groups, the group using the avatars felt that the task was significantly less difficult, and the feelings of efficiency and consensus were significantly stronger. An analysis of the conversation transcripts shows a significant improvement in the overall conversational process and significantly fewer messages spent on channel maintenance in the avatar groups. The avatars also significantly improved the users' perception of each other's effort. Finally, MapChat with avatars was found to be significantly more personal, enjoyable, and easier to use. The ramifications of these findings with respect to mediating conversation are discussed. By Hannes Högni Vilhjálmsson, Ph.D.
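
    A toy illustration of the message-transformation-pipeline idea: a text message is first tagged with abstract communicative functions, and each function is then mapped to a concrete nonverbal behavior for the avatar to perform in synchrony with delivery. The function names and the tiny rule sets are assumptions for illustration, not Spark's actual implementation.

```python
def tag_functions(message, is_new_topic):
    """Stage 1: annotate a message with communicative functions."""
    functions = []
    if is_new_topic:
        functions.append("topic_shift")
    if message.rstrip().endswith("?"):
        functions.append("request_feedback")
    if any(w in message.lower().split() for w in ("this", "here", "that")):
        functions.append("reference")
    return functions

# Stage 2 rules: abstract functions become concrete avatar behaviors,
# keeping function and behavior decoupled as the pipeline concept suggests.
BEHAVIOR_RULES = {
    "topic_shift": "posture_shift",        # shift posture on new topics
    "request_feedback": "raise_eyebrows",  # invite listener response
    "reference": "point_at_map",           # deictic gesture toward shared map
}

def to_behaviors(functions):
    return [BEHAVIOR_RULES[f] for f in functions if f in BEHAVIOR_RULES]

msg = "Should we take this route through the park?"
print(to_behaviors(tag_functions(msg, is_new_topic=True)))
```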

    Breaking Up Is Hard To Do: Media Switching and Media Ideologies

    When U.S. college students tell breakup stories, they often indicate what medium was used for each exchange. In this article, I explore what this practice reveals about people’s media ideologies. By extending previous scholarship on language ideologies to media, I trace how switching media, or refusing to switch media, contributes to the labor of disconnecting the relationship, determining whether phrases such as “it’s over” are effective or not.

    SmilieFace: an innovative affective messaging application to enhance social networking


    Automatic Generation of Facial Expression Using Triangular Geometric Deformation

    This paper presents an image deformation algorithm and constructs an automatic facial expression generation system that generates new facial expressions from a face in a neutral state. After the user inputs a face image in a neutral state, the system separates the possible facial areas from the image background by skin-color segmentation. It then uses morphological operations to remove noise and to capture the organs of facial expression, such as the eyes, mouth, eyebrows, and nose. The feature control points are labeled according to the feature points (FPs) defined by MPEG-4. After the deformation expression is designated, the system also adds image correction points based on the obtained FP coordinates. The FPs are utilized as image deformation units through triangular segmentation. Each triangle is spanned by two edge vectors; points within the triangle are regarded as linear combinations of these two vectors, and the same combination coefficients are applied to the corresponding triangle in the original image. The corresponding coordinates are then obtained, and the image correction is completed by image interpolation to generate the new expression. In the proposed deformation algorithm, 10 additional correction points are generated at positions corresponding to the FPs obtained according to MPEG-4, and these correction points can be obtained within a very short operation time. Using a particular triangulation for deformation can extend the material area without narrowing the unwanted material area, thus avoiding the material-filling operation in some areas.
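
    A minimal numeric sketch of the triangle-warping step described above: a point inside the deformed triangle is written as a linear combination of its two edge vectors, and the same coefficients are reused in the source triangle to find the source coordinate at which to interpolate. Variable names and the example coordinates are illustrative.

```python
import numpy as np

def warp_point(p, dst_tri, src_tri):
    """Map a point in the destination triangle back to the source triangle."""
    a, b, c = (np.asarray(v, float) for v in dst_tri)
    # Two edge vectors spanning the destination triangle.
    e1, e2 = b - a, c - a
    # Solve p - a = u*e1 + v*e2 for the combination coefficients (u, v).
    u, v = np.linalg.solve(np.column_stack([e1, e2]),
                           np.asarray(p, float) - a)
    # Reuse (u, v) on the source triangle's edge vectors.
    a_s, b_s, c_s = (np.asarray(q, float) for q in src_tri)
    return a_s + u * (b_s - a_s) + v * (c_s - a_s)

dst = [(0, 0), (10, 0), (0, 10)]   # deformed (new-expression) triangle
src = [(0, 0), (8, 1), (1, 8)]     # corresponding neutral-face triangle
print(warp_point((4, 3), dst, src))  # source coordinate to sample from
```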

    SID 04, Social Intelligence Design: Proceedings of the Third Workshop on Social Intelligence Design


    Visual prosody in speech-driven facial animation: elicitation, prediction, and perceptual evaluation

    Facial animations capable of articulating accurate movements in synchrony with a speech track have become a subject of much research during the past decade. Most of these efforts have focused on the articulation of lip and tongue movements, since these are the primary sources of information in speech reading. However, a wealth of paralinguistic information is implicitly conveyed through visual prosody (e.g., head and eyebrow movements). In contrast with lip and tongue movements, for which the articulation rules are fairly well known (i.e., viseme-phoneme mappings, coarticulation), little is known about the generation of visual prosody. The objective of this thesis is to explore the perceptual contributions of visual prosody in speech-driven facial avatars. Our main hypothesis is that visual prosody driven by the acoustics of the speech signal, as opposed to random or no visual prosody, results in more realistic, coherent, and convincing facial animations. To test this hypothesis, we developed an audio-visual system capable of capturing synchronized speech and facial motion from a speaker using infrared illumination and retro-reflective markers. To elicit natural visual prosody, a story-telling experiment was designed in which actors were shown a short cartoon video and subsequently asked to narrate the episode. From this audio-visual data, four different facial animations were generated, articulating no visual prosody, Perlin noise, speech-driven movements, and ground-truth movements. Speech-driven movements were driven by acoustic features of the speech signal (e.g., fundamental frequency and energy) using rule-based heuristics and autoregressive models. A pairwise perceptual evaluation shows that subjects can clearly discriminate among the four visual prosody animations. It also shows that speech-driven movements and Perlin noise, in that order, approach the performance of veridical motion. The results are quite promising and suggest that speech-driven motion could outperform Perlin noise if more powerful motion prediction models were used. In addition, our results show that exaggeration can bias the viewer to perceive a computer-generated character's motion as more realistic.
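
    A hedged sketch of the rule-based flavor of speech-driven visual prosody: per-frame pitch (F0) and energy are mapped to a head-pitch angle and an eyebrow-raise amount. The thresholds, gains, and smoothing constant are invented for illustration, and the autoregressive models the thesis also uses are omitted.

```python
def visual_prosody(f0_track, energy_track, alpha=0.3):
    """Return per-frame (head_pitch_deg, eyebrow_raise) from speech features."""
    head, brow = 0.0, 0.0
    frames = []
    for f0, energy in zip(f0_track, energy_track):
        # Raised pitch -> raised eyebrows; louder speech -> a downward nod.
        target_brow = 0.0 if f0 <= 0 else min(1.0, (f0 - 100.0) / 150.0)
        target_head = -5.0 * min(1.0, energy)
        # First-order smoothing keeps the motion fluid rather than jittery.
        brow += alpha * (target_brow - brow)
        head += alpha * (target_head - head)
        frames.append((round(head, 2), round(brow, 2)))
    return frames

# F0 in Hz (0 = unvoiced frame) and normalized energy per frame.
print(visual_prosody([120, 180, 220, 0], [0.2, 0.6, 0.9, 0.1]))
```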

    A Cloud-Based Extensible Avatar For Human Robot Interaction

    Adding an interactive avatar to a human-robot interface requires the development of tools that animate the avatar so as to simulate an intelligent conversation partner. Here we describe a toolkit that supports interactive avatar modeling for human-computer interaction. The toolkit utilizes cloud-based speech-to-text software that provides active listening, a cloud-based AI to generate appropriate textual responses to user queries, and a cloud-based text-to-speech engine to generate utterances for this text. This output is combined with a cloud-based 3D avatar animation synchronized to the spoken response. Generated text responses are embedded within an XML structure that allows the nature of the avatar animation to be tuned to simulate different emotional states. An expression package controls the avatar's facial expressions. The introduced rendering latency is obscured through parallel processing and an idle-loop process that animates the avatar between utterances. The effectiveness of the approach is validated through a formal user study.
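
    A speculative sketch of the kind of XML envelope the abstract describes: the AI's text response wrapped with attributes that tune the avatar's animation and expression. The tag and attribute names are guesses for illustration; the toolkit's actual schema is not given in the abstract.

```python
import xml.etree.ElementTree as ET

def wrap_response(text, emotion="neutral", intensity=0.5):
    """Embed a generated reply in an animation-tuning XML structure."""
    root = ET.Element("avatarResponse")
    expr = ET.SubElement(root, "expression",
                         emotion=emotion, intensity=str(intensity))
    speech = ET.SubElement(expr, "speech")
    speech.text = text
    return ET.tostring(root, encoding="unicode")

print(wrap_response("Glad I could help!", emotion="happy", intensity=0.8))
# e.g. <avatarResponse><expression emotion="happy" intensity="0.8">
#      <speech>Glad I could help!</speech></expression></avatarResponse>
```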

    Slow Design through Fast Technology: The Application of Socially Reflective Design Principles to Modern Mediated Technologies

    This thesis describes research into the application of socially reflective, or "Slow", design principles to modern mediated systems, or "Fast" technology. The "information overload" caused by drastic changes in the nature of human communication in the last decade has become a serious problem, with many human-technology interactions creating mental confusion, personal discomfort, and a sense of disconnection. Slow design principles aim to help create interactions that avoid these problems by increasing interaction richness, encouraging engagement with local communities, and promoting personal and communal reflection. Three major functional mediated systems were constructed to examine the application of Slow principles at multiple scales: KiteViz, Taskville, and Your ____ Here. Each system was designed based on a survey of current research within the field and on previous research results. KiteViz is a visually metaphorical display of Twitter activity within a small group; Taskville is a workplace game designed to support collaboration and group awareness in an enterprise; and Your ____ Here is a physical-digital projection system that augments built architecture with user-submitted content to promote discussion and reflection. Each system was tested with multiple users and user groups, the systems were evaluated for their effectiveness in supporting each of the tenets of Slow design, and the results were collected into a set of key findings. Each system was considered generally effective, with specific strengths varying. The thesis concludes with a framework of five major principles to be used in the design of modern, highly mediated systems that still apply Slow design principles: design for fundamental understanding, handle complexity gracefully, treat Slow as a process of evolution and revelation, leverage groups and personal connections to encode value, and allow for participation across a widely distributed range of scales. Dissertation/Thesis, M.S.D. Design 201