Speech-driven Animation with Meaningful Behaviors

Authors: Busso Carlos; Sadoughi Najmeh
Publication date: 04/08/2017

Conversational agents (CAs) play an important role in human computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Studies in the past have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors conveying the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures. However, they create behaviors disregarding the meaning of the message. This study proposes to bridge the gap between these two approaches overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors on the underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on the discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer creating trajectories that are timely synchronized with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models increasing the range of the generated behaviors, and (3) captures the differences in the behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model. Comment: 13 pages, 12 figures, 5 tables

arXiv.org e-Print Archive

A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

Authors: Ahuja Chaitanya; Henter Gustav Eje; Kucherenko Taras; Neff Michael; Nyatsanga Simbarashe
Publication venue: Wiley
Publication date: 10/04/2023

Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. We also chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method. Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development. Comment: Accepted for EUROGRAPHICS 202

arXiv.org e-Print Archive

The Multimodal and Sequential Design of Co-Animation as a Practice for Association in English Interaction

Author: Cantarutti Marina
Publication date: 20/04/2020

This thesis describes the understudied interactional practice of co-animation: during the development of an activity in conversation, a speaker incorporates an animation -i.e. a quote, or (re)enactment - and a co-participant responds, pre-emptively, or in the contiguous turn, with a completion or continuation of the animation of the same figure. Based on the study of 89 co-animation sequences found in 10 hours of video-recordings of naturalistic English interaction between friends, relatives or co-workers, this thesis adopts the theoretical and methodological tenets of Conversation Analysis and Interactional Linguistics to describe the multimodal, sequential, and relational organisation of this practice. This thesis analyses how participants mark the shift from the here-and-now into the animation space, and how co-participants make their contributions both hearable as coherent with prior animations, and as fitted affiliative responses that further the ongoing course of action. Lexico-grammatical, phonetic, and gestural-postural resources are analysed for their interactional import in their concurrent framing of animation and the display of stance and conditional relevance. The organisation of resources in responsive co-animations is found to be positionally-sensitive, with co-participants negotiating agency and epistemic access and entitlement differently relative to the onset of co-animation and to the stage in the ongoing activity. The scrutiny of the situated deployment of co-animation in the social activities of troubles-tellings/complaint stories on the one hand, and teasing/joint fictionalisation on the other, reveals how co-animation contributes to the process of association, that is, the building of single momentary units of participation (collectivities). 
Co-participants are found to team up around what is presented as a shared stance, values, and identity, against absent but invoked behaviours or individuals engaging in moral transgressions, by jointly “doing being” the same voice

Open Research Online (The Open University)

White Rose E-theses Online

Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

Author: Blomsma Pieter A.
Publication venue: PrintPartners Ipskamp B.V. (SIKS Dissertation Series, 16)
Publication date: 20/06/2023

"Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. 
Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they had gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "anthropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but also could rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but the type of such interactions is still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on what such future human-machine interactions may look like. When we consider other products that have been created in history, it sometimes is striking to see that some of these have been inspired by things that can be observed in our environment, yet at the same time do not have to be exact copies of those phenomena. For instance, an airplane has wings just as birds, yet the wings of an airplane do not make those typical movements a bird would produce to fly. 
Moreover, an airplane has wheels, whereas a bird has legs. At the same time, an airplane has made it possible for humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, but can nonetheless be very beneficial and impactful for human beings. This dissertation centers on this practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all have the equivalent underlying question of how parts of human behavior can be captured, such that computers can use it to become more human-like. Each study differs in method, perspective and specific questions, but they are all aimed to gain insights and directions that would help further push the computer developments of human-like behavior and investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges, followed by an overview of the four studies.

Tilburg University Repository

Expressive characters and a text chat interface

Authors: Ballin D; Crabtree IB; Gillies M
Publication date: 01/01/2004

UCL Discovery

Joint Proceedings of the Intelligent Virtual Agents 2012 Workshops: Santa Cruz, CA, September 15, 2012

Authors: Böck Ronald; Edlund Jens; Traum David
Publication venue: Otto von Guericke University Magdeburg
Publication date: 01/09/2012

University of Twente Research Information

Eyebrow raising in dialogue: discourse structure, utterance function, and pitch accents

Author: Flecha-García María Luisa
Publication venue: The University of Edinburgh
Publication date: 01/01/2006

Some studies have suggested a relationship between eyebrow raising and different aspects of the verbal message, but our knowledge about this link is still very limited. If we could establish and characterise a relation between eyebrow raises and the linguistic signal we could better understand human multimodal communication behaviour. We could also improve the credibility and efficiency of computer animated conversational agents in multimodal communication systems. This thesis investigated eyebrow raising in a corpus of task-oriented English dialogues. Applying a standard dialogue coding scheme (Conversational Game Analysis, Carletta et al., 1997), eyebrow raises were studied in connection with discourse structure and utterance function. Supporting the prediction, more frequent and longer eyebrow raising occurred in the initial utterance of high-level discourse segments than anywhere else in the dialogue (where 'high-level discourse segment' = transaction, and 'utterance' = move, following Carletta et al.). Additionally, eyebrow raises were more frequent in instructions than in requests for or acknowledgements of information. Instructions also had longer eyebrow raising than any other type of utterance. Contrary to the prediction, the start of a lower-level discourse segment (conversational game) did not have more eyebrow raising than any other position in the dialogue, and queries did not have more eyebrow raising than any other type of utterance. Eyebrow raises were also studied in relation to intonational events, namely pitch accents. Results showed evidence of alignment between the brow raise start and the start of a pitch accent. Most pitch accents were not associated with brow raising, but when brow raises occurred they tended to immediately precede a pitch accent on the speech signal. 
To investigate what could explain the alignment between the two events, pitch accents aligned with eyebrow raises were compared to all other pitch accents in terms of: phonological characteristics (primary vs. secondary pitch accents, and downstep-initial vs. non-initial pitch accents), information structure (given vs. new information in referring expressions, and the last quarter vs. earlier parts of the utterance length) and type of utterance in which they occurred (instruction vs. non-instruction). Those comparisons suggested that brow raises may be aligned more frequently with pitch accents in downstep-initial position and in instructions. No differences were found in terms of information structure or between primary/secondary accents. The results provide evidence of a link between eyebrow raising and spoken language. Eyebrow raises may signal the start of linguistic units such as discourse segments and some prosodic phenomena, they may be related to utterance function, and they are aligned with pitch accents. Possible linguistic functions are proposed, such as structuring and emphasising information in the verbal message.

Edinburgh Research Archive

Real-time generation and adaptation of social companion robot behaviors

Author: Ritschel Hannes
Publication date: 23/01/2023

Social robots will be part of our future homes. They will assist us in everyday tasks, entertain us, and provide helpful advice. However, the technology still faces challenges that must be overcome to equip the machine with social competencies and make it a socially intelligent and accepted housemate. An essential skill of every social robot is verbal and non-verbal communication. In contrast to voice assistants, smartphones, and smart home technology, which are already part of many people's lives today, social robots have an embodiment that raises expectations towards the machine. Their anthropomorphic or zoomorphic appearance suggests they can communicate naturally with speech, gestures, or facial expressions and understand corresponding human behaviors. In addition, robots also need to consider individual users' preferences: everybody is shaped by their culture, social norms, and life experiences, resulting in different expectations towards communication with a robot. However, robots do not have human intuition - they must be equipped with the corresponding algorithmic solutions to these problems. This thesis investigates the use of reinforcement learning to adapt the robot's verbal and non-verbal communication to the user's needs and preferences. Such non-functional adaptation of the robot's behaviors primarily aims to improve the user experience and the robot's perceived social intelligence. The literature has not yet provided a holistic view of the overall challenge: real-time adaptation requires control over the robot's multimodal behavior generation, an understanding of human feedback, and an algorithmic basis for machine learning. Thus, this thesis develops a conceptual framework for designing real-time non-functional social robot behavior adaptation with reinforcement learning. It provides a higher-level view from the system designer's perspective and guidance from the start to the end. 
It illustrates the process of modeling, simulating, and evaluating such adaptation processes. Specifically, it guides the integration of human feedback and social signals to equip the machine with social awareness. The conceptual framework is put into practice for several use cases, resulting in technical proofs of concept and research prototypes. They are evaluated in the lab and in in-situ studies. These approaches address typical activities in domestic environments, focussing on the robot's expression of personality, persona, politeness, and humor. Within this scope, the robot adapts its spoken utterances, prosody, and animations based on human explicit or implicit feedback.

OPUS Augsburg