LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation
Gestures are non-verbal but important behaviors accompanying people's speech.
While previous methods can generate speech-rhythm-synchronized gestures, the
semantic context of the speech is generally missing from the gesticulations.
Although semantic gestures occur only sporadically in human speech, they are
key to helping the audience understand the speech context in a more immersive
way. Hence, we introduce LivelySpeaker, a
framework that realizes semantics-aware co-speech gesture generation and offers
several control handles. In particular, our method decouples the task into two
stages: script-based gesture generation and audio-guided rhythm refinement.
Specifically, the script-based gesture generation leverages the pre-trained
CLIP text embeddings as the guidance for generating gestures that are highly
semantically aligned with the script. Then, we devise a simple yet effective
diffusion-based gesture generation backbone built from pure MLPs that is
conditioned only on audio signals and learns to gesticulate with realistic
motion. We use this powerful prior to align the script-guided gestures
rhythmically with the audio signals, notably in a zero-shot setting. Our novel two-stage
generation framework also enables several applications, such as changing the
gesticulation style, editing the co-speech gestures via textual prompting, and
controlling the semantic awareness and rhythm alignment with guided diffusion.
Extensive experiments demonstrate the advantages of the proposed framework over
competing methods. In addition, our core diffusion-based generative model also
achieves state-of-the-art performance on two benchmarks. The code and model
will be released to facilitate future research. Comment: Accepted by ICCV 2023.
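The abstract above describes a diffusion-based backbone built from pure MLPs and conditioned on audio. As a minimal illustration of that idea (not the paper's released code), the sketch below shows the standard forward-noising step of a diffusion model and a toy MLP denoiser that takes a noisy pose, audio features, and a timestep. All dimensions, the noise schedule, and the network shape are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 48 pose values per frame, 64-dim audio features.
POSE_DIM, AUDIO_DIM, HIDDEN, T = 48, 64, 128, 1000

# A linear beta schedule -- a common diffusion baseline, not the paper's exact choice.
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t, eps):
    """Forward diffusion q(x_t | x_0): blend the clean pose with Gaussian noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# A pure-MLP denoiser: noisy pose + audio condition + timestep -> predicted noise.
W1 = rng.normal(0.0, 0.05, (POSE_DIM + AUDIO_DIM + 1, HIDDEN))
W2 = rng.normal(0.0, 0.05, (HIDDEN, POSE_DIM))

def denoiser(x_t, audio, t):
    h = np.concatenate([x_t, audio, [t / T]])  # concatenate all conditions
    return np.maximum(h @ W1, 0.0) @ W2        # ReLU MLP predicting the noise

x0 = rng.normal(size=POSE_DIM)       # a "clean" gesture pose
audio = rng.normal(size=AUDIO_DIM)   # audio features for this frame
eps = rng.normal(size=POSE_DIM)
x_t = add_noise(x0, t=500, eps=eps)
eps_hat = denoiser(x_t, audio, t=500)  # training would regress eps_hat toward eps
```

At training time such a model is fitted to predict the injected noise; at sampling time the prediction is used to iteratively denoise, which is how the audio-conditioned prior could then be reused to refine script-guided gestures.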
Encoding Theory of Mind in Character Design for Pedagogical Interactive Narrative
Computer-aided interactive narrative allows people to participate actively in a dynamically unfolding story, by playing a character or by exerting directorial control. Because of its potential for providing interesting stories as well as allowing user interaction, interactive narrative has been recognized as a promising tool for providing both education and entertainment. This paper discusses the challenges in creating interactive narratives for pedagogical applications and how these challenges can be addressed using agent-based technologies. We argue that a rich model of characters, and in particular a Theory of Mind capacity, is needed. The character architecture in the Thespian framework for interactive narrative is presented as an example of how decision-theoretic agents can be used to encode Theory of Mind and to create pedagogical interactive narratives.
Learning Data-Driven Models of Non-Verbal Behaviors for Building Rapport Using an Intelligent Virtual Agent
There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight with a diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness.
Evidence-based, patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long term. Lack of locally available personnel well trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge.
The success of CBIs, however, critically relies on ensuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to them over the long term as necessary.
Because of their text-only interfaces, however, current CBIs can express only limited empathy and rapport, which are among the most important factors in health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities. Virtual characters interact using humans' innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems.
To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent's social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].
Crowd modeling and simulation technologies
As a collective and highly dynamic social group, the human crowd is a fascinating phenomenon that has been frequently studied by experts from various areas. Recently, computer-based modeling and simulation technologies have emerged to support investigation of the dynamics of crowds, such as a crowd's behavior under normal and emergency situations. This article assesses the major existing technologies for crowd modeling and simulation. We first propose a two-dimensional categorization mechanism that classifies existing work by the size of the crowd and the time-scale of the crowd phenomena of interest. Four evaluation criteria are also introduced to evaluate existing crowd simulation systems from the points of view of both a modeler and an end-user.
We discuss some influential existing work in crowd modeling and simulation regarding its major features, performance, and the technologies used, and we also discuss some open problems in the area. This article provides researchers with useful information and insights on the state of the art of crowd modeling and simulation technologies as well as future research directions.
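The two-dimensional categorization the article proposes, crossing crowd size with time-scale, can be sketched as a small classifier that places a study into one of four quadrants. The numeric thresholds below are illustrative assumptions only; the article does not fix concrete cut-off values.

```python
def categorize(crowd_size, duration_minutes):
    """Place a crowd study in one of four quadrants: (size class, time-scale class).

    Thresholds are illustrative; real categorizations would pick
    domain-appropriate boundaries.
    """
    size_class = "small" if crowd_size < 1000 else "large"
    time_class = "short-term" if duration_minutes < 60 else "long-term"
    return (size_class, time_class)

# e.g. an evacuation drill with 300 people over 10 minutes:
print(categorize(300, 10))              # ('small', 'short-term')
# a festival with 50,000 attendees over two days:
print(categorize(50_000, 2 * 24 * 60))  # ('large', 'long-term')
```

Each quadrant then suggests a different modeling approach, e.g. agent-based microscopic models for small short-term crowds versus flow-based macroscopic models for large long-term phenomena.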
An Actor-Centric Approach to Facial Animation Control by Neural Networks For Non-Player Characters in Video Games
Game developers increasingly consider the degree to which character animation emulates facial expressions found in cinema. Employing animators and actors to produce cinematic facial animation by mixing motion capture and hand-crafted animation is labor-intensive and therefore expensive. Emotion corpora and neural-network controllers have shown promise for developing autonomous animation that does not rely on motion capture. Previous research and practice in Computer Science, Psychology, and the Performing Arts have provided frameworks on which to build a workflow for creating an emotion AI system that can animate the facial mesh of a 3D non-player character by deploying a combination of related theories and methods. However, past investigations and their resulting production methods largely ignore the emotion-generation systems that have evolved in the performing arts for more than a century. We find very little research that embraces the intellectual process of trained actors as complex collaborators from which to understand and model the training of a neural network for character animation. This investigation demonstrates a workflow design that integrates knowledge from the performing arts and the affective branches of the social and biological sciences. Our workflow proceeds from developing and annotating a fictional scenario with actors, to producing a video emotion corpus, to designing, training, and validating a neural network, to analyzing the emotion-data annotation of the corpus and the neural network, and finally to determining resemblant behavior in its autonomous animation control of a 3D character facial mesh. The resulting workflow includes a method for developing a neural-network architecture whose initial efficacy as a facial-emotion-expression simulator has been tested and validated as substantially resemblant to the character behavior developed by a human actor.
Proceedings of the 8th international conference on disability, virtual reality and associated technologies (ICDVRAT 2010)
The proceedings of the conference.
Social talk capabilities for dialogue systems
Small talk capabilities are an important but very challenging extension to dialogue systems. Small talk (or "social talk") refers to a kind of conversation that does not focus on the exchange of information, but on the negotiation of social roles and situations. The goal of this thesis is to provide knowledge, processes, and structures that dialogue systems can use to participate satisfactorily in social conversations. For this purpose the thesis presents research in the areas of natural-language understanding, dialogue management, and error handling. Nine new models of social talk, based on a data analysis of small talk conversations, are described. The functionally motivated and content-abstract models can be used for small talk conversations on various topics. The basic elements of the models consist of dialogue acts for social talk, newly developed on the basis of social science theory. The thesis also presents conversation strategies for the treatment of so-called "out-of-domain" (OoD) utterances, which can be used to avoid errors in the input understanding of dialogue systems. Additionally, the thesis describes a new extension to dialogue management that flexibly manages interwoven dialogue threads. The small talk models as well as the strategies for handling OoD utterances are encoded as computational dialogue threads.