
    How Avatar Customization Affects Fear in a Game-based Digital Exposure Task for Social Anxiety

    The treatment of social anxiety through digital exposure therapy is challenging due to the cognitive properties of social anxiety: individuals need to be fully engaged in the task and feel themselves represented in the social situation. Avatar customization, however, has been shown to increase both engagement and social presence. In this paper, we harness techniques used in commercial games and investigate how customizing self-representation in a novel digital exposure task for social anxiety influences the experience of social threat. In an online experiment with 200 participants, each participant either customized their avatar or was assigned a predefined one. Participants then controlled the avatar through a virtual shop, where they had to solve a math problem while a simulated audience within the virtual world observed them and negatively judged their performance. Our findings show that our task can stimulate the fear of being evaluated by others, that this fear is driven primarily by trait social anxiety, and that this relationship is strengthened for people higher in trait social anxiety. We provide new insights into the effects of customization in a novel therapeutic context, and embed the discussion of avatar customization into related work on social anxiety and human-computer interaction.

    First impressions: A survey on vision-based apparent personality trait analysis

    Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. Over the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have by far been the most considered cues of information for analyzing personality; recently, however, there has been increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures, and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and the potential impact that such methods could have on society, this paper presents an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, we review aspects of subjectivity in data labeling and evaluation, as well as current datasets and challenges organized to push research in the field forward.
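    As a rough illustration of the kind of pipeline this survey covers, the sketch below regresses apparent Big Five trait scores from a face crop. The backbone, head dimensions, and output scaling are illustrative assumptions, not the architecture of any specific surveyed method.

```python
# Minimal sketch of a vision-based apparent-personality regressor.
# Backbone choice, head size, and trait ordering are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

class ApparentPersonalityNet(nn.Module):
    def __init__(self, num_traits: int = len(TRAITS)):
        super().__init__()
        self.backbone = models.resnet18(weights=None)  # face-crop feature extractor
        self.backbone.fc = nn.Identity()               # expose the 512-d features
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_traits), nn.Sigmoid()   # apparent traits scored in [0, 1]
        )

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(face))

model = ApparentPersonalityNet()
scores = model(torch.randn(1, 3, 224, 224))            # one 224x224 RGB face crop
print(dict(zip(TRAITS, scores.squeeze(0).tolist())))
```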

    VELUM: A 3D Puzzle/Exploration Game Designed Using Crowdsourced AI Facial Analysis

    Velum is a first-person 3D puzzle/exploration game set in a timeless version of the Boston Public Garden. The project’s narrative framework and aesthetics are based on one of the Garden’s most prominent features, the Ether Monument, which commemorates the 1846 discovery of diethyl ether’s effectiveness as a medical anesthetic. A sequence of nine abstract challenges is rewarded by a progressive revelation of the player’s mysterious identity and purpose. The puzzle design was informed by crowdsourced playtesting involving 300+ volunteers, combining standard data telemetry with AI-based facial image analysis capable of mapping player emotions to gameplay events.
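    The abstract does not include the analysis code; the following sketch only illustrates the general idea of attaching facial-emotion estimates to gameplay telemetry by timestamp. The record layouts, field names, and emotion labels are hypothetical, not VELUM's actual schema.

```python
# Sketch: attach the nearest facial-emotion estimate to each gameplay event.
# Record layouts and field names are hypothetical.
from bisect import bisect_left

telemetry = [  # (timestamp_sec, event)
    (12.4, "puzzle_3_started"),
    (57.9, "puzzle_3_failed"),
    (83.1, "puzzle_3_solved"),
]
emotions = [  # (timestamp_sec, dominant_emotion) from the facial-analysis service
    (12.0, "neutral"), (55.0, "frustration"), (82.5, "joy"),
]

def nearest_emotion(t: float) -> str:
    """Return the emotion estimate closest in time to t (emotions sorted by time)."""
    times = [ts for ts, _ in emotions]
    i = bisect_left(times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(emotions)]
    best = min(candidates, key=lambda j: abs(emotions[j][0] - t))
    return emotions[best][1]

for ts, event in telemetry:
    print(f"{ts:6.1f}s  {event:18s} -> {nearest_emotion(ts)}")
```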

    A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

    Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. We also chronicle the evolution of the related training datasets in terms of size, diversity, motion quality, and collection method. Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integrating gesture synthesis into applications. We highlight recent approaches to tackling these key challenges, as well as the limitations of those approaches, and point toward areas of future development. Comment: Accepted for EUROGRAPHICS 202
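    To make the audio-to-gesture framing concrete, here is a minimal sketch of the input/output interface such systems expose, using an assumed feature size and a plain GRU as a stand-in for the far richer generative models the review actually covers.

```python
# Sketch of the audio-driven gesture-generation interface discussed in the review.
# Feature dimensions, the GRU architecture, and the joint count are assumptions.
import torch
import torch.nn as nn

class AudioToGesture(nn.Module):
    def __init__(self, audio_dim=26, hidden=256, num_joints=25):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.Linear(hidden, num_joints * 3)  # per-joint rotation (e.g. axis-angle)

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, frames, audio_dim), e.g. MFCCs resampled to the motion frame rate
        hidden_states, _ = self.encoder(audio_feats)
        return self.decoder(hidden_states)                # (batch, frames, num_joints * 3)

model = AudioToGesture()
motion = model(torch.randn(2, 120, 26))  # 2 clips, 120 frames of audio features
print(motion.shape)                      # torch.Size([2, 120, 75])
```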

    A Motion Matching-Based Framework for Controllable Gesture Synthesis from Speech


    Learning-based 3D human motion capture and animation synthesis

    Realistic virtual human avatars are a crucial element in a wide range of applications, from 3D animated movies to emerging AR/VR technologies. However, producing believable 3D motion for such avatars is widely known to be a challenging task. A traditional 3D human motion generation pipeline consists of several stages, each requiring expensive equipment and skilled human labor, which limits its usage beyond the entertainment industry despite its massive potential benefits. This thesis explores alternative solutions that reduce the complexity of the traditional 3D animation pipeline. To this end, it presents several novel ways to perform 3D human motion capture, synthesis, and control, focusing on learning-based methods to bypass the critical bottlenecks of the classical animation approach. First, a new 3D pose estimation method from in-the-wild monocular images is proposed, eliminating the need for the multi-camera setup of a traditional motion capture system. Second, the thesis explores several data-driven designs for believable 3D human motion synthesis and control that can potentially reduce the need for manual animation. In particular, the problem of speech-driven 3D gesture synthesis is chosen as the case study due to its uniquely ambiguous nature. Improved motion generation quality is achieved by introducing a novel adversarial objective that rates the difference between real and synthetic data. A novel motion generation strategy is also introduced by combining a classical database search algorithm with a powerful deep learning method, resulting in greater motion control variation than purely predictive counterparts. Furthermore, this thesis contributes a new way of collecting a large-scale 3D motion dataset through the use of learning-based monocular estimation methods. This result demonstrates the promising capability of learning-based monocular approaches and shows the prospect of combining these learning-based modules into an integrated 3D animation framework. The presented learning-based solutions open up the possibility of democratizing the traditional 3D animation pipeline, enabling it with low-cost equipment, e.g., a single RGB camera. Finally, the thesis also discusses the potential further integration of these learning-based approaches to enhance 3D animation technology.
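    As a very rough sketch of the adversarial idea mentioned above (a discriminator that rates the difference between real and synthetic motion), the snippet below uses standard GAN-style losses; the window length, pose dimensionality, and discriminator are assumptions, not the thesis's actual formulation.

```python
# Sketch of an adversarial realism term for motion synthesis.
# Window length, pose dimensionality, and the discriminator itself are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionDiscriminator(nn.Module):
    """Scores short motion windows: high logit for real capture, low for synthesis."""
    def __init__(self, pose_dim=75, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, motion):          # motion: (batch, frames, pose_dim)
        _, h = self.rnn(motion)
        return self.out(h[-1])          # one logit per window

disc = MotionDiscriminator()
real = torch.randn(8, 60, 75)           # captured motion windows
fake = torch.randn(8, 60, 75)           # generator output (placeholder tensor here;
                                        # it would normally be detached for the D update)

# Discriminator loss: tell real from synthetic windows apart.
d_loss = F.binary_cross_entropy_with_logits(disc(real), torch.ones(8, 1)) + \
         F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(8, 1))

# Generator's adversarial term: make synthetic motion look real to the discriminator.
g_adv = F.binary_cross_entropy_with_logits(disc(fake), torch.ones(8, 1))
print(d_loss.item(), g_adv.item())
```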

    Towards a crowdsourced solution for the authoring bottleneck in interactive narratives

    Interactive Storytelling research has produced a wealth of technologies that can be employed to create personalised narrative experiences, in which the audience takes a participating rather than observing role. So far, however, this technology has not led to the production of large-scale playable interactive story experiences that realise the ambitions of the field. One main reason for this state of affairs is the difficulty of authoring interactive stories, a task that requires describing a huge number of story building blocks in a machine-friendly fashion. This is not only technically and conceptually more challenging than traditional narrative authoring, but also a scalability problem. This thesis examines the authoring bottleneck through a case study and a literature survey and advocates a solution based on crowdsourcing. Prior work has already shown that combining a large number of example stories collected from crowd workers with a system that merges these contributions into a single interactive story can be an effective way to reduce the authorial burden. As a refinement of such an approach, this thesis introduces the novel concept of Crowd Task Adaptation. It argues that, in order to maximise the usefulness of the collected stories, a system should dynamically and intelligently analyse the corpus of collected stories and, based on this analysis, modify the tasks handed out to crowd workers. Two authoring systems, ENIGMA and CROSCAT, which demonstrate two radically different approaches to the Crowd Task Adaptation paradigm, have been implemented and are described in this thesis. While ENIGMA adapts tasks through a real-time dialog between crowd workers and the system, based on what has been learned from previously collected stories, CROSCAT modifies the backstory given to crowd workers in order to optimise the distribution of branching points in the tree structure that combines all collected stories. Two experimental studies of crowdsourced authoring are also presented. They lead to guidelines on how to employ crowdsourced authoring effectively, but more importantly the results of one of the studies demonstrate the effectiveness of the Crowd Task Adaptation approach.
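    A toy sketch of the CROSCAT-style idea follows: choose the backstory handed to the next crowd worker so that under-branched points of the story tree get extended. The node-scoring heuristic is an assumption for illustration, not the algorithm described in the thesis.

```python
# Toy sketch of crowd task adaptation for story collection: hand the next worker
# the backstory ending at the tree node whose branching is most under-populated.
# The scoring heuristic is illustrative only.
from dataclasses import dataclass, field

@dataclass
class StoryNode:
    event: str
    children: list = field(default_factory=list)

def backstory_for_next_worker(root: StoryNode, target_branching: int = 3):
    """Return the path (backstory) to the node most in need of new branches."""
    best_path, best_deficit = None, -1
    stack = [(root, [root.event])]
    while stack:
        node, path = stack.pop()
        deficit = target_branching - len(node.children)
        if deficit > best_deficit:
            best_path, best_deficit = path, deficit
        for child in node.children:
            stack.append((child, path + [child.event]))
    return best_path

root = StoryNode("You wake up in a locked house.",
                 [StoryNode("You search the kitchen."),
                  StoryNode("You call for help.",
                            [StoryNode("A neighbour answers.")])])
print(backstory_for_next_worker(root))
```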

    Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

    "Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they had gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "antropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but also could rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but the type of such interactions is still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on how such future humanmachine interactions may look like. 
When we consider other products that have been created in history, it sometimes is striking to see that some of these have been inspired by things that can be observed in our environment, yet at the same do not have to be exact copies of those phenomena. For instance, an airplane has wings just as birds, yet the wings of an airplane do not make those typical movements a bird would produce to fly. Moreover, an airplane has wheels, whereas a bird has legs. At the same time, an airplane has made it possible for a humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, but can nonetheless be very beneficial and impactful for human beings. This dissertation centers on this practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all have the equivalent underlying question of how parts of human behavior can be captured, such that computers can use it to become more human-like. Each study differs in method, perspective and specific questions, but they are all aimed to gain insights and directions that would help further push the computer developments of human-like behavior and investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges, followed by an overview of the four studies
