2,701 research outputs found

    3D Human Face Reconstruction and 2D Appearance Synthesis

    3D human face reconstruction has been an active research topic for decades because of its wide range of applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable, since images are much easier to acquire and store. In this dissertation, we first propose three image-based face reconstruction approaches, each based on different assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; the camera must be calibrated under this assumption. Because the first approach is limited to videos, the second approach focuses on a single image. It also refines the geometry with fine-grained detail using shading cues, for which we propose a novel albedo estimation and linear optimization algorithm. In the third approach, we further relax the constraints on the input to arbitrary in-the-wild images; the proposed method robustly reconstructs high-quality models even under extreme expressions and large poses. We then explore the applicability of our face reconstructions in four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement, and we demonstrate the great potential of our reconstruction approaches in these real-world settings. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs), yet the large occlusion of the face is a major obstacle to face-to-face communication. In a further application, we explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem; with our algorithm and prototype, we achieve photo-realistic results. We further propose a deep neural network that treats HMD removal as a face inpainting problem; this approach requires no special hardware and runs in real time with satisfying results
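    For illustration only: the shading-based refinement above is described at a high level, so the sketch below shows one common way such a step can be set up, assuming a Lambertian face lit by low-order (unnormalized) spherical harmonics and solving a linear least-squares problem for the lighting; it is not the dissertation's algorithm, and all function and variable names are hypothetical.

```python
# Minimal sketch of shading-based refinement (illustrative, not the dissertation's algorithm).
# Assumes a Lambertian face lit by low-order spherical harmonics (SH); names are hypothetical.
import numpy as np

def sh_basis(normals):
    """First 9 (unnormalized) real SH basis functions at unit normals: (N, 3) -> (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.ones_like(x), x, y, z,
        x * y, x * z, y * z,
        x**2 - y**2, 3 * z**2 - 1,
    ], axis=1)

def estimate_lighting(intensities, albedo, normals):
    """Linear least-squares fit of SH lighting coefficients: I ~ albedo * (B @ l)."""
    B = albedo[:, None] * sh_basis(normals)          # (N, 9) design matrix
    l, *_ = np.linalg.lstsq(B, intensities, rcond=None)
    return l

def refine_albedo(intensities, normals, lighting):
    """Given lighting, re-estimate per-pixel albedo from the shading equation."""
    shading = sh_basis(normals) @ lighting
    return intensities / np.clip(shading, 1e-6, None)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = rng.normal(size=(1000, 3)); n /= np.linalg.norm(n, axis=1, keepdims=True)
    true_l = rng.normal(size=9)
    true_a = rng.uniform(0.2, 1.0, size=1000)
    I = true_a * (sh_basis(n) @ true_l)
    print(np.allclose(estimate_lighting(I, true_a, n), true_l, atol=1e-6))  # lighting recovered
```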

    Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving

    In this paper we present the first results of a pilot experiment on capturing and interpreting multimodal signals from human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye gaze, posture, emotion, and other physiological signals can be used to model the cognitive state of subjects, and to explore how integrating multiple sensor modalities can improve the reliability of detecting human displays of awareness and emotion. We observed chess players engaged in problems of increasing difficulty while recording their behavior. Such recordings can be used to estimate a participant's awareness of the current situation and to predict their ability to respond effectively to challenging situations. Results show that a multimodal approach is more accurate than a unimodal one: by combining body posture, visual attention, and emotion, the multimodal approach reaches up to 93% accuracy when determining a player's chess expertise, whereas the unimodal approach reaches 86%. Finally, this experiment validates our equipment as a general and reproducible tool for studying participants engaged in screen-based interaction and/or problem solving
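    As a toy illustration of the multimodal-versus-unimodal comparison reported above (not the authors' pipeline), the sketch below fits the same classifier on each modality alone and on the concatenated features; the feature dimensions and synthetic data are invented for the example.

```python
# Illustrative early-fusion vs. unimodal comparison on synthetic data (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120                                   # observed problem-solving episodes (toy)
y = rng.integers(0, 2, size=n)            # 0 = novice, 1 = expert (toy labels)

# Synthetic per-episode features for each modality, weakly correlated with the label.
posture = rng.normal(size=(n, 6)) + y[:, None] * 0.4   # e.g. body-lean / stillness statistics
gaze    = rng.normal(size=(n, 8)) + y[:, None] * 0.3   # e.g. fixation / saccade statistics
emotion = rng.normal(size=(n, 7)) + y[:, None] * 0.2   # e.g. valence / arousal statistics

def cv_accuracy(X, y):
    """5-fold cross-validated accuracy of a simple linear classifier."""
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

for name, X in [("posture only", posture), ("gaze only", gaze), ("emotion only", emotion),
                ("multimodal (early fusion)", np.hstack([posture, gaze, emotion]))]:
    print(f"{name:28s} accuracy = {cv_accuracy(X, y):.2f}")
```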

    In the Blink of an Eye: Event-based Emotion Recognition

    We introduce a wearable single-eye emotion recognition device and a real-time approach that recognizes emotions from partial observations of an emotion and is robust to changes in lighting conditions. At the heart of our method is a bio-inspired event-based camera setup and a newly designed lightweight Spiking Eye Emotion Network (SEEN). Compared to conventional cameras, event-based cameras offer a higher dynamic range (up to 140 dB vs. 80 dB) and a higher temporal resolution, so the captured events can encode rich temporal cues under challenging lighting conditions. However, these events lack texture information, which makes it difficult to decode temporal information effectively. SEEN tackles this issue from two perspectives. First, we adopt convolutional spiking layers to take advantage of the spiking neural network's ability to decode pertinent temporal information. Second, SEEN learns to extract essential spatial cues from corresponding intensity frames and leverages a novel weight-copy scheme to convey spatial attention to the convolutional spiking layers during training and inference. We extensively validate and demonstrate the effectiveness of our approach on a specially collected Single-eye Event-based Emotion (SEE) dataset. To the best of our knowledge, our method is the first eye-based emotion recognition method that leverages event-based cameras and spiking neural networks
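    To make the notion of a convolutional spiking layer concrete, the sketch below implements a minimal leaky integrate-and-fire (LIF) convolution over a sequence of event frames. It is a generic illustration rather than SEEN itself: training such a layer would additionally require surrogate gradients, and the weight-copy attention scheme is not shown.

```python
# Minimal leaky integrate-and-fire (LIF) convolutional layer over event frames.
# Generic illustration of spiking dynamics, NOT the paper's SEEN architecture.
import torch
import torch.nn as nn

class LIFConv2d(nn.Module):
    """Convolution followed by LIF membrane dynamics over a sequence of event frames."""
    def __init__(self, in_ch, out_ch, tau=2.0, threshold=1.0, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, **conv_kwargs)
        self.decay = 1.0 - 1.0 / tau     # membrane leak per time step
        self.threshold = threshold

    def forward(self, x_seq):            # x_seq: (T, B, C, H, W) binned event frames
        mem, spikes = None, []
        for x_t in x_seq:
            current = self.conv(x_t)
            mem = current if mem is None else self.decay * mem + current
            s = (mem >= self.threshold).float()   # emit a spike where the membrane crosses threshold
            mem = mem * (1.0 - s)                 # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes)       # (T, B, C_out, H', W') spike trains

if __name__ == "__main__":
    layer = LIFConv2d(2, 8, kernel_size=3, padding=1)
    events = torch.rand(10, 1, 2, 32, 32)        # 10 time bins of ON/OFF event counts (toy)
    out = layer(events)
    print(out.shape, out.mean().item())          # spike tensor shape and mean firing rate
```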

    Crossing the Uncanny Valley? Understanding Affinity, Trustworthiness, and Preference for More Realistic Virtual Humans in Immersive Environments

    Developers have long strived to create more realistic virtual avatars because these are believed to be preferred over less realistic ones; however, an “Uncanny Valley” exists in which avatars that are almost, but not quite, realistic trigger aversion. We used a field study to investigate whether users had different affinity, trustworthiness ratings, and preferences for avatars at two levels of realism: one photo-realistic and one a cartoon caricature. We collected survey data and conducted one-on-one interviews with SIGGRAPH conference attendees who watched a live interview carried out using the two avatars, either on a large-screen 2D video display or via 3D VR headsets. Eighteen sessions were conducted over four days, with the same person animating the photo-realistic avatar but different individuals animating the caricature avatars. Participants rated the photo-realistic avatar as more trustworthy, had more affinity for it, and preferred it as a virtual agent. Participants who observed the interview through VR headsets had even stronger affinity for the photo-realistic avatar and stronger preferences for it as a virtual agent. Surprisingly, the interviews further suggested that our ability to cross the Uncanny Valley may depend on who controls the avatar, a human or a virtual agent

    Chapter From the Lab to the Real World: Affect Recognition Using Multiple Cues and Modalities

    The interdisciplinary concept of the dissipative soliton is presented in connection with ultrafast fibre lasers. Different mode-locking techniques as well as experimental realizations of dissipative-soliton fibre lasers are surveyed briefly, with an emphasis on their energy scalability. Basic topics of dissipative-soliton theory are elucidated in connection with the concepts of energy scalability and stability. It is shown that the parametric space of the dissipative soliton has a reduced dimension and a comparatively simple structure, which simplifies the analysis and optimization of ultrafast fibre lasers. The main destabilization scenarios are described, and the limits of energy scalability are related to the impact of optical turbulence and stimulated Raman scattering. The fast and slow dynamics of vector dissipative solitons are also discussed
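    As general background (not taken from the text above), dissipative solitons in mode-locked fibre lasers are commonly modelled by the cubic-quintic complex Ginzburg-Landau equation, which in one standard normalized form reads:

```latex
% Cubic-quintic complex Ginzburg-Landau equation: a standard dissipative-soliton
% master model, quoted here only as general background, not from the abstract above.
\begin{equation}
  \frac{\partial A}{\partial z}
  = \delta A
  + \left(\beta + i\frac{D}{2}\right)\frac{\partial^{2} A}{\partial t^{2}}
  + \left(\varepsilon + i\right)\lvert A\rvert^{2} A
  + \left(\mu + i\nu\right)\lvert A\rvert^{4} A ,
\end{equation}
```

    where A(z, t) is the field envelope, D the group-velocity dispersion, ÎŽ the net linear gain/loss, ÎČ spectral filtering, Δ and ÎŒ the cubic and quintic nonlinear gain/saturation terms, and Îœ the quintic refractive nonlinearity.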

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, as well as investigating a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.
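    A minimal sketch of the weak-supervision idea mentioned above, namely that listeners tend to look at the current speaker, so speaking turns can be used to assign gaze clusters to interlocutors without manual annotation. The toy data and all names are hypothetical; this is not the thesis' implementation.

```python
# Hedged sketch: use "who is speaking" to label the wearer's gaze clusters with interlocutors,
# yielding eye-contact predictions without manual annotation (illustration only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
T, n_people = 3000, 3
positions = np.array([[-0.4, 0.0], [0.0, 0.1], [0.4, 0.0]])      # toy 2D gaze targets
speaker = rng.integers(0, n_people, size=T)                       # who speaks in each frame
looked_at = np.where(rng.random(T) < 0.7, speaker,                # listeners mostly watch the speaker
                     rng.integers(0, n_people, size=T))
gaze = positions[looked_at] + rng.normal(0, 0.05, size=(T, 2))    # noisy gaze estimates

# 1. Cluster raw gaze estimates (ideally one cluster per interlocutor).
clusters = KMeans(n_clusters=n_people, n_init=10, random_state=0).fit_predict(gaze)

# 2. Assign each cluster to the person whose speaking activity it co-occurs with most.
assignment = np.zeros(n_people, dtype=int)
for c in range(n_people):
    counts = [np.mean(speaker[clusters == c] == p) for p in range(n_people)]
    assignment[c] = int(np.argmax(counts))

# 3. "Eye contact with person p" is then predicted whenever a frame falls in p's cluster.
predicted = assignment[clusters]
print("agreement with ground-truth gaze target:", (predicted == looked_at).mean().round(2))
```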

    To Affinity and Beyond: Interactive Digital Humans as a Human Computer Interface

    The field of human-computer interaction is increasingly exploring the use of more natural, human-like user interfaces to build intelligent agents that aid in everyday life. This is coupled with a move towards people using ever more realistic avatars to represent themselves in their digital lives. As the ability to produce emotionally engaging digital human representations is only just now becoming technically possible, there is little research into how to approach such tasks; this is due to both technical complexity and operational implementation cost. The situation is now changing, as we are at a nexus point where new approaches, faster graphics processing, and enabling technologies in machine learning and computer vision are becoming available. I articulate the issues that must be addressed for such digital humans to be considered as sitting successfully on the far side of the phenomenon known as the Uncanny Valley. My results show that a complex mix of perceived and contextual aspects affects how people make sense of digital humans, and they highlight previously undocumented effects of interactivity on affinity. Users are willing to accept digital humans as a new form of user interface, and they react to them emotionally in previously unanticipated ways. My research shows that it is possible to build an effective interactive digital human that crosses the Uncanny Valley. I directly explore what is required to build a visually realistic digital human as a primary research question, and I examine whether such a realistic face provides sufficient benefit to justify the challenges involved in building it. I conducted a Delphi study to inform the research approach and then produced a complex digital human character based on these insights. This interactive and realistic digital human avatar represents a major technical undertaking involving multiple teams around the world. Finally, I explore a framework for examining the ethical implications and signpost future research areas

    On driver behavior recognition for increased safety: A roadmap

    Advanced Driver-Assistance Systems (ADASs) are used to increase safety in the automotive domain, yet current ADASs notably operate without taking the driver's state into account, e.g., whether the driver is emotionally fit to drive. In this paper, we first review the state of the art in emotional and cognitive analysis for ADAS: we consider psychological models, the sensors needed for capturing physiological signals, and the typical algorithms used for human emotion classification. Our investigation highlights a lack of advanced Driver Monitoring Systems (DMSs) for ADASs, whose adoption could increase driving quality and safety for both drivers and passengers. We then provide our view on a novel perception architecture for driver monitoring, built around the concept of the Driver Complex State (DCS). The DCS relies on multiple non-obtrusive sensors and Artificial Intelligence (AI) to uncover the driver's state, and uses it to implement innovative Human–Machine Interface (HMI) functionalities. This concept will be implemented and validated in the recently EU-funded NextPerception project, which is briefly introduced
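    Since the Driver Complex State is described above only at the architectural level, the following is a purely hypothetical sketch of what fusing several non-obtrusive sensor estimates into a single driver-state summary might look like; all sensor names, weights, and thresholds are invented for illustration.

```python
# Purely hypothetical sketch of multi-sensor driver-state fusion feeding an HMI hook.
# The DCS itself is only a concept in the paper; nothing here is its actual implementation.
from dataclasses import dataclass

@dataclass
class SensorEstimate:
    name: str
    drowsiness: float      # each in [0, 1], as produced by a per-sensor classifier
    distraction: float
    stress: float

def fuse(estimates, weights):
    """Confidence-weighted average of per-sensor driver-state estimates."""
    total = sum(weights[e.name] for e in estimates)
    return {
        "drowsiness":  sum(weights[e.name] * e.drowsiness  for e in estimates) / total,
        "distraction": sum(weights[e.name] * e.distraction for e in estimates) / total,
        "stress":      sum(weights[e.name] * e.stress      for e in estimates) / total,
    }

if __name__ == "__main__":
    estimates = [
        SensorEstimate("camera_gaze",    drowsiness=0.7, distraction=0.4, stress=0.2),
        SensorEstimate("steering_wheel", drowsiness=0.5, distraction=0.6, stress=0.3),
        SensorEstimate("wearable_hr",    drowsiness=0.3, distraction=0.2, stress=0.8),
    ]
    weights = {"camera_gaze": 0.5, "steering_wheel": 0.3, "wearable_hr": 0.2}
    dcs = fuse(estimates, weights)
    print(dcs)
    if dcs["drowsiness"] > 0.6:
        print("HMI action: suggest a break")   # example of an HMI function driven by the fused state
```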

    Development and evaluation of an interactive virtual audience for a public speaking training application

    Introduction: Fear of public speaking is among the most common social fears. Virtual reality (VR) training applications are a promising tool for improving public speaking skills. To be successful, such applications should offer high scenario fidelity; one way to achieve this is to implement realistic speaker-audience interactive behavior. Objective: The study aimed to develop and evaluate a realistic and interactive audience for a VR public speaking training application. First, an observation study of real speaker-audience interactive behavior patterns was conducted. Second, the identified patterns were implemented in the VR application. Finally, an evaluation study assessed users' perceptions of the training application. Observation Study (1): Because of the lack of data on real speaker-audience interactive behavior, the first research question was "What speaker-audience interaction patterns can be identified in real life?". A structured, non-participant, overt observation study was conducted: a real audience was video recorded and the recordings were content analyzed. The sample comprised N = 6,484 observed interaction patterns. It was found that speakers initiate dialogues more often than audience members do, and it was observed how audience members react to speakers' facial expressions and gestures. Implementation Study (2): To find efficient ways of implementing the results of the observation study in the training application, the second research question was "How can speaker-audience interaction patterns be implemented into the virtual public speaking application?". The hardware setup comprised a CAVE, Infitec glasses, and ART head tracking; the software was realized with 3D-Excite RTT DeltaGen 12.2. Several possible technical solutions were explored systematically until efficient ones were found. As a result, self-created audio recognition, Kinect motion recognition, Affectiva facial recognition, and manual question generation were implemented to provide interactive audience behavior in the public speaking training application. Evaluation Study (3): To find out whether implementing interactive behavior patterns met users' expectations, the third research question was "How does the interactivity of a virtual public speaking application affect user experience?". An experimental, cross-sectional user study was conducted with N = 57 participants (65% men, 35% women; mean age = 25.98, SD = 4.68) who used either an interactive or a non-interactive version of the VR application. Results revealed a significant difference in users' perceptions of the two conditions.
General Conclusions: Speaker-audience interaction patterns that can be observed in real life were incorporated into a VR application that helps people overcome the fear of public speaking and train their public speaking skills. The findings show the high relevance of interactivity for VR public speaking applications. Although the audience's questions were still triggered manually by an operator, the newly designed audience could interact with the speakers, so the presented VR application is of potential value in helping people train their public speaking skills. A further limitation is that the study was conducted with participants who did not suffer from high degrees of public speaking fear. Future work may use more advanced technology, such as speech recognition and 3D recordings or live 3D streams of an actual person, and include participants with high degrees of public speaking fear
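    To illustrate how detected speaker behavior can drive audience reactions in the spirit of the implementation study above (which used audio recognition, Kinect, and Affectiva inside RTT DeltaGen), here is a hypothetical rule-based sketch; the thresholds and reaction names are invented and do not reflect the study's actual interaction rules.

```python
# Hypothetical rule-based mapping from sensed speaker state to virtual-audience reactions.
# Illustration only; not the study's implementation or its observed interaction patterns.
import random
from dataclasses import dataclass

@dataclass
class SpeakerState:
    is_speaking: bool       # e.g. from audio activity detection
    gesturing: bool         # e.g. from skeletal motion tracking
    smiling: bool           # e.g. from facial-expression analysis
    silence_seconds: float  # time since the speaker last spoke

def audience_reactions(state: SpeakerState, n_audience: int = 10):
    """Map the observed speaker state to per-audience-member animation triggers."""
    reactions = []
    for _ in range(n_audience):
        if state.smiling and random.random() < 0.6:
            reactions.append("smile_back")
        elif state.gesturing and random.random() < 0.4:
            reactions.append("nod")
        elif state.silence_seconds > 5.0:
            reactions.append("look_away")            # attention drifts during long pauses
        elif state.is_speaking:
            reactions.append("maintain_eye_contact")
        else:
            reactions.append("idle")
    return reactions

if __name__ == "__main__":
    random.seed(0)
    state = SpeakerState(is_speaking=True, gesturing=True, smiling=False, silence_seconds=0.0)
    print(audience_reactions(state))
```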
    • 

    corecore