
    Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

    We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to a high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. Humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid such pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Extensive experiments on several datasets and real-world samples demonstrate that our method obtains significantly better results than state-of-the-art methods in both quantitative and qualitative comparisons.
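The dynamically adjustable pixel-wise loss can be pictured as an attention-weighted reconstruction error. Below is a minimal sketch, assuming an L1 base loss and a precomputed per-pixel attention map in [0, 1]; the paper's actual attention mechanism and loss form may differ.

```python
import numpy as np

def attention_pixel_loss(generated, target, attention):
    """Pixel-wise L1 loss weighted by a per-pixel attention map.

    High attention values mark audiovisual-correlated regions
    (e.g. the mouth), so errors there dominate the loss while
    static background pixels contribute little.
    """
    per_pixel = np.abs(generated - target)   # L1 error per pixel
    weighted = attention * per_pixel         # emphasize attended regions
    return weighted.mean()

# Toy example: a 4x4 "frame" where only the lower half is attended.
gen = np.zeros((4, 4))
tgt = np.ones((4, 4))
att = np.zeros((4, 4))
att[2:, :] = 1.0                             # attend lower half only
print(attention_pixel_loss(gen, tgt, att))   # 0.5
```

Jitter in unattended background regions is thereby penalized only weakly, which is one way to keep the generator from chasing pixel noise unrelated to speech.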

    Feasibility of a smartphone application to identify young children at risk for Autism Spectrum Disorder in a low-income community setting in South Africa

    Introduction and aims: More than 90% of children with Autism Spectrum Disorder (ASD) live in low- and middle-income countries (LMIC), where there is a great need for culturally appropriate, scalable, and effective early identification and intervention tools. Smartphone technology and applications ('apps') may play an important role in this regard. The Autism&Beyond iPhone App was designed as a potential screening tool for ASD risk in children aged 12-72 months. Here we investigated the technical feasibility and cultural acceptability of the app in a naturalistic, low-income South African community setting. Methodology: 37 typically-developing African children and their parents/carers were recruited from community centres in Khayelitsha Township, Cape Town, South Africa. We implemented a mixed-methods design, collecting both quantitative and qualitative data from participants in 2 stages. In stage 1, we collected quantitative data. With appropriate ethics approval and consent, parents completed a short technology questionnaire about their familiarity with and access to smartphones, internet, and apps, followed by electronic iPhone-based demographic and ASD-related questionnaires. Next, children were shown 3 short videos of 30 s each and a mirror stimulus on a study smartphone. The smartphone's front-facing ("selfie") camera recorded video of the child's facial expressions and head movement. Automated computer algorithms quantified positive emotions and time attending to stimuli. We validated the automatic coding by a) comparing the computer-generated analysis to human coding of facial expressions in a random sample (N=9), and b) comparing automated analysis of the South African data (N=33) with a matched American sample (N=33).
In stage 2, a subset of families was invited to participate in focus group discussions to provide qualitative data on the accessibility, acceptability, and cultural appropriateness of the app in their local community. Results: Most parents (64%) owned a smartphone, all (100%) of which were Android based, and many (45%) used apps. Human-automated coding showed excellent correlation for positive emotion (ICC = 0.95, 95% CI 0.81-0.99), and no statistically significant differences were observed between the South African and American samples in % time attending to the video stimuli. South African children, however, smiled less at the Toys&Rhymes video (SA mean (SD) = 14% (24); USA mean (SD) = 31% (34); p=0.05) and the Bunny video (SA mean (SD) = 12% (17); USA mean (SD) = 30% (27); p=0.006). Analysis of focus group data indicated that parents/carers found the app relatively easy to use and would recommend it to others in their community, provided the app and data transfer were free. Conclusion: The results from this pilot study suggest the app to be technically accurate, accessible, and culturally acceptable to families in a low-resource South African setting. Given the differences in positive emotional response between the groups, careful consideration should be given to identifying suitable stimuli if % time smiling is to be used as a global marker for autism risk across cultures and environments.
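The human-automated agreement was reported as an intraclass correlation (ICC = 0.95). As a sketch, here is one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater); the abstract does not state which ICC form was used, so the choice of variant here is an assumption.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: (n_subjects, k_raters) array, e.g. rows = video clips,
    columns = human coder vs. automated algorithm.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-subjects SS
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-raters SS
    ss_total = ((ratings - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols            # residual SS
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Toy data: 4 clips scored by a human coder and by the algorithm.
ratings = np.array([[9, 8], [5, 6], [2, 2], [7, 7]], dtype=float)
print(round(icc_2_1(ratings), 2))   # 0.97
```

Values near 1 indicate that the automated coding reproduces the human ratings almost exactly, which is the criterion the validation step relies on.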

    Development and evaluation of an interactive virtual audience for a public speaking training application

    Introduction: Fear of public speaking is the most common social fear. Virtual reality (VR) training applications are a promising tool to improve public speaking skills. To be successful, applications should feature high scenario fidelity. One way to improve it is to implement realistic speaker-audience interactive behavior. Objective: The study aimed to develop and evaluate a realistic and interactive audience for a VR public speaking training application. First, an observation study on real speaker-audience interactive behavior patterns was conducted. Second, the identified patterns were implemented in the VR application. Finally, an evaluation study identified users' perceptions of the training application. Observation Study (1): Because of the lack of data on real speaker-audience interactive behavior, the first research question to be answered was "What speaker-audience interaction patterns can be identified in real life?". A structured, non-participant, overt observation study was conducted. A real audience was video recorded, and the content analyzed. The sample yielded N = 6,484 observed interaction patterns. It was found that speakers initiate dialogues more often than audience members do, and how audience members react to speakers' facial expressions and gestures.
Implementation Study (2): To find efficient ways of implementing the results of the observation study in the training application, the second research question was formulated as: "How can speaker-audience interaction patterns be implemented into the virtual public speaking application?". The hardware setup comprised a CAVE, Infitec glasses, and ART head tracking. The software was realized with 3D-Excite RTT DeltaGen 12.2. To answer this question, several possible technical solutions were explored systematically until efficient ones were found. As a result, self-created audio recognition, Kinect motion recognition, Affectiva facial recognition, and manual question generation were implemented to provide interactive audience behavior in the public speaking training application. Evaluation Study (3): To find out whether implementing interactive behavior patterns met users' expectations, the third research question was formulated as "How does the interactivity of a virtual public speaking application affect user experience?". An experimental, cross-sectional user study was conducted with N = 57 participants (65% men, 35% women; Mage = 25.98, SD = 4.68) who used either an interactive or a non-interactive VR application condition. Results revealed a significant difference in users' perception of the two conditions. General Conclusions: Speaker-audience interaction patterns that can be observed in real life were incorporated into a VR application that helps people overcome the fear of public speaking and train their public speaking skills. The findings showed a high relevance of interactivity for VR public speaking applications. Although questions from the audience were still regulated manually, the newly designed audience could interact with the speakers. Thus, the presented VR application is of potential value in helping people train their public speaking skills.
The questions from the audience were still regulated manually by an operator, and the study was conducted with participants not suffering from high degrees of public speaking fear. Future work may use more advanced technology, such as speech recognition, 3D recordings, or live 3D streams of an actual person, and include participants with high degrees of public speaking fear.
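At its core, an interactive audience of this kind maps detected speaker events (from audio, motion, or facial recognition) to audience reaction animations. The sketch below is purely illustrative: the event names, reactions, and mapping are hypothetical assumptions, not the study's actual implementation.

```python
# Hypothetical event-to-reaction mapping for an interactive virtual
# audience; all names here are illustrative assumptions.
AUDIENCE_REACTIONS = {
    "speaker_smiles": "audience_smiles_back",
    "speaker_gestures": "audience_gaze_follows_gesture",
    "speaker_pauses": "audience_raises_question",
    "speaker_speaks": "audience_nods_attentively",
}

def react(detected_event, default="audience_idle"):
    """Map a detected speaker event (e.g. from audio recognition,
    Kinect motion tracking, or facial recognition) to the animation
    trigger played by the virtual audience."""
    return AUDIENCE_REACTIONS.get(detected_event, default)

print(react("speaker_smiles"))   # audience_smiles_back
print(react("speaker_coughs"))   # audience_idle (unmapped event)
```

A table-driven dispatcher like this makes it easy to grow the reaction set as the observation study identifies further interaction patterns, without touching the recognition pipelines.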

    Novel Techniques to Measure the Sensory, Emotional, and Physiological (Biometric) Responses of Consumers toward Foods and Packaging

    This book, reprinted from articles published in the Special Issue “Novel Techniques to Measure the Sensory, Emotional, and Physiological (Biometric) Responses of Consumers toward Foods and Packaging” of the journal Foods, aims to provide a deeper understanding of novel techniques for measuring the different sensory, emotional, and physiological responses toward foods. The editor hopes that the findings from this Special Issue can help the broader scientific community to understand the use of novel sensory science techniques in the evaluation of products.

    Biometric features modeling to measure students engagement

    The ability to measure students' engagement in an educational setting may improve student retention and academic success by revealing which students are disinterested, or which segments of a lesson are causing difficulties. This ability facilitates timely intervention in both the learning and the teaching process in a variety of classroom settings. In this dissertation, an automatic measure of student engagement is proposed through investigating three main components of engagement: behavioural engagement, emotional engagement, and cognitive engagement. The main goal of the proposed technology is to provide instructors with a tool that helps them estimate both the average class engagement level and individual engagement levels in real time while they lecture. Such a system could help instructors take action to improve students' engagement. It can also be used by the instructor to tailor the presentation of material in class, identify course material that engages or disengages students, and identify students who are engaged or disengaged and at risk of failure. A biometric sensor network (BSN) is designed to capture data, consisting of individual facial-capture cameras, wall-mounted cameras, and a high-performance computing machine, recording students' head pose, eye gaze, body pose, body movements, and facial expressions. These low-level features are used to train a machine-learning model that estimates behavioural and emotional engagement in either an e-learning or an in-class environment. A set of experiments compares the proposed technology with state-of-the-art frameworks in terms of performance. The proposed framework shows better accuracy in estimating both behavioural and emotional engagement, and it offers superior flexibility to work in any educational environment.
Further, this approach allows quantitative comparison of teaching methods, such as lectures, flipped classrooms, and classroom response systems, so that an objective metric can be used for teaching evaluation with immediate closed-loop feedback to the instructor.
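The fusion of low-level biometric features into an engagement estimate can be sketched as a simple logistic score. The feature names and hand-set weights below are illustrative assumptions; the dissertation trains a machine-learning model on captured data rather than fixing weights by hand.

```python
import numpy as np

# Hypothetical feature order and weights (assumptions, for illustration):
# head orientation toward the board, gaze on screen, positive facial
# expression, and fidgeting (which lowers the engagement score).
FEATURES = ["head_toward_board", "gaze_on_screen",
            "positive_expression", "fidgeting"]
WEIGHTS = np.array([1.2, 1.5, 0.8, -1.0])

def engagement_score(x):
    """Logistic score in (0, 1) from a feature vector aligned with FEATURES."""
    z = float(WEIGHTS @ np.asarray(x, dtype=float))
    return 1.0 / (1.0 + np.exp(-z))

engaged = engagement_score([1.0, 1.0, 1.0, 0.0])     # attentive student
distracted = engagement_score([0.0, 0.2, 0.0, 1.0])  # fidgeting, gaze away
print(round(engaged, 2), round(distracted, 2))       # 0.97 0.33
```

Averaging such per-student scores over the class gives the instructor-facing class-level engagement signal the dissertation describes; a trained model simply learns the weights (and nonlinearity) from labeled data instead.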

    Sensor Technologies to Manage the Physiological Traits of Chronic Pain: A Review

    Non-oncologic chronic pain is a common, high-morbidity impairment worldwide, acknowledged as a condition with significant impact on quality of life. Pain intensity is largely perceived as a subjective experience, which makes its objective measurement challenging. However, the physiological traces of pain make possible its correlation with vital signs, such as heart rate variability, skin conductance, or the electromyogram, and with health performance metrics derived from daily activity monitoring or facial expressions, which can be acquired with diverse sensor technologies and multisensory approaches. As the assessment and management of pain are essential issues for a wide range of clinical disorders and treatments, this paper reviews different sensor-based approaches applied to the objective evaluation of non-oncological chronic pain. The space of available technologies and resources aimed at pain assessment represents a diversified set of alternatives that can be exploited to address the multidimensional nature of pain. Funding: Ministerio de Economía y Competitividad (Instituto de Salud Carlos III) PI15/00306; Junta de Andalucía PIN-0394-2017; Unión Europea "FRAIL
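Heart rate variability, one of the vital signs named above, is commonly summarized with time-domain metrics such as RMSSD (root mean square of successive differences of beat-to-beat intervals). A minimal sketch, assuming RR intervals in milliseconds from a chest-strap or PPG sensor; the specific sensors and metrics vary across the reviewed approaches.

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """RMSSD: root mean square of successive differences of RR
    intervals (ms), a standard time-domain HRV metric."""
    diffs = np.diff(np.asarray(rr_intervals_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

# Toy beat-to-beat intervals (ms).
rr = [812, 790, 835, 801, 820]
print(round(rmssd(rr), 1))   # 31.7
```

Streams of such per-window metrics, together with skin conductance or activity features, are the kind of multisensory input the reviewed systems correlate with reported pain intensity.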