89 research outputs found

    Brain computer interfaces as intelligent sensors for enhancing human-computer interaction

    Get PDF
    BCIs are traditionally conceived as a way to control apparatus, an interface that allows you to act on" external devices as a form of input control. We propose an alternative use of BCIs, that of monitoring users as an additional intelligent sensor to enrich traditional means of interaction. This vision is what we consider to be a grand challenge in the field of multimodal interaction. In this article, this challenge is introduced, related to existing work, and illustrated using some best practices and the contributions it has received

    ICMI'12:Proceedings of the ACM SIGCHI 14th International Conference on Multimodal Interaction

    Get PDF

    The Multimodal Tutor: Adaptive Feedback from Multimodal Experiences

    Get PDF
    This doctoral thesis describes the journey of ideation, prototyping and empirical testing of the Multimodal Tutor, a system designed for providing digital feedback that supports psychomotor skills acquisition using learning and multimodal data capturing. The feedback is given in real-time with machine-driven assessment of the learner's task execution. The predictions are tailored by supervised machine learning models trained with human annotated samples. The main contributions of this thesis are: a literature survey on multimodal data for learning, a conceptual model (the Multimodal Learning Analytics Model), a technological framework (the Multimodal Pipeline), a data annotation tool (the Visual Inspection Tool) and a case study in Cardiopulmonary Resuscitation training (CPR Tutor). The CPR Tutor generates real-time, adaptive feedback using kinematic and myographic data and neural networks

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Full text link
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy

    Multimodal Visual Sensing: Automated Estimation of Engagement

    Get PDF
    Viele moderne Anwendungen der künstlichen Intelligenz beinhalten bis zu einem gewissen Grad ein Verständnis der menschlichen Aufmerksamkeit, Aktivität, Absicht und Kompetenz aus multimodalen visuellen Daten. Nonverbale Verhaltenshinweise, die mit Hilfe von Computer Vision und Methoden des maschinellen Lernens erkannt werden, enthalten wertvolle Informationen zum Verständnis menschlicher Verhaltensweisen, einschließlich Aufmerksamkeit und Engagement. Der Einsatz solcher automatisierten Methoden im Bildungsbereich birgt ein enormes Potenzial. Zu den nützlichen Anwendungen gehören Analysen im Klassenzimmer zur Messung der Unterrichtsqualität und die Entwicklung von Interventionen zur Verbesserung des Unterrichts auf der Grundlage dieser Analysen sowie die Analyse von Präsentationen, um Studenten zu helfen, ihre Botschaften überzeugend und effektiv zu vermitteln. Diese Dissertation stellt ein allgemeines Framework vor, das auf multimodaler visueller Erfassung basiert, um Engagement und verwandte Aufgaben anhand visueller Modalitäten zu analysieren. Während sich der Großteil der Engagement-Literatur im Bereich des affektiven und sozialen Computings auf computerbasiertes Lernen und auf Lernspiele konzentriert, untersuchen wir die automatisierte Engagement-Schätzung im Klassenzimmer unter Verwendung verschiedener nonverbaler Verhaltenshinweise und entwickeln Methoden zur Extraktion von Aufmerksamkeits- und emotionalen Merkmalen. Darüber hinaus validieren wir die Effizienz der vorgeschlagenen Ansätze an realen Daten, die aus videografierten Klassen an Universitäten und weiterführenden Schulen gesammelt wurden. Zusätzlich zu den Lernaktivitäten führen wir eine Verhaltensanalyse von Studenten durch, die kurze wissenschaftliche Präsentationen unter Verwendung von multimodalen Hinweisen, einschließlich Gesichts-, Körper- und Stimmmerkmalen, halten. Neben dem Engagement und der Präsentationskompetenz nähern wir uns dem Verständnis des menschlichen Verhaltens aus einer breiteren Perspektive, indem wir die Analyse der gemeinsamen Aufmerksamkeit in einer Gruppe von Menschen, die Wahrnehmung von Lehrern mit Hilfe von egozentrischer Kameraperspektive und mobilen Eyetrackern sowie die automatisierte Anonymisierung von audiovisuellen Daten in Studien im Klassenzimmer untersuchen. Educational Analytics bieten wertvolle Möglichkeiten zur Verbesserung von Lernen und Lehren. Die Arbeit in dieser Dissertation schlägt einen rechnerischen Rahmen zur Einschätzung des Engagements und der Präsentationskompetenz von Schülern vor, zusammen mit unterstützenden Computer-Vision-Problemen.Many modern applications of artificial intelligence involve, to some extent, an understanding of human attention, activity, intention, and competence from multimodal visual data. Nonverbal behavioral cues detected using computer vision and machine learning methods include valuable information for understanding human behaviors, including attention and engagement. The use of such automated methods in educational settings has a tremendous potential for good. Beneficial uses include classroom analytics to measure teaching quality and the development of interventions to improve teaching based on these analytics, as well as presentation analysis to help students deliver their messages persuasively and effectively. This dissertation presents a general framework based on multimodal visual sensing to analyze engagement and related tasks from visual modalities. While the majority of engagement literature in affective and social computing focuses on computer-based learning and educational games, we investigate automated engagement estimation in the classroom using different nonverbal behavioral cues and developed methods to extract attentional and emotional features. Furthermore, we validate the efficiency of proposed approaches on real-world data collected from videotaped classes at university and secondary school. In addition to learning activities, we perform behavior analysis on students giving short scientific presentations using multimodal cues, including face, body, and voice features. Besides engagement and presentation competence, we approach human behavior understanding from a broader perspective by studying the analysis of joint attention in a group of people, teachers' perception using egocentric camera view and mobile eye trackers, and automated anonymization of audiovisual data in classroom studies. Educational analytics present valuable opportunities to improve learning and teaching. The work in this dissertation suggests a computational framework for estimating student engagement and presentation competence, together with supportive computer vision problems

    Affective Computing

    Get PDF
    This book provides an overview of state of the art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents a research on synthesis and recognition of facial expressions. Given that we not only use the face but also body movements to express ourselves, in the second section (Chapters 8 to 11) we present a research on perception and generation of emotional expressions by using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models on emotion, as well as findings from neuroscience research. In the last section of the book (Chapters 17 to 22) we present applications related to affective computing
    • …
    corecore