Search CORE

210 research outputs found

Multi-label and multimodal classifier for affectve states recognition in virtual rehabilitation

Author: Berthouze N
Castrejon L
Franco JH
Lara PC
Orihuela-Espina F
Palafox L
Rivas J
Sucar LE
Williams ACDC
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/01/2021
Field of study

Computational systems that process multiple affective states may benefit from explicitly considering the interaction between the states to enhance their recognition performance. This work proposes the combination of a multi-label classifier, Circular Classifier Chain (CCC), with a multimodal classifier, Fusion using a Semi-Naive Bayesian classifier (FSNBC), to include explicitly the dependencies between multiple affective states during the automatic recognition process. This combination of classifiers is applied to a virtual rehabilitation context of post-stroke patients. We collected data from post-stroke patients, which include finger pressure, hand movements, and facial expressions during ten longitudinal sessions. Videos of the sessions were labelled by clinicians to recognize four states: tiredness, anxiety, pain, and engagement. Each state was modelled by the FSNBC receiving the information of finger pressure, hand movements, and facial expressions. The four FSNBCs were linked in the CCC to exploit the dependency relationships between the states. The convergence of CCC was reached by 5 iterations at most for all the patients. Results (ROC AUC) of CCC with the FSNBC are over 0.940 ± 0.045 (mean ± std. deviation) for the four states. Relationships of mutual exclusion between engagement and all the other states and co-occurrences between pain and anxiety were detected and discussed

UCL Discovery

A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning

Author: Bai Zhen
Cassell Justine
Sinha Tanmay
Publication venue
Publication date: 19/01/2022
Field of study

Curiosity is a vital metacognitive skill in educational contexts, leading to creativity, and a love of learning. And while many school systems increasingly undercut curiosity by teaching to the test, teachers are increasingly interested in how to evoke curiosity in their students to prepare them for a world in which lifelong learning and reskilling will be more and more important. One aspect of curiosity that has received little attention, however, is the role of peers in eliciting curiosity. We present what we believe to be the first theoretical framework that articulates an integrated socio-cognitive account of curiosity that ties observable behaviors in peers to underlying curiosity states. We make a bipartite distinction between individual and interpersonal functions that contribute to curiosity, and multimodal behaviors that fulfill these functions. We validate the proposed framework by leveraging a longitudinal latent variable modeling approach. Findings confirm a positive predictive relationship between the latent variables of individual and interpersonal functions and curiosity, with the interpersonal functions exercising a comparatively stronger influence. Prominent behavioral realizations of these functions are also discovered in a data-driven manner. We instantiate the proposed theoretical framework in a set of strategies and tactics that can be incorporated into learning technologies to indicate, evoke, and scaffold curiosity. This work is a step towards designing learning technologies that can recognize and evoke moment-by-moment curiosity during learning in social contexts and towards a more complete multimodal learning analytics. The underlying rationale is applicable more generally for developing computer support for other metacognitive and socio-emotional skills.Comment: arXiv admin note: text overlap with arXiv:1704.0748

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Real-time generation and adaptation of social companion robot behaviors

Author: Ritschel Hannes
Publication venue
Publication date: 23/01/2023
Field of study

Social robots will be part of our future homes. They will assist us in everyday tasks, entertain us, and provide helpful advice. However, the technology still faces challenges that must be overcome to equip the machine with social competencies and make it a socially intelligent and accepted housemate. An essential skill of every social robot is verbal and non-verbal communication. In contrast to voice assistants, smartphones, and smart home technology, which are already part of many people's lives today, social robots have an embodiment that raises expectations towards the machine. Their anthropomorphic or zoomorphic appearance suggests they can communicate naturally with speech, gestures, or facial expressions and understand corresponding human behaviors. In addition, robots also need to consider individual users' preferences: everybody is shaped by their culture, social norms, and life experiences, resulting in different expectations towards communication with a robot. However, robots do not have human intuition - they must be equipped with the corresponding algorithmic solutions to these problems. This thesis investigates the use of reinforcement learning to adapt the robot's verbal and non-verbal communication to the user's needs and preferences. Such non-functional adaptation of the robot's behaviors primarily aims to improve the user experience and the robot's perceived social intelligence. The literature has not yet provided a holistic view of the overall challenge: real-time adaptation requires control over the robot's multimodal behavior generation, an understanding of human feedback, and an algorithmic basis for machine learning. Thus, this thesis develops a conceptual framework for designing real-time non-functional social robot behavior adaptation with reinforcement learning. It provides a higher-level view from the system designer's perspective and guidance from the start to the end. It illustrates the process of modeling, simulating, and evaluating such adaptation processes. Specifically, it guides the integration of human feedback and social signals to equip the machine with social awareness. The conceptual framework is put into practice for several use cases, resulting in technical proofs of concept and research prototypes. They are evaluated in the lab and in in-situ studies. These approaches address typical activities in domestic environments, focussing on the robot's expression of personality, persona, politeness, and humor. Within this scope, the robot adapts its spoken utterances, prosody, and animations based on human explicit or implicit feedback.Soziale Roboter werden Teil unseres zukünftigen Zuhauses sein. Sie werden uns bei alltäglichen Aufgaben unterstützen, uns unterhalten und uns mit hilfreichen Ratschlägen versorgen. Noch gibt es allerdings technische Herausforderungen, die zunächst überwunden werden müssen, um die Maschine mit sozialen Kompetenzen auszustatten und zu einem sozial intelligenten und akzeptierten Mitbewohner zu machen. Eine wesentliche Fähigkeit eines jeden sozialen Roboters ist die verbale und nonverbale Kommunikation. Im Gegensatz zu Sprachassistenten, Smartphones und Smart-Home-Technologien, die bereits heute Teil des Lebens vieler Menschen sind, haben soziale Roboter eine Verkörperung, die Erwartungen an die Maschine weckt. Ihr anthropomorphes oder zoomorphes Aussehen legt nahe, dass sie in der Lage sind, auf natürliche Weise mit Sprache, Gestik oder Mimik zu kommunizieren, aber auch entsprechende menschliche Kommunikation zu verstehen. Darüber hinaus müssen Roboter auch die individuellen Vorlieben der Benutzer berücksichtigen. So ist jeder Mensch von seiner Kultur, sozialen Normen und eigenen Lebenserfahrungen geprägt, was zu unterschiedlichen Erwartungen an die Kommunikation mit einem Roboter führt. Roboter haben jedoch keine menschliche Intuition - sie müssen mit entsprechenden Algorithmen für diese Probleme ausgestattet werden. In dieser Arbeit wird der Einsatz von bestärkendem Lernen untersucht, um die verbale und nonverbale Kommunikation des Roboters an die Bedürfnisse und Vorlieben des Benutzers anzupassen. Eine solche nicht-funktionale Anpassung des Roboterverhaltens zielt in erster Linie darauf ab, das Benutzererlebnis und die wahrgenommene soziale Intelligenz des Roboters zu verbessern. Die Literatur bietet bisher keine ganzheitliche Sicht auf diese Herausforderung: Echtzeitanpassung erfordert die Kontrolle über die multimodale Verhaltenserzeugung des Roboters, ein Verständnis des menschlichen Feedbacks und eine algorithmische Basis für maschinelles Lernen. Daher wird in dieser Arbeit ein konzeptioneller Rahmen für die Gestaltung von nicht-funktionaler Anpassung der Kommunikation sozialer Roboter mit bestärkendem Lernen entwickelt. Er bietet eine übergeordnete Sichtweise aus der Perspektive des Systemdesigners und eine Anleitung vom Anfang bis zum Ende. Er veranschaulicht den Prozess der Modellierung, Simulation und Evaluierung solcher Anpassungsprozesse. Insbesondere wird auf die Integration von menschlichem Feedback und sozialen Signalen eingegangen, um die Maschine mit sozialem Bewusstsein auszustatten. Der konzeptionelle Rahmen wird für mehrere Anwendungsfälle in die Praxis umgesetzt, was zu technischen Konzeptnachweisen und Forschungsprototypen führt, die in Labor- und In-situ-Studien evaluiert werden. Diese Ansätze befassen sich mit typischen Aktivitäten in häuslichen Umgebungen, wobei der Schwerpunkt auf dem Ausdruck der Persönlichkeit, dem Persona, der Höflichkeit und dem Humor des Roboters liegt. In diesem Rahmen passt der Roboter seine Sprache, Prosodie, und Animationen auf Basis expliziten oder impliziten menschlichen Feedbacks an

OPUS Augsburg

WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

Author: Elkelany Amany
McKeever Susan
Ross Robert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/02/2023
Field of study

Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without demanding the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. Also, the experiments show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the 3 target environments. While the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25% and 95.09% across those same environments

Arrow@TUDublin

Recommended from our members

Multimodal Multisensor attention modelling

Author: Taheri MH
Publication venue
Publication date: 01/05/2020
Field of study

Introduction: Sustaining attention is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to track student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time Multimodal Multisensor data labeled by objective performance outcomes to track the attention of students. Method: The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal Multisensor data were collected while they participated in a Continuous Performance Test (CPT). Eyegaze, electroencephalogram, body pose, and interaction data were used to create a model of student attention through objective labeling from the Continuous Performance Test outcomes. To achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted including High-Level handpicked Compound Features (HLCF). Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Research questions: RQ1: Can we create a model of attention for PMLD/CP students using the CPT? RQ2: What are the main correlations found in the CPT outcomes and the Multimodal Multisensor data? Results: Overall, the random forest classification approach achieved the best classification results. Using random forest, 84.8% classification for attention and 65.4% accuracy for inattention were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. Incorporating person-specific data improved the classification outcome, compared to being participant neutral. We found that using HighLevel handpicked Compound Features (HLCF) can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of attention and inattention was shown to be eye-gaze. We have shown that we can accurately predict the level of attention of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation, or reliant on a single mode of sensor input. In total, 2475 separate correlation tests were carried over 55 data points using Pearson’s correlation coefficient. Data points from the SDT, CPT outcomes measures, Multimodal Multisensor features, and participant characteristics were assessed longitudinally for cross-correlation significance. A strong positive correlation was found between participant ability to maintain sustained and selective attention in the CPT to their academic progress in school (d′), P < .01. Participants who showed more inhibition in tests had progressed further in their academic assessments P < .01. The Seek-X type CPT also showed specific physiological characteristics, including body movement range and eye-gaze that were significant in P scales such as ‘Reading’ and ‘Listening’ P < .05. We found that participant bias was overall liberal B″D < 0. Participants iii showed no significant bias change during the sessions, and we found no significant correlation between bias (B″D) and sensitivity (d′). Conclusion: An approach to labeling Multimodal Multisensor data to train machine-learning algorithms to track the attention of students with profound and multiple disabilities has been presented. We posit that this approach can overcome the variation in observer inter-rater reliability when using standardized scales in tracking the emotional expression of students with such profound disabilities. The accuracy of our approach increases with multiple modes of sensor input, and our method is robust to sensor occlusion and fall-out. Multiple sources of sensor input are provided, to accommodate a wide variety of users and their needs. Our model can reliably track the attention of students with profound disabilities, regardless of the sensors available. A system incorporating this model can help teachers design personalized interventions for a very heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. This approach could be used to identify those with the greatest learning challenges, to guarantee that all students are supported to reach their full potential. Keywords—Affective computing in education, affect detection, attention, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, Signal Detection Theory, selective attention, sustained attention, student engagement

Nottingham Trent Institutional Repository (IRep)

Robust Modeling of Epistemic Mental States and Their Applications in Assistive Technology

Author: Rahman A K M Mahbubur
Publication venue: University of Memphis Digital Commons
Publication date: 03/12/2013
Field of study

This dissertation presents the design and implementation of EmoAssist: Emotion-Enabled Assistive Tool to Enhance Dyadic Conversation for the Blind . The key functionalities of the system are to recognize behavioral expressions and to predict 3-D affective dimensions from visual cues and to provide audio feedback to the visually impaired in a natural environment. Prior to describing the EmoAssist, this dissertation identifies and advances research challenges in the analysis of the facial features and their temporal dynamics with Epistemic Mental States in dyadic conversation. A number of statistical analyses and simulations were performed to get the answer of important research questions about the complex interplay between facial features and mental states. It was found that the non-linear relations are mostly prevalent rather than the linear ones. Further, the portable prototype of assistive technology that can aid blind individual to understand his/her interlocutor\u27s mental states has been designed based on the analysis. A number of challenges related to the system, communication protocols, error-free tracking of face and robust modeling of behavioral expressions /affective dimensions were addressed to make the EmoAssist effective in a real world scenario. In addition, orientation-sensor information from the phone was used to correct image alignment to improve the robustness in real life deployment. It was observed that the EmoAssist can predict affective dimensions with acceptable accuracy (Maximum Correlation-Coefficient for valence: 0.76, arousal: 0.78, and dominance: 0.76) in natural conversation. The overall minimum and maximum response-times are (64.61 milliseconds) and (128.22 milliseconds), respectively. The integration of sensor information for correcting the orientation has helped in significant improvement (16% in average) of accuracy in recognizing behavioral expressions. A user study with ten blind people shows that the EmoAssist is highly acceptable to them (Average acceptability rating using Likert: 6.0 where 1 and 7 are the lowest and highest possible ratings, respectively) in social interaction

University of Memphis Digital Commons

Towards uncertainty-aware and label-efficient machine learning of human expressive behaviour

Author: Tellamekala Mani Kumar
Publication venue
Publication date: 14/12/2022
Field of study

The ability to recognise emotional expressions from non-verbal behaviour plays a key role in human-human interaction. Endowing machines with the same ability is critical to enriching human-computer interaction. Despite receiving widespread attention so far, human-level automatic recognition of affective expressions is still an elusive task for machines. Towards improving the current state of machine learning methods applied to affect recognition, this thesis identifies two challenges: label ambiguity and label scarcity. Firstly, this thesis notes that it is difficult to establish a clear one-to-one mapping between inputs (face images or speech segments) and their target emotion labels, considering that emotion perception is inherently subjective. As a result, the problem of label ambiguity naturally arises in the manual annotations of affect. Ignoring this fundamental problem, most existing affect recognition methods implicitly assume a one-to-one input-target mapping and use deterministic function learning. In contrast, this thesis proposes to learn non-deterministic functions based on uncertainty-aware probabilistic models, as they can naturally accommodate the one-to-many input-target mapping. Besides improving the affect recognition performance, the proposed uncertainty-aware models in this thesis demonstrate three important applications: adaptive multimodal affect fusion, human-in-the-loop learning of affect, and improved performance on downstream behavioural analysis tasks like personality traits estimation. Secondly, this thesis aims to address the challenge of scarcity of affect labelled datasets, caused by the cumbersome and time-consuming nature of the affect annotation process. To this end, this thesis notes that audio and visual feature encoders used in the existing models are label-inefficient i.e. learning them requires large amounts of labelled training data. As a solution, this thesis proposes to pre-train the feature encoders using unlabelled data to make them more label-efficient i.e. using as few labelled training examples as possible to achieve good emotion recognition performance. A novel self-supervised pre-training method is proposed in this thesis by posing hand-engineered emotion features as task-specific representation learning priors. By leveraging large amounts of unlabelled audiovisual data, the proposed self-supervised pre-training method demonstrates much better label efficiency compared to the commonly employed pre-training methods

Nottingham eTheses