210 research outputs found
Multi-label and multimodal classifier for affectve states recognition in virtual rehabilitation
Computational systems that process multiple affective states may benefit from explicitly considering the interaction between
the states to enhance their recognition performance. This work proposes the combination of a multi-label classifier, Circular Classifier
Chain (CCC), with a multimodal classifier, Fusion using a Semi-Naive Bayesian classifier (FSNBC), to include explicitly the
dependencies between multiple affective states during the automatic recognition process. This combination of classifiers is applied to a
virtual rehabilitation context of post-stroke patients. We collected data from post-stroke patients, which include finger pressure, hand
movements, and facial expressions during ten longitudinal sessions. Videos of the sessions were labelled by clinicians to recognize
four states: tiredness, anxiety, pain, and engagement. Each state was modelled by the FSNBC receiving the information of finger
pressure, hand movements, and facial expressions. The four FSNBCs were linked in the CCC to exploit the dependency relationships
between the states. The convergence of CCC was reached by 5 iterations at most for all the patients. Results (ROC AUC) of CCC with
the FSNBC are over 0.940 ± 0.045 (mean ± std. deviation) for the four states. Relationships of mutual exclusion between engagement
and all the other states and co-occurrences between pain and anxiety were detected and discussed
A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning
Curiosity is a vital metacognitive skill in educational contexts, leading to
creativity, and a love of learning. And while many school systems increasingly
undercut curiosity by teaching to the test, teachers are increasingly
interested in how to evoke curiosity in their students to prepare them for a
world in which lifelong learning and reskilling will be more and more
important. One aspect of curiosity that has received little attention, however,
is the role of peers in eliciting curiosity. We present what we believe to be
the first theoretical framework that articulates an integrated socio-cognitive
account of curiosity that ties observable behaviors in peers to underlying
curiosity states. We make a bipartite distinction between individual and
interpersonal functions that contribute to curiosity, and multimodal behaviors
that fulfill these functions. We validate the proposed framework by leveraging
a longitudinal latent variable modeling approach. Findings confirm a positive
predictive relationship between the latent variables of individual and
interpersonal functions and curiosity, with the interpersonal functions
exercising a comparatively stronger influence. Prominent behavioral
realizations of these functions are also discovered in a data-driven manner. We
instantiate the proposed theoretical framework in a set of strategies and
tactics that can be incorporated into learning technologies to indicate, evoke,
and scaffold curiosity. This work is a step towards designing learning
technologies that can recognize and evoke moment-by-moment curiosity during
learning in social contexts and towards a more complete multimodal learning
analytics. The underlying rationale is applicable more generally for developing
computer support for other metacognitive and socio-emotional skills.Comment: arXiv admin note: text overlap with arXiv:1704.0748
Real-time generation and adaptation of social companion robot behaviors
Social robots will be part of our future homes.
They will assist us in everyday tasks, entertain us, and provide helpful advice.
However, the technology still faces challenges that must be overcome to equip the machine with social competencies and make it a socially intelligent and accepted housemate.
An essential skill of every social robot is verbal and non-verbal communication.
In contrast to voice assistants, smartphones, and smart home technology, which are already part of many people's lives today, social robots have an embodiment that raises expectations towards the machine.
Their anthropomorphic or zoomorphic appearance suggests they can communicate naturally with speech, gestures, or facial expressions and understand corresponding human behaviors.
In addition, robots also need to consider individual users' preferences: everybody is shaped by their culture, social norms, and life experiences, resulting in different expectations towards communication with a robot.
However, robots do not have human intuition - they must be equipped with the corresponding algorithmic solutions to these problems.
This thesis investigates the use of reinforcement learning to adapt the robot's verbal and non-verbal communication to the user's needs and preferences.
Such non-functional adaptation of the robot's behaviors primarily aims to improve the user experience and the robot's perceived social intelligence.
The literature has not yet provided a holistic view of the overall challenge: real-time adaptation requires control over the robot's multimodal behavior generation, an understanding of human feedback, and an algorithmic basis for machine learning.
Thus, this thesis develops a conceptual framework for designing real-time non-functional social robot behavior adaptation with reinforcement learning.
It provides a higher-level view from the system designer's perspective and guidance from the start to the end.
It illustrates the process of modeling, simulating, and evaluating such adaptation processes.
Specifically, it guides the integration of human feedback and social signals to equip the machine with social awareness.
The conceptual framework is put into practice for several use cases, resulting in technical proofs of concept and research prototypes.
They are evaluated in the lab and in in-situ studies.
These approaches address typical activities in domestic environments, focussing on the robot's expression of personality, persona, politeness, and humor.
Within this scope, the robot adapts its spoken utterances, prosody, and animations based on human explicit or implicit feedback.Soziale Roboter werden Teil unseres zukĂŒnftigen Zuhauses sein.
Sie werden uns bei alltĂ€glichen Aufgaben unterstĂŒtzen, uns unterhalten und uns mit hilfreichen RatschlĂ€gen versorgen.
Noch gibt es allerdings technische Herausforderungen, die zunĂ€chst ĂŒberwunden werden mĂŒssen, um die Maschine mit sozialen Kompetenzen auszustatten und zu einem sozial intelligenten und akzeptierten Mitbewohner zu machen.
Eine wesentliche FĂ€higkeit eines jeden sozialen Roboters ist die verbale und nonverbale Kommunikation.
Im Gegensatz zu Sprachassistenten, Smartphones und Smart-Home-Technologien, die bereits heute Teil des Lebens vieler Menschen sind, haben soziale Roboter eine Verkörperung, die Erwartungen an die Maschine weckt.
Ihr anthropomorphes oder zoomorphes Aussehen legt nahe, dass sie in der Lage sind, auf natĂŒrliche Weise mit Sprache, Gestik oder Mimik zu kommunizieren, aber auch entsprechende menschliche Kommunikation zu verstehen.
DarĂŒber hinaus mĂŒssen Roboter auch die individuellen Vorlieben der Benutzer berĂŒcksichtigen.
So ist jeder Mensch von seiner Kultur, sozialen Normen und eigenen Lebenserfahrungen geprĂ€gt, was zu unterschiedlichen Erwartungen an die Kommunikation mit einem Roboter fĂŒhrt.
Roboter haben jedoch keine menschliche Intuition - sie mĂŒssen mit entsprechenden Algorithmen fĂŒr diese Probleme ausgestattet werden.
In dieser Arbeit wird der Einsatz von bestĂ€rkendem Lernen untersucht, um die verbale und nonverbale Kommunikation des Roboters an die BedĂŒrfnisse und Vorlieben des Benutzers anzupassen.
Eine solche nicht-funktionale Anpassung des Roboterverhaltens zielt in erster Linie darauf ab, das Benutzererlebnis und die wahrgenommene soziale Intelligenz des Roboters zu verbessern.
Die Literatur bietet bisher keine ganzheitliche Sicht auf diese Herausforderung: Echtzeitanpassung erfordert die Kontrolle ĂŒber die multimodale Verhaltenserzeugung des Roboters, ein VerstĂ€ndnis des menschlichen Feedbacks und eine algorithmische Basis fĂŒr maschinelles Lernen.
Daher wird in dieser Arbeit ein konzeptioneller Rahmen fĂŒr die Gestaltung von nicht-funktionaler Anpassung der Kommunikation sozialer Roboter mit bestĂ€rkendem Lernen entwickelt.
Er bietet eine ĂŒbergeordnete Sichtweise aus der Perspektive des Systemdesigners und eine Anleitung vom Anfang bis zum Ende.
Er veranschaulicht den Prozess der Modellierung, Simulation und Evaluierung solcher Anpassungsprozesse.
Insbesondere wird auf die Integration von menschlichem Feedback und sozialen Signalen eingegangen, um die Maschine mit sozialem Bewusstsein auszustatten.
Der konzeptionelle Rahmen wird fĂŒr mehrere AnwendungsfĂ€lle in die Praxis umgesetzt, was zu technischen Konzeptnachweisen und Forschungsprototypen fĂŒhrt, die in Labor- und In-situ-Studien evaluiert werden.
Diese AnsÀtze befassen sich mit typischen AktivitÀten in hÀuslichen Umgebungen, wobei der Schwerpunkt auf dem Ausdruck der Persönlichkeit, dem Persona, der Höflichkeit und dem Humor des Roboters liegt.
In diesem Rahmen passt der Roboter seine Sprache, Prosodie, und Animationen auf Basis expliziten oder impliziten menschlichen Feedbacks an
WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without demanding the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. Also, the experiments show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the 3 target environments. While the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25% and 95.09% across those same environments
Recommended from our members
Multimodal Multisensor attention modelling
Introduction: Sustaining attention is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to track student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time Multimodal Multisensor data labeled by objective performance outcomes to track the attention of students.
Method: The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal Multisensor data were collected while they participated in a Continuous Performance Test (CPT). Eyegaze, electroencephalogram, body pose, and interaction data were used to create a model of student attention through objective labeling from the Continuous Performance Test outcomes. To achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted including High-Level handpicked Compound Features (HLCF). Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated.
Research questions:
RQ1: Can we create a model of attention for PMLD/CP students using the CPT?
RQ2: What are the main correlations found in the CPT outcomes and the Multimodal Multisensor data?
Results: Overall, the random forest classification approach achieved the best classification results. Using random forest, 84.8% classification for attention and 65.4% accuracy for inattention were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naĂŻve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. Incorporating person-specific data improved the classification outcome, compared to being participant neutral. We found that using HighLevel handpicked Compound Features (HLCF) can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of attention and inattention was shown to be eye-gaze. We have shown that we can accurately predict the level of attention of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation, or reliant on a single mode of sensor input. In total, 2475 separate correlation tests were carried over 55 data points using Pearsonâs correlation coefficient. Data points from the SDT, CPT outcomes measures, Multimodal Multisensor features, and participant characteristics were assessed longitudinally for cross-correlation significance. A strong positive correlation was found between participant ability to maintain sustained and selective attention in the CPT to their academic progress in school (dâČ), P < .01. Participants who showed more inhibition in tests had progressed further in their academic assessments P < .01. The Seek-X type CPT also showed specific physiological characteristics, including body movement range and eye-gaze that were significant in P scales such as âReadingâ and âListeningâ P < .05. We found that participant bias was overall liberal BâłD < 0. Participants iii showed no significant bias change during the sessions, and we found no significant correlation between bias (BâłD) and sensitivity (dâČ).
Conclusion: An approach to labeling Multimodal Multisensor data to train machine-learning algorithms to track the attention of students with profound and multiple disabilities has been presented. We posit that this approach can overcome the variation in observer inter-rater reliability when using standardized scales in tracking the emotional expression of students with such profound disabilities. The accuracy of our approach increases with multiple modes of sensor input, and our method is robust to sensor occlusion and fall-out. Multiple sources of sensor input are provided, to accommodate a wide variety of users and their needs. Our model can reliably track the attention of students with profound disabilities, regardless of the sensors available. A system incorporating this model can help teachers design personalized interventions for a very heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. This approach could be used to identify those with the greatest learning challenges, to guarantee that all students are supported to reach their full potential.
KeywordsâAffective computing in education, affect detection, attention, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, Signal Detection Theory, selective attention, sustained attention, student engagement
Robust Modeling of Epistemic Mental States and Their Applications in Assistive Technology
This dissertation presents the design and implementation of EmoAssist: Emotion-Enabled Assistive Tool to Enhance Dyadic Conversation for the Blind . The key functionalities of the system are to recognize behavioral expressions and to predict 3-D affective dimensions from visual cues and to provide audio feedback to the visually impaired in a natural environment. Prior to describing the EmoAssist, this dissertation identifies and advances research challenges in the analysis of the facial features and their temporal dynamics with Epistemic Mental States in dyadic conversation. A number of statistical analyses and simulations were performed to get the answer of important research questions about the complex interplay between facial features and mental states. It was found that the non-linear relations are mostly prevalent rather than the linear ones. Further, the portable prototype of assistive technology that can aid blind individual to understand his/her interlocutor\u27s mental states has been designed based on the analysis. A number of challenges related to the system, communication protocols, error-free tracking of face and robust modeling of behavioral expressions /affective dimensions were addressed to make the EmoAssist effective in a real world scenario. In addition, orientation-sensor information from the phone was used to correct image alignment to improve the robustness in real life deployment. It was observed that the EmoAssist can predict affective dimensions with acceptable accuracy (Maximum Correlation-Coefficient for valence: 0.76, arousal: 0.78, and dominance: 0.76) in natural conversation. The overall minimum and maximum response-times are (64.61 milliseconds) and (128.22 milliseconds), respectively. The integration of sensor information for correcting the orientation has helped in significant improvement (16% in average) of accuracy in recognizing behavioral expressions. A user study with ten blind people shows that the EmoAssist is highly acceptable to them (Average acceptability rating using Likert: 6.0 where 1 and 7 are the lowest and highest possible ratings, respectively) in social interaction
Towards uncertainty-aware and label-efficient machine learning of human expressive behaviour
The ability to recognise emotional expressions from non-verbal behaviour plays a key role in human-human interaction. Endowing machines with the same ability is critical to enriching human-computer interaction. Despite receiving widespread attention so far, human-level automatic recognition of affective expressions is still an elusive task for machines. Towards improving the current state of machine learning methods applied to affect recognition, this thesis identifies two challenges: label ambiguity and label scarcity.
Firstly, this thesis notes that it is difficult to establish a clear one-to-one mapping between inputs (face images or speech segments) and their target emotion labels, considering that emotion perception is inherently subjective. As a result, the problem of label ambiguity naturally arises in the manual annotations of affect. Ignoring this fundamental problem, most existing affect recognition methods implicitly assume a one-to-one input-target mapping and use deterministic function learning. In contrast, this thesis proposes to learn non-deterministic functions based on uncertainty-aware probabilistic models, as they can naturally accommodate the one-to-many input-target mapping. Besides improving the affect recognition performance, the proposed uncertainty-aware models in this thesis demonstrate three important applications: adaptive multimodal affect fusion, human-in-the-loop learning of affect, and improved performance on downstream behavioural analysis tasks like personality traits estimation.
Secondly, this thesis aims to address the challenge of scarcity of affect labelled datasets, caused by the cumbersome and time-consuming nature of the affect annotation process. To this end, this thesis notes that audio and visual feature encoders used in the existing models are label-inefficient i.e. learning them requires large amounts of labelled training data. As a solution, this thesis proposes to pre-train the feature encoders using unlabelled data to make them more label-efficient i.e. using as few labelled training examples as possible to achieve good emotion recognition performance. A novel self-supervised pre-training method is proposed in this thesis by posing hand-engineered emotion features as task-specific representation learning priors. By leveraging large amounts of unlabelled audiovisual data, the proposed self-supervised pre-training method demonstrates much better label efficiency compared to the commonly employed pre-training methods
- âŠ