FAF: A novel multimodal emotion recognition approach integrating face, body and text
Multimodal emotion analysis performs better in emotion recognition because it
draws on more comprehensive emotional cues and richer multimodal emotion
datasets. In this paper, we develop a large multimodal emotion dataset, named
the "HED" dataset, to facilitate the emotion recognition task, and accordingly
propose a multimodal emotion recognition method. To improve recognition
accuracy, a "Feature After Feature" framework is used to extract crucial
emotional information from the aligned face, body and text samples. We employ
various benchmarks to evaluate the "HED" dataset and compare their performance
with that of our method. The results show that the five-class classification
accuracy of the proposed multimodal fusion method is about 83.75%, an
improvement of 1.83%, 9.38%, and 21.62% respectively over the individual
modalities. The complementarity between the channels is thus effectively
exploited to improve the performance of emotion recognition. We have also
built a multimodal online emotion prediction platform, aiming to provide free
emotion prediction to more users.
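The abstract does not specify the fusion rule used inside the "Feature After Feature" framework, so the following is only a generic illustration of how complementary per-modality outputs can be combined: a weighted late-fusion sketch over class-probability vectors, with all probability values hypothetical.

```python
import numpy as np

def late_fusion(prob_face, prob_body, prob_text, weights=(1/3, 1/3, 1/3)):
    """Weighted average of per-modality class-probability vectors."""
    stacked = np.stack([prob_face, prob_body, prob_text])  # shape (3, n_classes)
    w = np.asarray(weights)[:, None]
    fused = (w * stacked).sum(axis=0)
    return fused / fused.sum()  # renormalise to a probability distribution

# Hypothetical per-modality posteriors over three emotion classes:
face = np.array([0.6, 0.3, 0.1])
body = np.array([0.5, 0.4, 0.1])
text = np.array([0.2, 0.7, 0.1])
fused = late_fusion(face, body, text)
print(fused.argmax())  # class 1 wins once body and text evidence is pooled
```

A sketch like this shows why fusion can beat each single modality: a class that is only weakly supported by one channel can still dominate once evidence from all channels is pooled.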
Multimodal emotion recognition
Reading emotions from facial expression and speech is a milestone in Human-Computer
Interaction. Recent sensing technologies, namely the Microsoft Kinect Sensor, provide
basic input modalities data, such as RGB imaging, depth imaging and speech, that can
be used in Emotion Recognition. Moreover, the Kinect can track a face in real
time and report its fiducial points, as well as 6 basic Action Units (AUs).
In this work we explore this information by gathering a new and exclusive
dataset, which presents a new opportunity for the academic community and for
progress on the emotion recognition problem. The database includes RGB, depth, audio, fiducial
points and AUs for 18 volunteers for 7 emotions. We then present automatic emotion
classification results on this dataset by employing k-Nearest Neighbor, Support Vector
Machines and Neural Networks classifiers, with unimodal and multimodal approaches.
Our conclusions show that multimodal approaches can attain better results.
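The exact feature pipeline of this work is not given in the abstract; as a minimal sketch of one of the listed classifiers (k-Nearest Neighbours) applied to multimodal feature vectors, using synthetic stand-ins for the fiducial-point and AU features:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Minimal k-NN classifier: Euclidean distance, majority vote."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)      # distance to every sample
        nearest = y_train[np.argsort(d)[:k]]         # labels of k closest
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

# Synthetic 2-D feature vectors standing in for two emotion classes:
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.1, size=(10, 2))  # class 0 cluster
X1 = rng.normal([1.0, 1.0], 0.1, size=(10, 2))  # class 1 cluster
X = np.vstack([X0, X1])
y = np.array([0] * 10 + [1] * 10)
print(knn_predict(X, y, np.array([[0.05, 0.0], [0.95, 1.0]])))  # [0 1]
```

In a multimodal setting, per-modality feature vectors (e.g. fiducial points and AU activations) would simply be concatenated before being passed to the classifier; this early-fusion choice is an assumption, not a detail stated in the abstract.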
Predicting Player Engagement in Tom Clancy's The Division 2: A Multimodal Approach via Pixels and Gamepad Actions
This paper introduces a large scale multimodal corpus collected for the
purpose of analysing and predicting player engagement in commercial-standard
games. The corpus is solicited from 25 players of the action role-playing game
Tom Clancy's The Division 2, who annotated their level of engagement using a
time-continuous annotation tool. The cleaned and processed corpus presented in
this paper consists of nearly 20 hours of annotated gameplay videos accompanied
by logged gamepad actions. We report preliminary results on predicting
long-term player engagement based on in-game footage and game controller
actions using Convolutional Neural Network architectures. The results obtained
suggest that we can predict player engagement with up to 72% accuracy on
average (88% at best) when we fuse information from the game footage and the
player's controller input. Our findings validate the hypothesis that long-term
(i.e. 1 hour of play) engagement can be predicted efficiently solely from
pixels and gamepad actions.
Comment: 8 pages, accepted for publication and presentation at the 2023 25th ACM International Conference on Multimodal Interaction (ICMI)
D2.4 - Final Bundle of Client-side Components
This document describes the final bundle of client-side components, including descriptions of their functionality and links to their full designs and downloadable versions. This bundle aggregates only the WP2 assets; other client-side assets not covered here will be addressed in the final WP3 deliverables. Those assets created and licenced as open software will be continuously improved and maintained by their creators until the end of the project (the task has been extended to month 48) and beyond. For a full description of the related server-side components, please refer to D2.2 - Final Bundle of Server-side Components. This study is part of the RAGE project. The RAGE project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644187. This publication reflects only the author's view. The European Commission is not responsible for any use that may be made of the information it contains.
Enhanced stress indices in signal processing based on the advanced Matthew correlation coefficient (MCCA) and multimodal function using EEG signals
Stress is a response to various environmental, psychological, and social factors that results in strain and pressure on individuals. Stress levels are commonly categorized as low, medium, and high; this restriction to only three levels is a significant drawback of the existing approach. This study aims to address that limitation and proposes an improved method for EEG feature extraction and stress level categorization. The main contribution of this work lies in the enhanced stress level categorization, which expands from three to six levels using a newly established fractional scale based on the scale of quantities influenced by MCCA and multimodal equation performance. The concept of standard deviation (STD) helps in categorizing stress levels by dividing the scale of quantities, leading to an improvement in the process. The standard Matthew Correlation Coefficient (MCC) equation shows limited performance in relation to accuracy values, and multimodal functions are rarely discussed in terms of their parameters; the MCCA and multimodal function therefore offer the advantage of significantly enhancing accuracy, which forms part of this study's contribution. This study introduces an Advanced Matthew Correlation Coefficient (MCCA) and applies the six-sigma framework to enhance accuracy in stress level categorization. The research focuses on expanding the stress levels from three to six, utilizing a new scale of fractional stress levels influenced by MCCA and multimodal equation performance. Furthermore, the study applies signal pre-processing techniques to filter and segregate the EEG signal into Delta, Theta, Alpha, and Beta frequency bands. Subsequently, feature extraction is conducted, resulting in twenty-one statistical and non-statistical features, which are employed in both the MCCA and multimodal function analysis.
The study employs Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbour (k-NN) classifiers for stress level validation. After conducting experiments and performance evaluations, RF demonstrates the highest average accuracy of 85%–10% under 10-Fold and K-Fold techniques, outperforming SVM and k-NN. In conclusion, this study presents an improved approach to stress level categorization and EEG feature extraction. The proposed Advanced Matthew Correlation Coefficient (MCCA) and six-sigma framework contribute to achieving higher accuracy, surpassing the limitations of the existing three-level categorization. The results indicate the superiority of the Random Forest classifier over SVM and k-NN. This research has implications for various applications and fields, providing a more effective equation to accurately categorize stress levels with a potential accuracy exceeding 95%.
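The abstract does not give the formula of the proposed "Advanced" MCCA, but the baseline quantity it extends, the standard binary Matthews Correlation Coefficient, is computed from confusion-matrix counts. A reference sketch with made-up counts (not results from the study):

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # conventionally 0 when undefined

# Hypothetical counts: 45 true positives, 40 true negatives, 5 FP, 10 FN
print(round(mcc(tp=45, tn=40, fp=5, fn=10), 3))  # ~0.704
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which is why it is a natural anchor for a finer-grained fractional scale such as the one this study proposes.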
The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements
Truly real-life data presents a strong, but exciting challenge for sentiment
and emotion research. The high variety of possible `in-the-wild' properties
makes large datasets such as these indispensable with respect to building
robust machine learning models. In this context, no dataset has yet been made
available that offers sufficient quantity and covers a deep enough variety of
the challenges of each modality to force exploratory analysis of the interplay
of all modalities. In this contribution, we present MuSe-CaR, a
first-of-its-kind multimodal dataset. The data is publicly available, having
recently served as the testing bed for the 1st Multimodal Sentiment Analysis
Challenge, which focused on the tasks of emotion, emotion-target engagement,
and trustworthiness recognition by means of comprehensively integrating the
audio-visual and language modalities. Furthermore, we give a thorough overview
of the dataset in terms of collection and annotation, including annotation
tiers not used in this year's MuSe 2020. In addition, for one of the
sub-challenges - predicting the level of trustworthiness - no participant
outperformed the baseline model, and so we propose a simple but highly
efficient Multi-Head-Attention network that, using multimodal fusion, exceeds
the baseline by around 0.2 CCC (almost 50% improvement).
Comment: accepted version
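CCC here is the Concordance Correlation Coefficient used to score continuous emotion and trustworthiness predictions. A minimal implementation of Lin's CCC (a standard formulation, not the challenge's official evaluation script) is:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient (Lin, 1989)."""
    x, y = np.asarray(y_true, float), np.asarray(y_pred, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    cov = ((x - mx) * (y - my)).mean()        # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

print(round(ccc([1, 2, 3, 4], [1, 2, 3, 4]), 3))  # 1.0 for identical series
```

Unlike plain Pearson correlation, CCC also penalises shifts in mean and scale, so a prediction series that tracks the annotation but at the wrong level scores below 1; this is why a 0.2 CCC gain is a substantial improvement on a continuous-prediction task.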
Enhancing biofeedback-driven self-guided virtual reality exposure therapy through arousal detection from multimodal data using machine learning
Virtual reality exposure therapy (VRET) is a novel intervention technique that allows individuals to experience anxiety-evoking stimuli in a safe environment, recognise specific triggers and gradually increase their exposure to perceived threats. Public-speaking anxiety (PSA) is a prevalent form of social anxiety, characterised by stressful arousal and anxiety generated when presenting to an audience. In self-guided VRET, participants can gradually increase their tolerance to exposure and reduce anxiety-induced arousal and PSA over time. However, creating such a VR environment and determining physiological indices of anxiety-induced arousal or distress is an open challenge. Environment modelling, character creation and animation, psychological state determination and the use of machine learning (ML) models for anxiety or stress detection are equally important, and multi-disciplinary expertise is required. In this work, we have explored a series of ML models on publicly available data sets (electroencephalogram and heart rate variability recordings) to predict arousal states. If we can detect anxiety-induced arousal, we can trigger calming activities that allow individuals to cope with and overcome distress. Here, we discuss how to select ML models and parameters effectively for arousal detection, and we propose a pipeline that overcomes the model selection problem across different parameter settings in the context of virtual reality exposure therapy. This pipeline can be extended to other domains in which arousal detection is crucial. Finally, we have implemented a biofeedback framework for VRET in which we successfully provided feedback, in the form of heart rate and a brain laterality index derived from our acquired multimodal data, for psychological intervention to overcome anxiety.
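The abstract names heart rate variability among the arousal-detection inputs; one commonly used time-domain HRV feature is RMSSD. The sketch below uses toy RR-interval values, not data or the exact feature set from this study:

```python
import numpy as np

def rmssd(rr_ms):
    """RMSSD: root mean square of successive RR-interval differences (ms)."""
    rr = np.asarray(rr_ms, float)
    diff = np.diff(rr)                      # successive beat-to-beat changes
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy RR series (ms); lower RMSSD is commonly read as higher sympathetic arousal.
rest = [800, 820, 790, 830, 810]
aroused = [600, 605, 598, 602, 601]
print(rmssd(rest) > rmssd(aroused))  # True
```

A feature like this could serve as one input to the kind of arousal classifiers the pipeline compares, alongside EEG-derived features such as a brain laterality index.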
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.