1,418 research outputs found

    FAF: A novel multimodal emotion recognition approach integrating face, body and text

    Full text link
    Multimodal emotion analysis achieves better recognition performance because it can draw on more comprehensive emotional cues and multimodal emotion datasets. In this paper, we developed a large multimodal emotion dataset, named the "HED" dataset, to facilitate the emotion recognition task, and accordingly propose a multimodal emotion recognition method. To improve recognition accuracy, a "Feature After Feature" framework is used to extract crucial emotional information from the aligned face, body and text samples. We employ various benchmarks to evaluate the "HED" dataset and compare their performance with that of our method. The results show that the five-class classification accuracy of the proposed multimodal fusion method is about 83.75%, an improvement of 1.83%, 9.38%, and 21.62% respectively over the individual modalities. The complementarity between the channels is thus effectively exploited to improve emotion recognition performance. We have also established a multimodal online emotion prediction platform, aiming to provide free emotion prediction to more users.
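
    The "Feature After Feature" framework itself is not detailed in this abstract. Purely as an illustration of feature-level fusion of face, body and text representations into a five-class classifier, a minimal PyTorch sketch (all dimensions, layers and module names below are assumptions, not the paper's FAF design) could look like this:

    # Minimal feature-level fusion sketch for face, body and text embeddings.
    # NOT the paper's FAF framework; dimensions and layers are illustrative.
    import torch
    import torch.nn as nn

    class SimpleFusionClassifier(nn.Module):
        def __init__(self, face_dim=512, body_dim=256, text_dim=768, num_classes=5):
            super().__init__()
            # Project each modality into a shared 128-dimensional space.
            self.face_proj = nn.Linear(face_dim, 128)
            self.body_proj = nn.Linear(body_dim, 128)
            self.text_proj = nn.Linear(text_dim, 128)
            # Classify from the concatenated modality features.
            self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(3 * 128, num_classes))

        def forward(self, face_feat, body_feat, text_feat):
            fused = torch.cat([self.face_proj(face_feat),
                               self.body_proj(body_feat),
                               self.text_proj(text_feat)], dim=-1)
            return self.classifier(fused)

    # Usage with random stand-in features for a batch of 8 aligned samples.
    model = SimpleFusionClassifier()
    logits = model(torch.randn(8, 512), torch.randn(8, 256), torch.randn(8, 768))
    print(logits.shape)  # torch.Size([8, 5])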

    Multimodal emotion recognition

    Get PDF
    Reading emotions from facial expressions and speech is a milestone in Human-Computer Interaction. Recent sensing technologies, namely the Microsoft Kinect Sensor, provide basic input modality data, such as RGB imaging, depth imaging and speech, that can be used in emotion recognition. Moreover, Kinect can track a face in real time and provide its fiducial points, as well as 6 basic Action Units (AUs). In this work we explore this information by gathering a new and exclusive dataset, a new opportunity for the academic community as well as for progress on the emotion recognition problem. The database includes RGB, depth, audio, fiducial points and AUs for 18 volunteers and 7 emotions. We then present automatic emotion classification results on this dataset, employing k-Nearest Neighbour, Support Vector Machine and Neural Network classifiers with unimodal and multimodal approaches. Our conclusions show that multimodal approaches can attain better results.
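
    As a purely illustrative sketch of the kind of unimodal versus multimodal comparison described above (the feature dimensions, stand-in data and classifier settings are assumptions, not the authors' setup), the three named classifier families could be compared with scikit-learn roughly as follows:

    # Illustrative comparison of unimodal vs. fused (concatenated) features with
    # k-NN, SVM and a neural network. Feature arrays are random stand-ins; real
    # AU, fiducial-point and audio features would be substituted for them.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_samples, n_classes = 126, 7                  # e.g. 18 volunteers x 7 emotions
    aus = rng.normal(size=(n_samples, 6))          # 6 Action Units
    fiducials = rng.normal(size=(n_samples, 40))   # flattened fiducial points
    audio = rng.normal(size=(n_samples, 13))       # e.g. MFCC-style speech features
    y = rng.integers(0, n_classes, size=n_samples)

    modalities = {
        "AUs only": aus,
        "fiducials only": fiducials,
        "audio only": audio,
        "multimodal": np.hstack([aus, fiducials, audio]),
    }
    classifiers = {
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
        "NN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000),
    }
    for mod_name, X in modalities.items():
        for clf_name, clf in classifiers.items():
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{mod_name:15s} {clf_name:5s} accuracy: {acc:.2f}")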

    Predicting Player Engagement in Tom Clancy's The Division 2: A Multimodal Approach via Pixels and Gamepad Actions

    Full text link
    This paper introduces a large-scale multimodal corpus collected for the purpose of analysing and predicting player engagement in commercial-standard games. The corpus was collected from 25 players of the action role-playing game Tom Clancy's The Division 2, who annotated their level of engagement using a time-continuous annotation tool. The cleaned and processed corpus presented in this paper consists of nearly 20 hours of annotated gameplay videos accompanied by logged gamepad actions. We report preliminary results on predicting long-term player engagement based on in-game footage and game controller actions using Convolutional Neural Network architectures. The results suggest that we can predict player engagement with up to 72% accuracy on average (88% at best) when we fuse information from the game footage and the player's controller input. Our findings validate the hypothesis that long-term (i.e. 1 hour of play) engagement can be predicted efficiently solely from pixels and gamepad actions. Comment: 8 pages, accepted for publication and presentation at the 2023 25th ACM International Conference on Multimodal Interaction (ICMI).
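
    The abstract does not specify the network architecture; as a rough sketch only, a two-stream model that fuses frame pixels with a vector of gamepad actions for a low/high engagement prediction might be organised as below (the input sizes, layer widths and binary output are assumptions):

    # Sketch of a two-stream network fusing game footage (a single RGB frame
    # here, for simplicity) with aggregated gamepad-action features to predict
    # low/high engagement. Shapes and layer sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class PixelsAndPadNet(nn.Module):
        def __init__(self, n_actions=20):
            super().__init__()
            # Small CNN over a 64x64 RGB frame.
            self.visual = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # MLP over aggregated gamepad-action counts/frequencies.
            self.actions = nn.Sequential(nn.Linear(n_actions, 32), nn.ReLU())
            # Fuse both streams and output logits for low/high engagement.
            self.head = nn.Linear(32 + 32, 2)

        def forward(self, frame, action_vec):
            fused = torch.cat([self.visual(frame), self.actions(action_vec)], dim=-1)
            return self.head(fused)

    model = PixelsAndPadNet()
    logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 20))
    print(logits.shape)  # torch.Size([4, 2])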

    D2.4 - Final Bundle of Client-side Components

    Get PDF
    This document describes the final bundle of client-side components, including descriptions of their functionality and links to their full designs and downloadable versions. This bundle aggregates only the WP2 assets; other client-side assets not covered here will be addressed in the final WP3 deliverables. Those assets created and licensed as open software will be continuously improved and maintained by their creators until the end of the project (the task has been extended to month 48) and beyond. For a full description of the related server-side components, please refer to D2.2 - Final Bundle of Server-side Components. This study is part of the RAGE project. The RAGE project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 644187. This publication reflects only the author's view. The European Commission is not responsible for any use that may be made of the information it contains.

    Enhanced stress indices in signal processing based on the advanced Matthew correlation coefficient (MCCA) and multimodal function using EEG signals

    Get PDF
    Stress is a response to various environmental, psychological, and social factors that places strain and pressure on individuals. Categorizing stress levels is common practice, often into low, medium, and high categories. However, the restriction to only three stress levels is a significant drawback of the existing approach. This study addresses this limitation and proposes an improved method for EEG feature extraction and stress level categorization. The main contribution lies in the enhanced stress level categorization, which expands from three to six levels using a newly established fractional scale based on the scale of quantities influenced by MCCA and multimodal equation performance. The concept of standard deviation (STD) helps categorize stress levels by dividing the scale of quantities, improving the process. The standard Matthew Correlation Coefficient (MCC) equation shows limited performance with respect to accuracy values, and multimodal functions are rarely discussed in terms of their parameters; the MCCA and multimodal function therefore provide the advantage of significantly enhancing accuracy as part of the study's contribution. This study introduces the Advanced Matthew Correlation Coefficient (MCCA) and applies the six-sigma framework to enhance accuracy in stress level categorization. The research focuses on expanding the stress levels from three to six, utilizing a new scale of fractional stress levels influenced by MCCA and multimodal equation performance. Furthermore, the study applies signal pre-processing techniques to filter and segregate the EEG signal into Delta, Theta, Alpha, and Beta frequency bands. Subsequently, feature extraction is conducted, resulting in twenty-one statistical and non-statistical features, which are employed in both the MCCA and multimodal function analysis. The study employs Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbour (k-NN) classifiers for stress level validation. After experiments and performance evaluations, RF demonstrates the highest average accuracy of 85%–10% in 10-Fold and K-Fold techniques, outperforming SVM and k-NN. In conclusion, this study presents an improved approach to stress level categorization and EEG feature extraction. The proposed Advanced Matthew Correlation Coefficient (MCCA) and six-sigma framework contribute to achieving higher accuracy, surpassing the limitations of the existing three-level categorization, and the results indicate the superiority of the Random Forest classifier over SVM and k-NN. This research has implications for various applications and fields, providing a more effective equation to accurately categorize stress levels with a potential accuracy exceeding 95%.
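
    As a hedged illustration of the pre-processing and classification steps described above (the sampling rate, band edges, synthetic data and feature choices are assumptions, and the MCCA-based six-level scaling itself is not reproduced), the band separation and a Random Forest run could be sketched as:

    # Illustrative EEG band separation into Delta/Theta/Alpha/Beta plus simple
    # statistical features, followed by Random Forest classification. Sampling
    # rate, band edges, features and labels are assumptions for the sketch.
    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    fs = 256  # assumed sampling rate in Hz
    bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    def band_features(signal):
        """Filter one EEG channel into each band and return simple statistics."""
        feats = []
        for low, high in bands.values():
            b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="bandpass")
            filtered = filtfilt(b, a, signal)
            feats += [filtered.mean(), filtered.std(), np.abs(filtered).max()]
        return feats

    rng = np.random.default_rng(0)
    X = np.array([band_features(rng.normal(size=fs * 10)) for _ in range(120)])
    y = rng.integers(0, 6, size=120)  # six assumed stress levels
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    print("10-fold accuracy:", cross_val_score(rf, X, y, cv=10).mean())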

    The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements

    Full text link
    Truly real-life data presents a strong but exciting challenge for sentiment and emotion research. The high variety of possible 'in-the-wild' properties makes large datasets such as these indispensable for building robust machine learning models. A sufficient quantity of data covering a deep variety of challenges in each modality, enough to force exploratory analysis of the interplay of all modalities, has not yet been made available in this context. In this contribution, we present MuSe-CaR, a first-of-its-kind multimodal dataset. The data is publicly available, as it recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge, which focused on the tasks of emotion, emotion-target engagement, and trustworthiness recognition by comprehensively integrating the audio-visual and language modalities. Furthermore, we give a thorough overview of the dataset in terms of collection and annotation, including annotation tiers not used in this year's MuSe 2020. In addition, for one of the sub-challenges - predicting the level of trustworthiness - no participant outperformed the baseline model, and so we propose a simple but highly efficient Multi-Head-Attention network that, using multimodal fusion, exceeds the baseline by around 0.2 CCC (almost a 50% improvement). Comment: accepted version.
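
    CCC here denotes the Concordance Correlation Coefficient, the agreement metric commonly used for continuous emotion and trustworthiness prediction. A minimal NumPy implementation of the standard formula (not the challenge's official evaluation script) is:

    # Concordance Correlation Coefficient (CCC), the agreement metric referenced
    # above. Minimal NumPy version of the standard formula.
    import numpy as np

    def ccc(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
        mean_t, mean_p = y_true.mean(), y_pred.mean()
        var_t, var_p = y_true.var(), y_pred.var()
        cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
        return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    # Example: a noisy but correlated prediction yields a CCC between 0 and 1.
    t = np.linspace(0, 1, 100)
    print(round(ccc(t, t + np.random.default_rng(0).normal(0, 0.1, 100)), 3))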

    Enhancing biofeedback-driven self-guided virtual reality exposure therapy through arousal detection from multimodal data using machine learning

    Get PDF
    Virtual reality exposure therapy (VRET) is a novel intervention technique that allows individuals to experience anxiety-evoking stimuli in a safe environment, recognise specific triggers and gradually increase their exposure to perceived threats. Public-speaking anxiety (PSA) is a prevalent form of social anxiety, characterised by stressful arousal and anxiety generated when presenting to an audience. In self-guided VRET, participants can gradually increase their tolerance to exposure and reduce anxiety-induced arousal and PSA over time. However, creating such a VR environment and determining physiological indices of anxiety-induced arousal or distress is an open challenge. Environment modelling, character creation and animation, psychological state determination and the use of machine learning (ML) models for anxiety or stress detection are equally important, and multi-disciplinary expertise is required. In this work, we have explored a series of ML models with publicly available data sets (using electroencephalogram and heart rate variability data) to predict arousal states. If we can detect anxiety-induced arousal, we can trigger calming activities that allow individuals to cope with and overcome distress. Here, we discuss how to effectively select ML models and parameters for arousal detection, and propose a pipeline that overcomes the model selection problem across different parameter settings in the context of virtual reality exposure therapy. This pipeline can be extended to other domains where arousal detection is crucial. Finally, we have implemented a biofeedback framework for VRET in which we successfully provided feedback, in the form of heart rate and a brain laterality index derived from our acquired multimodal data, for psychological intervention to overcome anxiety.
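
    One possible sketch of the kind of model- and parameter-selection pipeline discussed above (the candidate models, parameter grids and the synthetic feature matrix are assumptions, not the authors' configuration) is:

    # Illustrative model/parameter selection for binary arousal detection from
    # pre-computed EEG/HRV features. Candidate models and grids are assumptions.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))          # stand-in EEG/HRV feature vectors
    y = rng.integers(0, 2, size=200)        # 0 = calm, 1 = aroused

    candidates = {
        "svm": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["rbf", "linear"]}),
        "rf": (RandomForestClassifier(), {"clf__n_estimators": [100, 300]}),
    }
    for name, (model, grid) in candidates.items():
        pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
        search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
        search.fit(X, y)
        # Report the best cross-validated score and parameters per model family.
        print(name, search.best_score_, search.best_params_)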

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as a final publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to allow better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.