Computer audition for emotional wellbeing
This thesis focuses on the application of computer audition (i.e., machine listening) methodologies for monitoring states of emotional wellbeing. Computer audition is a growing field that has been successfully applied to an array of use cases in recent years. Audio-based computational analysis has several advantages: audio can be recorded non-invasively, stored economically, and can capture rich information about events in a given environment, e.g., human behaviour. Maintaining emotional wellbeing is a challenge, and emotion-altering conditions, including stress and anxiety, have become increasingly common in recent years. Such conditions manifest in the body, inherently changing how we express ourselves. Research shows these alterations are perceivable within vocalisation, suggesting that speech-based audio monitoring may be valuable for developing artificially intelligent systems that target improved wellbeing. Furthermore, computer audition applies machine learning and other computational techniques to audio understanding, so combining computer audition with applications in computational paralinguistics and emotional wellbeing places this research within the broader field of empathy for Artificial Intelligence (AI). To this end, speech-based audio modelling that incorporates and understands paralinguistic wellbeing-related states may be a vital cornerstone for improving the degree of empathy an artificial intelligence can exhibit.
To summarise, this thesis investigates the extent to which speech-based computer audition methodologies can be utilised to understand human emotional wellbeing. A fundamental background on the fields in question as they pertain to emotional wellbeing is presented first, followed by an outline of the applied audio-based methodologies. Next, several machine learning experiments focused on emotional wellbeing applications are detailed, including the analysis and recognition of under-researched phenomena in speech, e.g., anxiety and markers of stress. Core contributions of this thesis include the collection of several related datasets, hybrid fusion strategies for an emotional gold standard, novel machine learning strategies for data interpretation, and an in-depth acoustic-based computational evaluation of several human states. All of these contributions focus on ascertaining the advantage of audio in the context of modelling emotional wellbeing. Given the sensitive nature of human wellbeing, the ethical implications of developing and applying such systems are discussed throughout.
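As a purely illustrative sketch of what speech-based acoustic modelling of a wellbeing-related state can look like in practice (the thesis does not prescribe this pipeline; the file names, feature set, and labels below are hypothetical), one might pool hand-crafted acoustic descriptors per utterance and train a simple classifier:

```python
# Illustrative sketch only: summarise each recording with a small
# hand-crafted feature vector and fit a classifier. Paths, labels,
# and the feature choice are hypothetical placeholders.
import librosa
import numpy as np
from sklearn.svm import SVC

def acoustic_features(path: str, sr: int = 16000) -> np.ndarray:
    """Fixed-length utterance descriptor via mean/std pooling over time."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral shape
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # pitch contour
    zcr = librosa.feature.zero_crossing_rate(y)          # noisiness proxy
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [np.nanmean(f0), np.nanstd(f0), zcr.mean()],
    ])

# Hypothetical usage: paths and binary stress labels would come from a dataset.
X = np.stack([acoustic_features(p) for p in ["calm.wav", "stressed.wav"]])
clf = SVC().fit(X, [0, 1])
```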
Empathy Detection Using Machine Learning on Text, Audiovisual, Audio or Physiological Signals
Empathy is a social skill that indicates an individual's ability to understand others. Over the past few years, empathy has drawn attention from various disciplines, including but not limited to Affective Computing, Cognitive Science, and Psychology. Empathy is a context-dependent term; thus, detecting or recognising empathy has potential applications in society, healthcare, and education. Despite being a broad and overlapping topic, empathy detection studies that leverage machine learning remain underexplored from a holistic literature perspective. To this end, we systematically collect and screen 801 papers from 10 well-known databases and analyse the 54 selected papers. We group the papers based on the input modalities of empathy detection systems, i.e., text, audiovisual, audio, and physiological signals. We examine modality-specific pre-processing and network architecture design protocols, popular dataset descriptions and availability details, and evaluation protocols. We further discuss the potential applications, deployment challenges, and research gaps in the Affective Computing-based empathy domain, which can facilitate new avenues of exploration. We believe that our work is a stepping stone towards developing a privacy-preserving and unbiased empathic system, inclusive of culture, diversity, and multilingualism, that can be deployed in practice to enhance the overall well-being of human life.
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizer, we propose a framework consisting of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that, despite domain adaptation, the frame-based systems could present problems when used directly to solve a video-based task, which opens a new line of research into ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users' emotional state, and their combination improves system performance.
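The fine-tuning and late-fusion strategy described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' code: the stand-in backbone, embedding size, and fusion weight are assumptions, whereas the real system uses the pretrained CNN-14 from the PANNs framework.

```python
# Sketch: append a classification head to a pretrained audio backbone and
# fine-tune end-to-end; fuse the two modalities at the decision level.
import torch
import torch.nn as nn

class SpeechEmotionRecognizer(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, n_emotions: int = 8):
        super().__init__()
        self.backbone = backbone                  # pretrained, left unfrozen
        self.head = nn.Linear(embed_dim, n_emotions)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        emb = self.backbone(waveform)             # (batch, embed_dim) embedding
        return self.head(emb)                     # (batch, n_emotions) logits

def late_fusion(speech_logits, face_logits, w: float = 0.5):
    # Late fusion: weighted average of per-class probabilities per modality.
    return w * speech_logits.softmax(-1) + (1 - w) * face_logits.softmax(-1)

# Hypothetical usage with a trivial stand-in for the pretrained CNN-14:
backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(2048))
model = SpeechEmotionRecognizer(backbone, embed_dim=2048)
logits = model(torch.randn(4, 32000))             # four 2 s clips at 16 kHz
```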
Pain level and pain-related behaviour classification using GRU-based sparsely-connected RNNs
There is a growing body of studies on applying deep learning to biometrics analysis. Certain circumstances, however, could impair the objective measures and accuracy of the proposed biometric data analysis methods. For instance, people with chronic pain (CP) unconsciously adapt specific body movements to protect themselves from injury or additional pain. Because there is no dedicated benchmark database to analyse this correlation, in this study we considered one of the specific circumstances that potentially influence a person's biometrics during daily activities, and classified pain level and pain-related behaviour in the EmoPain database. To achieve this, we proposed an ensemble of sparsely-connected recurrent neural networks (s-RNNs) with gated recurrent units (GRUs) that incorporates multiple autoencoders in a shared training framework. This architecture is fed with multidimensional data collected from inertial measurement unit (IMU) and surface electromyography (sEMG) sensors. Furthermore, to compensate for variations in the temporal dimension that may not be perfectly represented in the latent space of the s-RNNs, we fused hand-crafted features derived from information-theoretic approaches with the representations in the shared hidden state. We conducted several experiments which indicate that the proposed method outperforms state-of-the-art approaches in classifying both pain level and pain-related behaviour. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 101002711.
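A simplified PyTorch sketch of the core idea, a recurrent autoencoder whose latent state is fused with hand-crafted features before classification, is given below. The sparsely-connected ensemble and shared training framework of the paper are omitted, and all dimensions are illustrative rather than taken from the paper.

```python
# Sketch: GRU encoder-decoder over IMU/sEMG sequences; the final hidden
# state is concatenated with hand-crafted (e.g., information-theoretic)
# features before classification. Training would combine a classification
# loss on the logits with a reconstruction loss on `recon`.
import torch
import torch.nn as nn

class GRUAutoencoderClassifier(nn.Module):
    def __init__(self, n_channels=30, hidden=64, n_handcrafted=12, n_classes=3):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, n_channels, batch_first=True)
        self.classifier = nn.Linear(hidden + n_handcrafted, n_classes)

    def forward(self, x, handcrafted):
        # x: (batch, time, channels) multichannel sensor sequence
        z, h = self.encoder(x)                    # h: (1, batch, hidden)
        recon, _ = self.decoder(z)                # reconstruction target is x
        fused = torch.cat([h[-1], handcrafted], dim=-1)
        return self.classifier(fused), recon     # class logits + reconstruction

model = GRUAutoencoderClassifier()
x = torch.randn(8, 180, 30)                       # 8 windows, 180 frames, 30 channels
feats = torch.randn(8, 12)                        # hand-crafted features per window
logits, recon = model(x, feats)
```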
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system consisting of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that training was more robust when it did not start from scratch and the network's prior knowledge was similar to the target task. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. Error analysis indicated that the visual systems could improve with a detector of frames with high emotional load, which opens a new line of research into ways to learn from videos. Finally, by combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carry relevant information to detect users' emotional state and that their combination improves the final system's performance.
The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (http://menhir-project.eu, accessed on 17 November 2021). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).
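The SER branch described above can be sketched with the HuggingFace transformers library. This is a minimal illustration under stated assumptions: the mean-pooling step and MLP sizes are guesses, not the paper's exact configuration; only the use of the pretrained xlsr-Wav2Vec2.0 model with an appended MLP comes from the abstract.

```python
# Sketch: pretrained xlsr-Wav2Vec2.0 with a small MLP on top, fine-tuned
# end-to-end for 8 emotion classes. Pooling and head sizes are assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2SER(nn.Module):
    def __init__(self, n_emotions: int = 8):
        super().__init__()
        self.wav2vec2 = Wav2Vec2Model.from_pretrained(
            "facebook/wav2vec2-large-xlsr-53"
        )
        hidden = self.wav2vec2.config.hidden_size   # 1024 for xlsr-53
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, n_emotions)
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw 16 kHz audio
        states = self.wav2vec2(waveform).last_hidden_state  # (batch, frames, hidden)
        return self.mlp(states.mean(dim=1))                 # pool over time, classify

model = Wav2Vec2SER()
logits = model(torch.randn(2, 32000))                       # two 2-second clips
```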