132 research outputs found
Pain Analysis using Adaptive Hierarchical Spatiotemporal Dynamic Imaging
Automatic pain intensity estimation plays a pivotal role in healthcare and
medical fields. While many methods have been developed to gauge human pain
using behavioral or physiological indicators, facial expressions have emerged
as a prominent tool for this purpose. Nevertheless, the dependence on labeled
data for these techniques often renders them expensive and time-consuming. To
tackle this, we introduce the Adaptive Hierarchical Spatio-temporal Dynamic
Image (AHDI) technique. AHDI encodes spatiotemporal changes in facial videos
into a singular RGB image, permitting the application of simpler 2D deep models
for video representation. Within this framework, we employ a residual network
to derive generalized facial representations. These representations are
optimized for two tasks: estimating pain intensity and differentiating between
genuine and simulated pain expressions. For the former, a regression model is
trained using the extracted representations, while for the latter, a binary
classifier identifies genuine versus feigned pain displays. Testing our method
on two widely-used pain datasets, we observed encouraging results for both
tasks. On the UNBC database, we achieved an MSE of 0.27, outperforming the SOTA,
which had an MSE of 0.40. On the BioVid dataset, our model achieved an accuracy
of 89.76%, which is an improvement of 5.37% over the SOTA accuracy. Most
notably, for distinguishing genuine from simulated pain, our accuracy stands at
94.03%, marking a substantial improvement of 8.98%. Our methodology not only
minimizes the need for extensive labeled data but also augments the precision
of pain evaluations, facilitating superior pain management.
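
The idea of collapsing a facial video into a single RGB image can be illustrated with a plain dynamic image built by approximate rank pooling. This is a minimal sketch, not the adaptive hierarchical AHDI variant the abstract describes: the function name `dynamic_image`, the linear weights `2t - T - 1`, and the min-max rescaling are assumptions chosen for illustration.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, 3) video into one (H, W, 3) uint8 image.

    Assumption: simple linear rank-pooling weights, so later frames
    receive larger positive weight and earlier frames negative weight.
    """
    frames = frames.astype(np.float64)
    num_frames = frames.shape[0]
    # Weight for frame t (1-indexed) is 2t - T - 1; the weights sum to zero.
    alpha = 2.0 * np.arange(1, num_frames + 1) - num_frames - 1
    # Weighted sum over the time axis gives one (H, W, 3) array.
    pooled = np.tensordot(alpha, frames, axes=(0, 0))
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        # A static video carries no temporal change: return a blank image.
        return np.zeros(pooled.shape, dtype=np.uint8)
    # Rescale to the 0..255 range so a 2D image model can consume it.
    return np.round(255.0 * (pooled - lo) / (hi - lo)).astype(np.uint8)

# Hypothetical usage on a random 8-frame clip of 4x4 RGB frames.
clip = np.random.default_rng(0).integers(0, 256, size=(8, 4, 4, 3))
image = dynamic_image(clip)
```

The resulting image could then be passed to any 2D network, such as a residual network, in place of a full video model.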
Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
Automatic emotion recognition (ER) has recently gained a lot of interest due to
its potential in many real-world applications. In this context, multimodal
approaches have been shown to improve performance (over unimodal approaches) by
combining diverse and complementary sources of information, providing some
robustness to noisy and missing modalities. In this paper, we focus on
dimensional ER based on the fusion of facial and vocal modalities extracted
from videos, where complementary audio-visual (A-V) relationships are explored
to predict an individual's emotional states in valence-arousal space. Most
state-of-the-art fusion techniques rely on recurrent networks or conventional
attention mechanisms that do not effectively leverage the complementary nature
of A-V modalities. To address this problem, we introduce a joint
cross-attentional model for A-V fusion that extracts the salient features
across A-V modalities and effectively leverages the inter-modal
relationships, while retaining the intra-modal relationships. In particular, it
computes the cross-attention weights based on the correlation between the joint
feature representation and that of the individual modalities. Deploying the
joint A-V feature representation in the cross-attention module helps to
simultaneously leverage both the intra- and inter-modal relationships, thereby
significantly improving the performance of the system over the vanilla
cross-attention module. The effectiveness of our proposed approach is validated
experimentally on challenging videos from the RECOLA and AffWild2 datasets.
Results indicate that our joint cross-attentional A-V fusion model provides a
cost-effective solution that can outperform state-of-the-art approaches, even
when the modalities are noisy or absent.
Comment: arXiv admin note: substantial text overlap with arXiv:2203.14779,
arXiv:2111.0522
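
The correlation-based weighting described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' exact formulation: the function name `joint_cross_attention`, the random projection matrices standing in for learned weights, the softmax normalization, and the residual connection are all hypothetical choices.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def joint_cross_attention(x_a, x_v, rng):
    """Attend each modality to the joint audio-visual representation.

    x_a: (L, d_a) audio features, x_v: (L, d_v) visual features,
    where L is the number of time steps. Returns attended features
    with the same shapes as the inputs.
    """
    joint = np.concatenate([x_a, x_v], axis=1)  # (L, d_a + d_v)
    d = joint.shape[1]
    attended = []
    for x in (x_a, x_v):
        d_m = x.shape[1]
        # Assumption: random matrices stand in for learned parameters.
        w_corr = rng.standard_normal((d_m, d)) / np.sqrt(d_m)
        w_val = rng.standard_normal((d, d_m)) / np.sqrt(d)
        # Correlation between this modality and the joint representation.
        corr = np.tanh(x @ w_corr @ joint.T / np.sqrt(d))  # (L, L)
        weights = softmax(corr, axis=-1)  # each row sums to 1
        # Residual connection keeps the original intra-modal features.
        attended.append(x + weights @ (joint @ w_val))
    return attended[0], attended[1]

# Hypothetical usage: 6 time steps, 4-dim audio and 5-dim visual features.
rng = np.random.default_rng(0)
att_a, att_v = joint_cross_attention(
    rng.standard_normal((6, 4)), rng.standard_normal((6, 5)), rng
)
```

Because the joint representation contains both modalities, each attended output mixes information from the same modality and from the other one.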
Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions
Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go 'deeper' than tracking and address automated recognition of animals' internal states, such as emotions and pain, with the aim of improving animal welfare, making this a timely moment for a systematization of the field. This paper provides a comprehensive survey of computer vision-based research on recognition of pain and emotional states in animals, addressing both facial and bodily behavior analysis. We summarize the efforts that have been presented so far within this topic, classifying them across different dimensions; highlight challenges and research gaps; and provide best-practice recommendations for advancing the field, along with some future directions for research.
Facial affect "in the wild": a survey and a new database
Well-established databases and benchmarks have been developed in the past 20 years for automatic facial behaviour analysis. Nevertheless, for some important problems regarding analysis of facial behaviour, such as (a) estimation of affect in a continuous dimensional space (e.g., valence and arousal) in videos displaying spontaneous facial behaviour and (b) detection of the activated facial muscles (i.e., facial action unit detection), to the best of our knowledge, well-established in-the-wild databases and benchmarks do not exist. That is, the majority of the publicly available corpora for the above tasks contain samples that have been captured in controlled recording conditions and/or under a very specific milieu. Arguably, in order to make further progress in automatic understanding of facial behaviour, datasets that have been captured in-the-wild and in various milieus have to be developed. In this paper, we survey the progress that has recently been made on understanding facial behaviour in-the-wild, the datasets that have been developed so far, and the methodologies that have been developed, paying particular attention to deep learning techniques for the task. Finally, we make a significant step further and propose a new comprehensive benchmark for training methodologies, as well as for assessing the performance of facial affect/behaviour analysis/understanding in-the-wild. To the best of our knowledge, this is the first time that such a benchmark for valence and arousal "in-the-wild" is presented.
ANN Models for Shoulder Pain Detection based on Human Facial Expression Covered by Mask
Facial expressions are one way to communicate that someone feels pain. However, coding facial movements to assess pain requires extensive training and is time-consuming for clinical practice. In addition, during the COVID-19 pandemic, it was difficult to read these expressions because of the mask on the face. Therefore, a system is needed that can detect pain from facial expressions when a person is wearing a mask. A total of 41 points are used to form 19 geometrical features. The study used 20,000 frames from 24 respondents in the dataset as secondary data. From these data, training and testing were carried out using the ANN (Artificial Neural Network) method with a varying number of neurons in the hidden layer, i.e., 5, 10, 15, and 20 neurons. The highest accuracy obtained in testing is 86%, with 20 neurons in the hidden layer.
- …