Automatic detection of major depressive disorder via a bag-of-behaviour-words approach
A paper presented at the Third International Symposium on Image Computing and Digital Medicine (ISICDM 2019).
Intelligent System for Depression Scale Estimation with Facial Expressions and Case Study in Industrial Intelligence
As a mental disorder, depression affects people's lives, work, and more. Researchers have proposed various industrial intelligent systems in the pattern-recognition field for audiovisual depression detection. This paper presents an end-to-end trainable intelligent system that generates high-level representations over the entire video clip. Specifically, a three-dimensional (3D) convolutional neural network equipped with a spatiotemporal feature aggregation module (STFAM) is trained from scratch on the Audio/Visual Emotion Challenge (AVEC) 2013 and AVEC 2014 data, so that it can model the discriminative patterns closely related to depression. In the STFAM, a channel and spatial attention mechanism and an aggregation method, namely 3D DEP-NetVLAD, are integrated to learn compact characteristics from the feature maps. Extensive experiments on the two databases (i.e., AVEC 2013 and AVEC 2014) illustrate that the proposed intelligent system can efficiently model the underlying depression patterns and obtains better performance than most video-based depression recognition approaches. Case studies are presented to describe the applicability of the proposed intelligent system for industrial intelligence.
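The abstract names 3D DEP-NetVLAD but does not spell out the aggregation. As an illustration of the general NetVLAD idea it builds on, here is a minimal pure-Python sketch: each local descriptor is soft-assigned to a set of centres, residuals are accumulated per centre, and the result is intra- and L2-normalized. The `alpha` sharpness parameter and the fixed (non-learned) centres are illustrative assumptions, not the paper's exact design.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def netvlad(descriptors, centres, alpha=10.0):
    """NetVLAD-style aggregation: soft-assign each D-dim local descriptor
    to K centres, accumulate residuals, then normalize into one compact
    K*D vector."""
    K, D = len(centres), len(centres[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # soft-assignment weight a_k proportional to exp(-alpha * ||x - c_k||^2)
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centres]
        weights = softmax([-alpha * d for d in dists])
        for k, c in enumerate(centres):
            for j in range(D):
                vlad[k][j] += weights[k] * (x[j] - c[j])
    # intra-normalization per centre, then global L2 normalization
    flat = []
    for row in vlad:
        n = math.sqrt(sum(v * v for v in row)) or 1.0
        flat.extend(v / n for v in row)
    g = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / g for v in flat]
```

In the paper's setting the descriptors would be local 3D CNN feature-map entries and the centres learned end-to-end; here they are plain lists to keep the sketch self-contained.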
Automatic Detection of Self-Adaptors for Psychological Distress
Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as the face, head, and voice, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the lack of available datasets and the difficulty of automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We also propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector encoding. We also demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels. To enable this research, we introduce a new dataset containing full-body videos of short interviews and self-reported distress labels.
King's College, Cambridge
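The abstract does not detail how fidgeting is detected from pose keypoints. One common heuristic for repetitive, localized movement is to compare the total path length of a keypoint trajectory against its net displacement over a window: lots of motion that goes nowhere suggests fidgeting. The thresholds below are hypothetical, not the paper's method.

```python
import math

def path_stats(points):
    """points: list of (x, y) positions of one tracked keypoint
    (e.g. a hand) over a short time window."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    net = math.dist(points[0], points[-1])
    return path, net

def is_fidgeting(points, min_path=1.0, straightness_ratio=3.0):
    """Heuristic: flag windows with substantial movement (long path)
    but little net displacement, i.e. back-and-forth motion."""
    path, net = path_stats(points)
    return path >= min_path and path > straightness_ratio * max(net, 1e-6)
```

A purposeful reach produces a nearly straight trajectory (path roughly equals net displacement) and is not flagged, while an oscillating hand is.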
A deep generic to specific recognition model for group membership analysis using non-verbal cues
Automatic understanding and analysis of groups has attracted increasing attention in the vision and multimedia communities in recent years. However, little attention has been paid to the automatic analysis of non-verbal behaviors and how this can be utilized for the analysis of group membership, i.e., recognizing which group each individual is part of. This paper presents a novel Support Vector Machine (SVM) based Deep Specific Recognition Model (DeepSRM) that is learned on the basis of a generic recognition model. The generic recognition model refers to the model trained with data across different conditions, i.e., when people are watching movies of different types. Although the generic recognition model can provide a baseline for the recognition model trained for each specific condition, the different behaviors people exhibit in different conditions limit the recognition performance of the generic model. Therefore, a specific recognition model is built for each condition separately, on top of the generic recognition model. We conduct a set of experiments using a database collected to study group analysis while each group (i.e., four participants together) was watching a number of long movie segments. The proposed deep specific recognition model (44%) outperforms the generic recognition model (26%). The recognition of group membership also indicates that the non-verbal behaviors of individuals within a group share commonalities.
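The generic-to-specific idea above can be illustrated with a toy classifier: a "generic" model fit on data pooled across all conditions, and a "specific" model refit on a single condition's data. The nearest-centroid classifier here is a deliberately simple stand-in, not the authors' SVM-based DeepSRM.

```python
def centroids(X, y):
    """Fit a nearest-centroid model: one mean feature vector per
    group label. X is a list of feature vectors, y the group labels."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {l: [v / counts[l] for v in s] for l, s in sums.items()}

def predict(model, x):
    """Predict the group whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda l: sum((a - b) ** 2
                                        for a, b in zip(x, model[l])))

# Generic model: centroids(all_X, all_y) over every viewing condition.
# Specific model: centroids(cond_X, cond_y) refit on one condition only,
# mirroring the paper's condition-specific stage built on the generic one.
```

With behavior features that shift between conditions (as the abstract reports), the per-condition refit adapts the decision boundary that the pooled model averages away.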
Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress
Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full body videos for short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.
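The multi-modal combination step is not detailed in the abstract; one standard precursor to any such fusion (whether by auto-encoder or concatenation) is to standardize each modality's features so that no modality dominates by scale. The sketch below shows this simple early-fusion baseline; it is an illustrative assumption, not the paper's Deep Denoising Auto-Encoder pipeline.

```python
import math

def zscore(rows):
    """Standardize each feature column (mean 0, std 1) within one
    modality's feature matrix (list of feature vectors)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def fuse(*modalities):
    """Early fusion: standardize each modality, then concatenate the
    per-sample vectors into one joint representation."""
    normed = [zscore(m) for m in modalities]
    return [sum(rows, []) for rows in zip(*normed)]
```

The fused vectors could then feed any downstream regressor for the distress labels; the auto-encoder approach in the paper instead learns a shared latent representation across modalities.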
Automatic Recognition of Emotions and Membership in Group Videos
Automatic affect analysis and understanding has become a well-established research area in the last two decades. However, little attention has been paid to the analysis of the affect expressed in group settings, either in the form of affect expressed by the whole group collectively or affect expressed by each individual member of the group. This paper presents a framework which, in group settings, automatically classifies the affect expressed by each individual group member along both the arousal and valence dimensions. We first introduce a novel Volume Quantised Local Zernike Moments Fisher Vectors (vQLZM-FV) descriptor to represent the facial behaviours of individuals in the spatio-temporal domain, and then propose a method to recognize the group membership of each individual (i.e., which group the individual in question is part of) by using their face and body behavioural cues. We conduct a set of experiments on a newly collected dataset that contains fourteen recordings of four groups, each consisting of four people watching affective movie stimuli. Our experimental results show that (1) the proposed vQLZM-FV outperforms the other feature representations in affect recognition, and (2) group membership can be recognized using the non-verbal face and body features, indicating that individuals influence each other's behaviours within a group setting.
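The vQLZM descriptor builds on local Zernike moments, whose computation the abstract does not spell out. The standard complex Zernike moment can be sketched in plain Python as below; the patch-to-unit-disc mapping and the low orders used here are illustrative, not the paper's exact parameters. The moment's magnitude is rotation-invariant, which is why it is popular for describing facial texture patches.

```python
import cmath
import math

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); requires |m| <= n
    and n - |m| even."""
    m = abs(m)
    return sum(
        (-1) ** s * math.factorial(n - s)
        / (math.factorial(s)
           * math.factorial((n + m) // 2 - s)
           * math.factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1))

def zernike_moment(patch, n, m):
    """Complex Zernike moment Z_nm of a square grayscale patch (list of
    rows) mapped onto the unit disc centred on the patch centre."""
    h, w = len(patch), len(patch[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    r = max(cx, cy) or 1.0
    total = 0j
    for y, row in enumerate(patch):
        for x, val in enumerate(row):
            rho = math.hypot((x - cx) / r, (y - cy) / r)
            if rho > 1.0:
                continue  # pixels outside the unit disc are ignored
            theta = math.atan2(y - cy, x - cx)
            total += val * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * total
```

On a uniform patch, moments with m != 0 vanish by angular symmetry, a quick sanity check on the implementation.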
A facial depression recognition method based on hybrid multi-head cross attention network
Introduction: Deep-learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, some critical challenges remain in these methods: (1) because of spatial locality, it is still difficult for CNNs to learn long-range inductive biases in the low-level feature extraction of different facial regions; (2) it is difficult for a model with only a single attention head to concentrate on various parts of the face simultaneously, making it less sensitive to other important facial regions associated with depression. In facial depression recognition, many of the clues come from a few areas of the face simultaneously, e.g., the mouth and eyes. Methods: To address these issues, we present an end-to-end integrated framework called the Hybrid Multi-head Cross Attention Network (HMHN), which comprises two stages. The first stage consists of the Grid-Wise Attention block (GWA) and the Deep Feature Fusion block (DFF) for low-level visual depression feature learning. In the second stage, we obtain the global representation by encoding high-order interactions among local features with the Multi-head Cross Attention block (MAB) and the Attention Fusion block (AFB). Results: We experimented on the AVEC 2013 and AVEC 2014 depression datasets. The results on AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrate the efficacy of our method, which outperformed most state-of-the-art video-based depression recognition approaches. Discussion: We proposed a deep-learning hybrid model for depression recognition that captures the higher-order interactions between the depression features of multiple facial regions, which can effectively reduce the error in depression recognition and shows great potential for clinical experiments.
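The MAB block's internals are not given in the abstract; as a rough sketch of the general multi-head scaled dot-product attention it names, the pure-Python toy below splits feature dimensions across heads, attends per head, and concatenates the results. Head count, shapes, and the absence of learned projections are illustrative simplifications, not the HMHN architecture.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all
    key/value pairs (e.g. local features of different facial regions)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def multi_head(queries, keys, values, heads=2):
    """Split feature dims into `heads` chunks, attend per head, and
    concatenate, letting different heads focus on different regions."""
    d = len(queries[0])
    step = d // heads
    parts = []
    for h in range(heads):
        sl = slice(h * step, (h + 1) * step)
        parts.append(attention([q[sl] for q in queries],
                               [k[sl] for k in keys],
                               [v[sl] for v in values]))
    return [sum((p[i] for p in parts), []) for i in range(len(queries))]
```

The per-head split is what lets the model weight, say, the mouth and eye regions independently, addressing the single-head limitation the abstract describes.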