
    Intelligent System for Depression Scale Estimation with Facial Expressions and Case Study in Industrial Intelligence

    As a mental disorder, depression affects people's lives, work, and more. Researchers have proposed various industrial intelligent systems in the pattern recognition field for audiovisual depression detection. This paper presents an end-to-end trainable intelligent system that generates high-level representations over an entire video clip. Specifically, a three-dimensional (3D) convolutional neural network equipped with a spatiotemporal feature aggregation module (STFAM) is trained from scratch on the Audio/Visual Emotion Challenge (AVEC) 2013 and AVEC 2014 data, allowing it to model the discriminative patterns closely related to depression. In the STFAM, a channel and spatial attention mechanism and an aggregation method, namely 3D DEP-NetVLAD, are integrated to learn compact characteristics from the feature maps. Extensive experiments on the two databases (i.e., AVEC 2013 and AVEC 2014) illustrate that the proposed intelligent system can efficiently model the underlying depression patterns and obtain better performance than most video-based depression recognition approaches. Case studies are presented to describe the applicability of the proposed intelligent system for industrial intelligence. Peer reviewed.
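
    The abstract does not spell out the STFAM internals, but its attention component resembles channel-plus-spatial attention over 3D feature maps. Below is a minimal PyTorch sketch of such a block, assuming a CBAM-style design extended to the temporal dimension; the module name, reduction ratio, and tensor shapes are illustrative assumptions, not the paper's specification (and the 3D DEP-NetVLAD aggregation is not reproduced).

    import torch
    import torch.nn as nn

    class ChannelSpatialAttention3D(nn.Module):
        """CBAM-style channel and spatial attention over 3D feature maps.

        A hypothetical stand-in for the attention part of STFAM; the
        paper's exact design is not reproduced here.
        """
        def __init__(self, channels, reduction=8):
            super().__init__()
            # Channel attention: squeeze spatio-temporal dims, re-weight channels.
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )
            # Spatial attention: one mask from pooled channel statistics.
            self.spatial = nn.Conv3d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):                        # x: (B, C, T, H, W)
            b, c = x.shape[:2]
            ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3, 4)))
                               + self.mlp(x.amax(dim=(2, 3, 4))))
            x = x * ca.view(b, c, 1, 1, 1)
            pooled = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(pooled))

    feats = torch.randn(2, 64, 8, 28, 28)              # (B, C, T, H, W)
    print(ChannelSpatialAttention3D(64)(feats).shape)  # (2, 64, 8, 28, 28)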

    Automatic Detection of Self-Adaptors for Psychological Distress

    Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as the face, head, and voice, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the lack of available datasets and the difficulty of automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We also propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector encoding. We also demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels. To enable this research, we introduce a new dataset containing full-body videos of short interviews and self-reported distress labels. King's College, Cambridge.
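
    As a rough illustration of the fusion component, here is a minimal PyTorch sketch of a multi-modal deep denoising auto-encoder: each modality is encoded, fused into a shared code, and both modalities are reconstructed from corrupted inputs. All layer sizes and the feature dimensions (88-d audio, 136-d visual) are placeholder assumptions, since the abstract does not give the architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultimodalDDAE(nn.Module):
        """Sketch of a multi-modal deep denoising auto-encoder.

        Encodes each modality, fuses them into a shared code, and
        reconstructs both modalities; all sizes are placeholders.
        """
        def __init__(self, audio_dim, visual_dim, code_dim=128):
            super().__init__()
            self.enc_audio = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
            self.enc_visual = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
            self.fuse = nn.Linear(512, code_dim)
            self.dec_audio = nn.Linear(code_dim, audio_dim)
            self.dec_visual = nn.Linear(code_dim, visual_dim)

        def forward(self, audio, visual):
            code = self.fuse(torch.cat([self.enc_audio(audio),
                                        self.enc_visual(visual)], dim=-1))
            return self.dec_audio(code), self.dec_visual(code), code

    model = MultimodalDDAE(audio_dim=88, visual_dim=136)   # hypothetical dims
    audio, visual = torch.randn(32, 88), torch.randn(32, 136)
    # Denoising objective: reconstruct clean targets from corrupted inputs.
    rec_a, rec_v, code = model(audio + 0.1 * torch.randn_like(audio),
                               visual + 0.1 * torch.randn_like(visual))
    loss = F.mse_loss(rec_a, audio) + F.mse_loss(rec_v, visual)
    print(float(loss))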

    Spectral Representation of Behaviour Primitives for Depression Analysis


    Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress

    Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as the face, head, and voice, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and the difficulty of automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full-body videos of short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.
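
    The Improved Fisher Vector step named here is standard enough to sketch. The snippet below encodes a set of local descriptors against a diagonal-covariance GMM using the usual mean and variance gradients, followed by the "improved" signed square-root and L2 normalisation; the descriptor dimensionality and number of mixture components are arbitrary choices for illustration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def improved_fisher_vector(X, gmm):
        """Improved Fisher Vector for descriptors X of shape (N, D):
        mean/variance GMM gradients, then signed square-root and L2 norm."""
        N = X.shape[0]
        gamma = gmm.predict_proba(X)                     # (N, K) posteriors
        w, mu = gmm.weights_, gmm.means_                 # (K,), (K, D)
        sigma = np.sqrt(gmm.covariances_)                # (K, D) for 'diag'
        diff = (X[:, None, :] - mu[None]) / sigma[None]  # (N, K, D)
        g_mu = (gamma[..., None] * diff).sum(0) / (N * np.sqrt(w))[:, None]
        g_sig = (gamma[..., None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w))[:, None]
        fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalisation

    rng = np.random.default_rng(0)
    gmm = GaussianMixture(n_components=8, covariance_type="diag")
    gmm.fit(rng.normal(size=(1000, 32)))       # random stand-in descriptors
    fv = improved_fisher_vector(rng.normal(size=(200, 32)), gmm)
    print(fv.shape)                            # (2 * 8 * 32,) = (512,)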

    Automatic Recognition of Emotions and Membership in Group Videos

    Automatic affect analysis and understanding has become a well-established research area in the last two decades. However, little attention has been paid to the analysis of affect expressed in group settings, either in the form of affect expressed by the whole group collectively or by each individual member of the group. This paper presents a framework which, in group settings, automatically classifies the affect expressed by each individual group member along both the arousal and valence dimensions. We first introduce a novel Volume Quantised Local Zernike Moments Fisher Vectors (vQLZM-FV) descriptor to represent the facial behaviours of individuals in the spatio-temporal domain, and then propose a method to recognize the group membership of each individual (i.e., which group the individual in question is part of) by using their face and body behavioural cues. We conduct a set of experiments on a newly collected dataset that contains fourteen recordings of four groups, each consisting of four people watching affective movie stimuli. Our experimental results show that (1) the proposed vQLZM-FV outperforms the other feature representations in affect recognition, and (2) group membership can be recognized using the non-verbal face and body features, indicating that individuals influence each other's behaviours within a group setting.
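
    As a hedged sketch of the membership-recognition step only (not the paper's pipeline or features), the snippet below trains a linear SVM on placeholder per-clip vectors standing in for the vQLZM-FV face features concatenated with body cues, and evaluates it with cross-validation.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Placeholder per-clip feature vectors: 4 groups x 14 recordings each.
    X = rng.normal(size=(4 * 14, 512))
    y = np.repeat(np.arange(4), 14)                 # group label per clip

    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean())   # ~chance here, since the features are random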

    A facial depression recognition method based on hybrid multi-head cross attention network

    Introduction: Deep learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, some critical challenges need to be resolved in these methods: (1) it is still difficult for CNNs to learn long-range inductive biases in the low-level feature extraction of different facial regions because of their spatial locality; (2) it is difficult for a model with only a single attention head to concentrate on various parts of the face simultaneously, leaving it less sensitive to other important facial regions associated with depression. In facial depression recognition, many of the cues come from a few areas of the face simultaneously, e.g., the mouth and eyes.
    Methods: To address these issues, we present an end-to-end integrated framework called the Hybrid Multi-head Cross Attention Network (HMHN), which comprises two stages. The first stage consists of the Grid-Wise Attention block (GWA) and the Deep Feature Fusion block (DFF) for low-level visual depression feature learning. In the second stage, we obtain the global representation by encoding high-order interactions among local features with the Multi-head Cross Attention block (MAB) and the Attention Fusion block (AFB).
    Results: We experimented on the AVEC 2013 and AVEC 2014 depression datasets. The results on AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrate the efficacy of our method, which outperforms most state-of-the-art video-based depression recognition approaches.
    Discussion: We proposed a hybrid deep learning model for depression recognition that captures the higher-order interactions among the depression features of multiple facial regions, which can effectively reduce recognition error and shows great potential for clinical experiments.
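
    To make the second-stage idea concrete, here is a minimal PyTorch sketch of multi-head cross attention in which a learned query token attends over a sequence of local facial-region features, letting different heads focus on different regions at once. The dimensions, head count, and regression head are assumptions; the paper's MAB/AFB design is not reproduced.

    import torch
    import torch.nn as nn

    class CrossAttentionFusion(nn.Module):
        """Sketch of multi-head cross attention: a learned query token
        attends over local facial-region features, so different heads can
        focus on different regions (e.g., mouth and eyes) at once."""
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.query = nn.Parameter(torch.randn(1, 1, dim))
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)
            self.head = nn.Linear(dim, 1)           # regress a depression score

        def forward(self, local_feats):             # (B, N, dim) region features
            q = self.query.expand(local_feats.size(0), -1, -1)
            fused, _ = self.attn(q, local_feats, local_feats)
            return self.head(self.norm(fused.squeeze(1)))

    feats = torch.randn(4, 49, 256)                 # e.g., a 7x7 grid of regions
    print(CrossAttentionFusion()(feats).shape)      # torch.Size([4, 1])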