Psychological distress is a significant and growing issue in society.
Automatic detection, assessment, and analysis of such distress is an active
area of research. Compared to modalities such as the face, head, and voice,
research investigating the use of the body modality for these tasks is
relatively sparse. This is due in part to the limited availability of datasets
and the difficulty of automatically extracting useful body features. Recent advances in
pose estimation and deep learning have enabled new approaches to this modality
and domain. To enable this research, we have collected and analyzed a new
dataset containing full-body videos of short interviews and self-reported
distress labels. We propose a novel method to automatically detect
self-adaptors and fidgeting, a subset of self-adaptors that has been shown to
be correlated with psychological distress. We analyze statistical features of
body gestures and fidgeting to explore how distress levels affect
participants' behaviors. We then propose a multi-modal approach that combines
different feature representations using Multi-modal Deep Denoising
Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our
proposed model, combining audio-visual features with automatically detected
fidgeting behavioral cues, can successfully predict distress levels in a
dataset labeled with self-reported anxiety and depression levels.