The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level
Depression is a serious medical condition that is suffered by a large number
of people around the world. It significantly affects the way one feels, causing
a persistent lowering of mood. In this paper, we propose a novel
attention-based deep neural network which facilitates the fusion of various
modalities. We use this network to regress the depression level. Acoustic, text
and visual modalities have been used to train our proposed network. Various
experiments have been carried out on the benchmark dataset, namely, Distress
Analysis Interview Corpus - a Wizard of Oz (DAIC-WOZ). From the results, we
empirically justify that the fusion of all three modalities helps in giving the
most accurate estimation of depression level. Our proposed approach outperforms
the state-of-the-art by 7.17% on root mean squared error (RMSE) and 8.08% on
mean absolute error (MAE).
Comment: 10 pages including references, 2 figures
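The attention-based fusion described above can be sketched minimally: each modality contributes an embedding, attention weights are computed over the modalities, and the weighted combination feeds a regression head. All names, dimensions, and parameters below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings (dimension d is illustrative,
# not taken from the paper).
d = 16
acoustic = rng.normal(size=d)
text = rng.normal(size=d)
visual = rng.normal(size=d)
modalities = np.stack([acoustic, text, visual])  # shape (3, d)

# Attention: score each modality embedding against a learned vector w,
# softmax the scores, and fuse the modalities by a weighted sum.
w = rng.normal(size=d)                  # stand-in for learned parameters
scores = modalities @ w                 # one scalar score per modality
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax attention weights
fused = weights @ modalities            # fused (d,) representation

# Linear regression head mapping the fused vector to a scalar
# depression-level estimate.
head = rng.normal(size=d)
predicted_level = float(fused @ head)

print(fused.shape)          # (16,)
```

In a trained network `w` and `head` would be learned end-to-end against RMSE/MAE-style losses; here they are random placeholders to show the data flow only.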
Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress
Psychological distress is a significant and growing issue in society.
Automatic detection, assessment, and analysis of such distress is an active
area of research. Compared to modalities such as face, head, and vocal,
research investigating the use of the body modality for these tasks is
relatively sparse. This is, in part, due to the limited available datasets and
difficulty in automatically extracting useful body features. Recent advances in
pose estimation and deep learning have enabled new approaches to this modality
and domain. To enable this research, we have collected and analyzed a new
dataset containing full body videos for short interviews and self-reported
distress labels. We propose a novel method to automatically detect
self-adaptors and fidgeting, a subset of self-adaptors that has been shown to
be correlated with psychological distress. We perform analysis on statistical
body gestures and fidgeting features to explore how distress levels affect
participants' behaviors. We then propose a multi-modal approach that combines
different feature representations using Multi-modal Deep Denoising
Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our
proposed model, combining audio-visual features with automatically detected
fidgeting behavioral cues, can successfully predict distress levels in a
dataset labeled with self-reported anxiety and depression levels.
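The multi-modal deep denoising auto-encoder mentioned above can be illustrated with a single-layer toy version: corrupt the input features, encode them into a shared latent code, and reconstruct the clean input. Dimensions and weights below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical concatenated audio-visual feature vector (32-d is a toy
# size, not from the paper).
x = rng.normal(size=32)

# Denoising: corrupt the input, then reconstruct the clean vector from
# the corrupted one.
x_noisy = x + rng.normal(scale=0.1, size=x.shape)

# Randomly initialized one-layer encoder/decoder, stand-ins for the
# learned weights of a deep model.
W_enc = rng.normal(scale=0.1, size=(8, 32))  # 32-d input -> 8-d code
W_dec = rng.normal(scale=0.1, size=(32, 8))  # 8-d code  -> 32-d output

code = np.tanh(W_enc @ x_noisy)          # shared latent representation
x_hat = W_dec @ code                     # reconstruction
loss = float(np.mean((x - x_hat) ** 2))  # reconstruction objective

print(code.shape)   # (8,)
```

The latent `code` is the kind of joint representation that would then be encoded (e.g. with Improved Fisher Vector Encoding) and passed to a distress-level predictor.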
Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model
A fundamental challenge in machine learning today is to build a model that
can learn from few examples. Here, we describe a reservoir based spiking neural
model for learning to recognize actions with a limited number of labeled
videos. First, we propose a novel encoding, inspired by how microsaccades
influence visual perception, to extract spike information from raw video data
while preserving the temporal correlation across different frames. Using this
encoding, we show that the reservoir generalizes its rich dynamical activity
toward signature actions/movements, enabling it to learn from few training
examples. We evaluate our approach on the UCF-101 dataset. Our experiments
demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5
accuracy, respectively, on the 101-class data while requiring just 8 video
examples per class for training. Our results establish a new benchmark for
action recognition from limited video examples for spiking neural models while
yielding competitive accuracy with respect to state-of-the-art non-spiking
neural models.
Comment: 13 figures (includes supplementary information)
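The idea of converting raw video into spikes while preserving temporal correlation can be sketched with a simple frame-difference encoding: a pixel emits a spike when its intensity change between consecutive frames crosses a threshold. This is a minimal stand-in, assuming nothing about the paper's microsaccade-inspired scheme beyond this general principle; clip sizes and the threshold are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical video clip: 10 frames of 8x8 grayscale (toy sizes).
frames = rng.random(size=(10, 8, 8))

# Difference-based spike encoding: a pixel spikes whenever its intensity
# change between consecutive frames exceeds a threshold, so the spike
# train tracks motion and preserves correlation across frames.
threshold = 0.5
diffs = np.abs(np.diff(frames, axis=0))    # (9, 8, 8) frame-to-frame change
spikes = (diffs > threshold).astype(np.uint8)

print(spikes.shape)   # (9, 8, 8): one binary spike map per frame transition
```

Such spike maps would then drive the reservoir's recurrent dynamics, from which a readout is trained on the few labeled examples per class.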