To aid existing telemental health services, we propose DeepTMH, a novel
framework that models telemental health session videos by extracting latent
vectors corresponding to Affective and Cognitive features frequently used in the
psychology literature. Our approach leverages advances in semi-supervised
learning to tackle data scarcity in the telemental health session video
domain and uses a multimodal semi-supervised GAN to detect important
mental health indicators during telemental health sessions. We demonstrate the
usefulness of our framework and compare it against existing work on two tasks:
Engagement regression and Valence-Arousal regression, both of which are
important to psychologists during a telemental health session. Our framework
achieves a 40% improvement in RMSE over the SOTA method in Engagement regression
and a 50% improvement in RMSE over the SOTA method in Valence-Arousal regression. To
tackle the scarcity of publicly available datasets in the telemental health space,
we release a new dataset, MEDICA, for mental health patient engagement
detection. MEDICA consists of 1299 videos, each 3 seconds long. To
the best of our knowledge, our approach is the first method to model telemental
health session data based on psychology-driven Affective and Cognitive
features while also accounting for data scarcity through a semi-supervised
setup.
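
For reference, the relative RMSE improvements quoted above can be read as the percentage reduction in RMSE with respect to a baseline method. The following is a minimal sketch under that assumption; the function names `rmse` and `relative_improvement` and the example numbers are hypothetical and are not taken from the paper's experiments.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical numbers, for illustration only: a baseline RMSE of 10.0
# reduced to 6.0 corresponds to a 40% improvement under this definition.
improvement = relative_improvement(10.0, 6.0)
```

Under this reading, a lower RMSE for the proposed model yields a positive percentage, and a model that matches the baseline yields zero.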