A Method for Analysis of Patient Speech in Dialogue for Dementia Detection
We present an approach to automatic detection of Alzheimer's type dementia
based on characteristics of spontaneous spoken language dialogue consisting of
interviews recorded in natural settings. The proposed method employs additive
logistic regression (a machine learning boosting method) on content-free
features extracted from dialogical interaction to build a predictive model. The
model training data consisted of 21 dialogues between patients with Alzheimer's
and interviewers, and 17 dialogues between patients with other health
conditions and interviewers. Features analysed included speech rate,
turn-taking patterns and other speech parameters. Despite relying solely on
content-free features, our method achieves an overall accuracy of 86.5%, a result
comparable to those of state-of-the-art methods that employ more complex
lexical, syntactic and semantic features. While further investigation is
needed, the fact that we were able to obtain promising results using only
features that can be easily extracted from spontaneous dialogues suggests the
possibility of designing non-invasive and low-cost mental health monitoring
tools for use at scale.
Comment: 8 pages; Resources and ProcessIng of linguistic, paralinguistic and
extra-linguistic Data from people with various forms of cognitive impairment,
LREC 2018
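The modelling step described above is simple enough to sketch. Below is a minimal illustration, not the authors' code: it approximates additive logistic regression with log-loss gradient boosting over decision stumps, evaluated with leave-one-out cross-validation. The feature names, synthetic data, and scikit-learn estimator choice are all assumptions for illustration.

# Minimal sketch (assumption, not the paper's implementation): boosted
# additive logistic regression on content-free dialogue features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical content-free features per dialogue:
# [speech rate (syllables/s), mean turn length (s), turns per minute]
X = rng.normal(loc=[3.5, 4.0, 8.0], scale=[0.8, 1.5, 2.5], size=(38, 3))
y = np.array([1] * 21 + [0] * 17)  # 21 Alzheimer's dialogues, 17 others

# Log-loss boosting over depth-1 trees fits an additive logistic model,
# close in spirit (though not identical) to LogitBoost.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=1)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"Leave-one-out accuracy on synthetic data: {scores.mean():.3f}")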
Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
Depression is a common mental disorder. Automatic depression detection tools
that analyse speech, enabled by machine learning, can support early screening
for depression. This paper addresses two limitations that may hinder the
clinical adoption of such tools: noise resulting from segment-level labelling and
a lack of model interpretability. We propose a bi-modal speech-level
transformer to avoid segment-level labelling and introduce a hierarchical
interpretation approach to provide both speech-level and sentence-level
interpretations, based on gradient-weighted attention maps derived from all
attention layers to track interactions between input features. We show that the
proposed model outperforms a model that learns at a segment level (precision =
0.854, recall = 0.947, F1 = 0.897, compared to precision = 0.732, recall =
0.808, F1 = 0.768). For model
interpretation, using one true positive sample, we show which sentences within
a given speech are most relevant to depression detection, and which text
tokens and Mel-spectrogram regions within those sentences contribute most to
the prediction. These interpretations allow clinicians to verify the
validity of predictions made by depression detection tools, promoting their
clinical adoption.
Comment: 5 pages, 3 figures, submitted to IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP)
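The interpretation mechanism can be illustrated compactly. The sketch below is an assumption-laden toy, not the paper's bi-modal speech-level transformer: it weights each layer's attention map by its gradient and rolls the maps out across layers, in the spirit of attention rollout, to score sentence relevance. The single-head attention layer, dimensions, and synthetic "sentence embedding" inputs are all hypothetical.

# Minimal sketch (assumption, not the authors' code): gradient-weighted
# attention aggregated across all layers of a toy transformer. The real
# model is bi-modal (text tokens + Mel-spectrogram regions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionLayer(nn.Module):
    """Single-head self-attention that exposes its attention map."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (seq, dim) -> attention weights: (seq, seq)
        scores = self.q(x) @ self.k(x).T / x.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)
        weights.retain_grad()          # keep gradients on the attention map
        return x + weights @ self.v(x), weights

torch.manual_seed(0)
seq, dim = 6, 16                       # 6 "sentences", 16-dim embeddings (toy)
layers = nn.ModuleList(SelfAttentionLayer(dim) for _ in range(2))
head = nn.Linear(dim, 1)

x = torch.randn(seq, dim)              # synthetic sentence embeddings for one speech
attn_maps = []
for layer in layers:
    x, w = layer(x)
    attn_maps.append(w)
score = head(x.mean(dim=0)).squeeze()
score.backward()                       # gradients of the predicted score

# Weight each attention map by its gradient, then roll out across layers.
rollout = torch.eye(seq)
for w in attn_maps:
    g = (w * w.grad).clamp(min=0)      # gradient-weighted attention
    g = g / g.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    rollout = g @ rollout
print("Sentence relevance:", rollout.mean(dim=0))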