A multi-modal approach for identifying schizophrenia using cross-modal attention
This study focuses on how different modalities of human communication can be
used to distinguish between healthy controls and subjects with schizophrenia
who exhibit strong positive symptoms. We developed a multi-modal schizophrenia
classification system using audio, video, and text. Facial action units and
vocal tract variables were extracted as low-level features from the video and
audio, respectively, and were then used to compute high-level coordination
features that served as inputs for the audio and video modalities.
Context-independent text embeddings extracted from transcriptions of speech
were used as the input for the text modality. The multi-modal system was
built by fusing a segment-to-session-level classifier for the audio and video
modalities with a text model based on a Hierarchical Attention Network (HAN)
with cross-modal attention. The proposed multi-modal system outperforms the
previous state-of-the-art multi-modal system by 8.53% in the weighted average
F1 score.

Comment: Accepted to the Annual International Conference of the IEEE Engineering
in Medicine and Biology Society 202
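The cross-modal attention used for fusion can be sketched as standard scaled dot-product attention in which one modality supplies the queries and another supplies the keys and values. The function names, dimensions, and toy data below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys_values):
    """Scaled dot-product attention where one modality (queries)
    attends over the time steps of another (keys/values)."""
    d_k = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)  # (Tq, Tkv)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ keys_values                     # (Tq, d)

# Toy example: 4 text steps attending over 6 audio/video segments, dim 8.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
audio_video = rng.normal(size=(6, 8))
fused = cross_modal_attention(text, audio_video)
print(fused.shape)  # (4, 8)
```

Each fused text step is then a convex combination of the other modality's segment representations, which is what lets the text model condition on audio/video evidence.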
