1 research outputs found
Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge
End-to-end neural network models (E2E) have shown significant performance
benefits on different INTERSPEECH ComParE tasks. Prior work has applied either
a single instance of an E2E model for a task or the same E2E architecture for
different tasks. However, applying a single model is unstable or using the same
architecture under-utilizes task-specific information. On ComParE 2020 tasks,
we investigate applying an ensemble of E2E models for robust performance and
developing task-specific modifications for each task. ComParE 2020 introduces
three sub-challenges: the breathing sub-challenge to predict the output of a
respiratory belt worn by a patient while speaking, the elderly sub-challenge to
estimate the elderly speaker's arousal and valence levels and the mask
sub-challenge to classify if the speaker is wearing a mask or not. On each of
these tasks, an ensemble outperforms the single E2E model. On the breathing
sub-challenge, we study the impact of multi-loss strategies on task
performance. On the elderly sub-challenge, predicting the valence and arousal
levels prompts us to investigate multi-task training and implement data
sampling strategies to handle class imbalance. On the mask sub-challenge, using
an E2E system without feature engineering is competitive to feature-engineered
baselines and provides substantial gains when combined with feature-engineered
baselines