Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets
Paralinguistic traits like cognitive load and emotion are increasingly
recognized as pivotal areas in speech recognition research, often examined
through specialized datasets like CLSE and IEMOCAP. However, the integrity of
these datasets is seldom scrutinized for text-dependency. This paper critically
evaluates the prevalent assumption that machine learning models trained on such
datasets genuinely learn to identify paralinguistic traits, rather than merely
capturing lexical features. By examining the lexical overlap in these datasets
and testing the performance of machine learning models, we expose significant
text-dependency in trait-labeling. Our results suggest that some machine
learning models, especially large pre-trained models like HuBERT, might
inadvertently focus on lexical characteristics rather than the intended
paralinguistic features. The study serves as a call to action for the research
community to reevaluate the reliability of existing datasets and methodologies,
ensuring that machine learning models genuinely learn what they are designed to
recognize.
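
The lexical-overlap check mentioned in the abstract can be illustrated with a minimal sketch. The toy transcripts, labels, and the Jaccard measure below are illustrative assumptions, not the paper's actual data or pipeline; they only show the kind of per-class vocabulary comparison such an analysis involves.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (transcript, paralinguistic label) pairs.
# In CLSE/IEMOCAP-style corpora these would be utterance transcripts
# paired with cognitive-load or emotion labels.
samples = [
    ("please read the numbers aloud", "low_load"),
    ("please read the numbers aloud quickly", "high_load"),
    ("i am so happy to see you", "happy"),
    ("i am so angry right now", "angry"),
]

def class_vocabularies(data):
    """Collect the set of word types observed under each label."""
    vocab = defaultdict(set)
    for text, label in data:
        vocab[label].update(text.lower().split())
    return vocab

def jaccard(a, b):
    """Jaccard similarity between two vocabularies (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

vocab = class_vocabularies(samples)
for (label_a, vocab_a), (label_b, vocab_b) in combinations(vocab.items(), 2):
    # Low overlap between label vocabularies means the labels could be
    # predicted from the words alone (a text-dependency risk); high overlap
    # leaves fewer lexical shortcuts for a model to exploit.
    print(f"{label_a} vs {label_b}: vocabulary overlap = {jaccard(vocab_a, vocab_b):.2f}")
```

A follow-up step in the same spirit would be to train a text-only baseline (e.g., bag-of-words) on the transcripts and compare its accuracy against an acoustic model such as HuBERT; if the text-only baseline performs comparably, the labels are likely recoverable from lexical content rather than paralinguistic cues.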