Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance
This paper analyzes gender representation in four major corpora of French
broadcast. Because these corpora are widely used within the speech processing
community, they are primary material for training automatic speech
recognition (ASR) systems. As gender bias has been highlighted in numerous
natural language processing (NLP) applications, we study the impact of the
gender imbalance in TV and radio broadcast on the performance of an ASR system.
This analysis shows that women are under-represented in our data in terms of
speakers and speech turns. We introduce the notion of speaker role to refine
our analysis and find that women are even less represented in the Anchor category
corresponding to prominent speakers. The disparity in available data between
genders causes performance to decrease for women. However, this global trend can
be counterbalanced for speakers who are used to speaking in the media when
a sufficient amount of data is available.
Comment: Accepted to ACM Workshop AI4T