2 research outputs found

Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

Author: Besacier Laurent
Garnerin Mahault
Rossato Solange
Publication date: 23/08/2019

This paper analyzes gender representation in four major corpora of French broadcast. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category, which corresponds to prominent speakers. The disparity in available data between genders causes performance to decrease on women. However, this global trend can be counterbalanced for speakers who are used to speaking in the media when a sufficient amount of data is available.

Comment: Accepted to ACM Workshop AI4T

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Performance over Random

Author: Apostolidis Evlampios
Chasanis Vasileios
Demir Mahmut
Edward J. Y.
Gong Boqing
Liu Yen-Ting
Mahmoud Karim M.
Mayu Otani Esa Rahtu
Wei Huawei
Yuan Li
Zhao Bin
Zhou Kaiyang
Publication venue: ACM
Publication date: 12/10/2020

This paper proposes a new evaluation approach for video summarization algorithms. We start by studying the currently established evaluation protocol. This protocol, defined over the ground-truth annotations of the SumMe and TVSum datasets, quantifies the agreement between the user-defined and the automatically created summaries with F-Score, and reports the average performance on a few different training/testing splits of the dataset used. We evaluate five publicly available summarization algorithms under a large-scale experimental setting with 50 randomly created data splits. We show that the results reported in the papers are not always congruent with the algorithms' performance in the large-scale experiment, and that the F-Score cannot be used for comparing algorithms evaluated on different splits. We also show that these shortcomings of the established evaluation protocol are due to the significantly varying levels of difficulty among the splits used, which affect the outcomes of the evaluations. Further analysis of these findings indicates a noticeable performance correlation between each algorithm and a random summarizer. To mitigate these shortcomings, we propose an evaluation protocol that estimates the difficulty of each data split and uses this information during the evaluation process. Experiments involving different evaluation settings demonstrate the increased representativeness of performance results when using the proposed evaluation approach, and the increased reliability of comparisons when the examined methods have been evaluated on different data splits.
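
To illustrate the idea in the abstract above, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formula, so dividing each split's F-Score by the F-Score of a random summarizer on the same split is an assumption used here as one possible way to account for split difficulty. The function names `f_score` and `relative_to_random` are hypothetical.

```python
from statistics import mean


def f_score(predicted, reference):
    """F-Score between two binary frame-selection vectors of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("summaries must have the same number of frames")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in predicted if p)
    recall = overlap / sum(1 for r in reference if r)
    return 2 * precision * recall / (precision + recall)


def relative_to_random(method_scores, random_scores):
    """Average, over splits, of method F-Score divided by random F-Score.

    Assumption: the random summarizer's score on a split serves as a proxy
    for that split's difficulty. The abstract does not specify this formula.
    """
    if len(method_scores) != len(random_scores):
        raise ValueError("one score per split is required for both inputs")
    if any(r <= 0 for r in random_scores):
        raise ValueError("random-summarizer scores must be positive")
    return mean(m / r for m, r in zip(method_scores, random_scores))


# Usage: one F-Score per data split for the method and for the random baseline.
agreement = f_score([1, 1, 0, 0], [1, 0, 1, 0])
normalised = relative_to_random([0.5, 0.4], [0.25, 0.4])
```

Under this assumption, a value above 1 means the method beat the random baseline on average across splits, which makes scores obtained on different splits easier to compare than raw F-Scores.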

Crossref

ZENODO

Queen Mary Research Online