What is the ground truth? Reliability of multi-annotator data for audio tagging
Crowdsourcing has become a common approach for annotating large amounts of
data. It has the advantage of harnessing a large workforce to produce
annotations quickly, but the disadvantage of relying on non-expert
annotators with different backgrounds. This raises the problem of
data reliability, in addition to the general question of how to combine the
opinions of multiple annotators in order to estimate the ground truth. This
paper presents a study of the annotations and annotators' reliability for audio
tagging. We adapt the use of Krippendorff's alpha and multi-annotator competence
estimation (MACE) for a multi-labeled data scenario, and present how MACE can
be used to estimate a candidate ground truth based on annotations from
non-expert users with different levels of expertise and competence.
Comment: submitted to EUSIPCO 202
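As a concrete illustration of the reliability measure named in this abstract, the sketch below computes Krippendorff's alpha for nominal data from scratch and applies it per tag, one plausible way to adapt the coefficient to multi-label annotations. This is a minimal sketch, not the paper's implementation: the function name, the per-tag binarization, and the toy votes are assumptions, and MACE itself (an EM-based annotator-competence model) is not reproduced here.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is one annotation list per item, e.g. [[1, 1, 0], [0, 0]];
    items with fewer than two annotations carry no pairable values
    and are skipped.
    """
    coincidences = Counter()                  # o_ck: weighted value pairs
    for values in units:
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(values, 2):  # ordered pairs within an item
            coincidences[(a, b)] += 1.0 / (m - 1)

    marginals = Counter()                     # n_c: per-category totals
    for (a, _b), w in coincidences.items():
        marginals[a] += w
    n = sum(marginals.values())               # total pairable values

    # Observed disagreement: coincidence mass off the diagonal.
    d_o = sum(w for (a, b), w in coincidences.items() if a != b)
    # Expected disagreement from chance pairing of the marginals.
    d_e = sum(marginals[a] * marginals[b]
              for a in marginals for b in marginals if a != b) / (n - 1)
    return 1.0 - d_o / d_e if d_e > 0 else 1.0

# Per-tag use for multi-label tagging: binarize one tag across items
# and score it independently (toy votes; 1 = tag present, 0 = absent).
votes_for_one_tag = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(krippendorff_alpha_nominal(votes_for_one_tag))  # -> 0.6
```

Scoring each tag independently keeps the nominal distance function trivial; the per-tag alphas can then be averaged or inspected individually to spot unreliable labels.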
Unsupervised Contrastive Learning of Sound Event Representations
Self-supervised representation learning can mitigate the limitations of
recognition tasks with little manually labeled but abundant unlabeled
data, a common scenario in sound event research. In this work, we explore
unsupervised contrastive learning as a way to learn sound event
representations. To this end, we propose to use the pretext task of contrasting
differently augmented views of sound events. The views are computed primarily
via mixing of training examples with unrelated backgrounds, followed by other
data augmentations. We analyze the main components of our method via ablation
experiments. We evaluate the learned representations using linear evaluation
and on two in-domain downstream sound event classification tasks: training
with limited manually labeled data, and training with noisy labels. Our results
suggest that unsupervised contrastive pre-training can mitigate the impact of
data scarcity and increase robustness against noisy labels, outperforming
supervised baselines.
Comment: A 4-page version is submitted to ICASSP 202
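To make the pretext task concrete, the sketch below pairs a simplified "mix with an unrelated background" view generator with a SimCLR-style NT-Xent loss, a standard objective for contrasting augmented views. It is a minimal sketch under stated assumptions: mix_back, its alpha weight, the placeholder encoder, and the toy shapes are hypothetical, and the paper's exact mixing parameterization, augmentation chain, and loss details may differ.

```python
import torch
import torch.nn.functional as F

def mix_back(clip, background, alpha=0.25):
    # Scale the unrelated background to the foreground's energy, then mix.
    # `alpha` is a hypothetical mixing weight, not the paper's parameter.
    scale = clip.norm(dim=-1, keepdim=True) / (background.norm(dim=-1, keepdim=True) + 1e-8)
    return (1.0 - alpha) * clip + alpha * scale * background

def nt_xent(z1, z2, temperature=0.1):
    # SimCLR-style NT-Xent: matching rows of z1/z2 are positives,
    # every other embedding in the 2N batch is a negative.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                        # cosine logits
    sim.masked_fill_(torch.eye(sim.size(0), dtype=torch.bool), float("-inf"))
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: two differently mixed views per clip, a stand-in encoder,
# and the contrastive objective over the batch.
clips = torch.randn(8, 16000)                 # 8 one-second waveforms
backgrounds = torch.randn(8, 16000)           # unrelated background clips
view1 = mix_back(clips, backgrounds)
view2 = mix_back(clips, backgrounds.roll(1, dims=0))
encoder = torch.nn.Linear(16000, 128)         # placeholder for a real encoder
loss = nt_xent(encoder(view1), encoder(view2))
```

Rolling the background batch gives each clip two views mixed with different backgrounds, so the encoder is pushed to represent the foreground event rather than the background, which is the intuition behind this pretext task.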