What is the ground truth? Reliability of multi-annotator data for audio tagging
Crowdsourcing has become a common approach for annotating large amounts of
data. It has the advantage of harnessing a large workforce to produce
annotations quickly, but the disadvantage of relying on non-expert
annotators with different backgrounds. This raises the problem of
data reliability, in addition to the general question of how to combine the
opinions of multiple annotators in order to estimate the ground truth. This
paper presents a study of the annotations and annotators' reliability for audio
tagging. We adapt the use of Krippendorff's alpha and multi-annotator competence
estimation (MACE) for a multi-labeled data scenario, and present how MACE can
be used to estimate a candidate ground truth based on annotations from
non-expert users with different levels of expertise and competence.
Comment: submitted to EUSIPCO 202
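As a concrete illustration of the reliability measure named in this abstract, the sketch below computes Krippendorff's alpha for nominal data from scratch and applies it per tag, one plausible way to adapt the coefficient to multi-label annotations. This is a minimal sketch, not the paper's implementation: the function name, the per-tag binarization, and the toy votes are assumptions, and MACE itself (an EM-based annotator-competence model) is not reproduced here.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is one annotation list per item, e.g. [[1, 1, 0], [0, 0]];
    items with fewer than two annotations carry no pairable values
    and are skipped.
    """
    coincidences = Counter()                  # o_ck: weighted value pairs
    for values in units:
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(values, 2):  # ordered pairs within an item
            coincidences[(a, b)] += 1.0 / (m - 1)

    marginals = Counter()                     # n_c: per-category totals
    for (a, _b), w in coincidences.items():
        marginals[a] += w
    n = sum(marginals.values())               # total pairable values

    # Observed disagreement: coincidence mass off the diagonal.
    d_o = sum(w for (a, b), w in coincidences.items() if a != b)
    # Expected disagreement from chance pairing of the marginals.
    d_e = sum(marginals[a] * marginals[b]
              for a in marginals for b in marginals if a != b) / (n - 1)
    return 1.0 - d_o / d_e if d_e > 0 else 1.0

# Per-tag use for multi-label tagging: binarize one tag across items
# and score it independently (toy votes; 1 = tag present, 0 = absent).
votes_for_one_tag = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(krippendorff_alpha_nominal(votes_for_one_tag))  # -> 0.6
```

Scoring each tag independently keeps the nominal distance function trivial; the per-tag alphas can then be averaged or inspected individually to spot unreliable labels.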
Unsupervised Contrastive Learning of Sound Event Representations
Self-supervised representation learning can mitigate the limitations of
recognition tasks with little manually labeled but abundant unlabeled
data, a common scenario in sound event research. In this work, we explore
unsupervised contrastive learning as a way to learn sound event
representations. To this end, we propose to use the pretext task of contrasting
differently augmented views of sound events. The views are computed primarily
via mixing of training examples with unrelated backgrounds, followed by other
data augmentations. We analyze the main components of our method via ablation
experiments. We evaluate the learned representations using linear evaluation
and on two in-domain downstream sound event classification tasks: training
with limited manually labeled data, and training with noisy labels. Our results
suggest that unsupervised contrastive pre-training can mitigate the impact of
data scarcity and increase robustness against noisy labels, outperforming
supervised baselines.
Comment: A 4-page version is submitted to ICASSP 202
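To make the pretext task concrete, the sketch below pairs a simplified "mix with an unrelated background" view generator with a SimCLR-style NT-Xent loss, a standard objective for contrasting augmented views. It is a minimal sketch under stated assumptions: mix_back, its alpha weight, the placeholder encoder, and the toy shapes are hypothetical, and the paper's exact mixing parameterization, augmentation chain, and loss details may differ.

```python
import torch
import torch.nn.functional as F

def mix_back(clip, background, alpha=0.25):
    # Scale the unrelated background to the foreground's energy, then mix.
    # `alpha` is a hypothetical mixing weight, not the paper's parameter.
    scale = clip.norm(dim=-1, keepdim=True) / (background.norm(dim=-1, keepdim=True) + 1e-8)
    return (1.0 - alpha) * clip + alpha * scale * background

def nt_xent(z1, z2, temperature=0.1):
    # SimCLR-style NT-Xent: matching rows of z1/z2 are positives,
    # every other embedding in the 2N batch is a negative.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                        # cosine logits
    sim.masked_fill_(torch.eye(sim.size(0), dtype=torch.bool), float("-inf"))
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: two differently mixed views per clip, a stand-in encoder,
# and the contrastive objective over the batch.
clips = torch.randn(8, 16000)                 # 8 one-second waveforms
backgrounds = torch.randn(8, 16000)           # unrelated background clips
view1 = mix_back(clips, backgrounds)
view2 = mix_back(clips, backgrounds.roll(1, dims=0))
encoder = torch.nn.Linear(16000, 128)         # placeholder for a real encoder
loss = nt_xent(encoder(view1), encoder(view2))
```

Rolling the background batch gives each clip two views mixed with different backgrounds, so the encoder is pushed to represent the foreground event rather than the background, which is the intuition behind this pretext task.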