Analysis of weak labels for sound event tagging

Abstract

Weak labels are a recurring problem in ambient sound analysis. While multiple neural-network-based methods have been proposed to address it, limited attention has been given to analyzing the problem itself in order to understand it better. Many of these methods appear to improve detection or tagging performance, but they have been evaluated in scenarios where other problems, such as unreliable labels, overlapping sound events, or class imbalance, also occur. It is therefore difficult to conclude whether the observed improvement comes from solving the weak-label problem or not. In this article, we provide for the first time a detailed analysis of the impact of weak labels, independently of other problems, on a sound event tagging system. We show that, to limit the negative impact of weak labels on performance, training clips must be at least as long as test clips, while longer training clip durations have only a minor impact. We also show that good temporal aggregation can help reduce this impact at test time, and we provide insight into the annotation granularity needed depending on the targeted scenario.
