9,565 research outputs found

    Weakly-Supervised Temporal Localization via Occurrence Count Learning

    Get PDF
    We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model's theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements.Comment: Accepted at ICML 201

    Methodological considerations concerning manual annotation of musical audio in function of algorithm development

    Get PDF
    In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1

    Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets

    Get PDF
    Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, \eg identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world clarinetist videos and carry out preliminary experiments on a 3D convolutional neural network based on multiple streams and purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5 hours of clarinetist videos together with cleaned annotations which include about 36,000 onsets and the coordinates for a number of salient points and regions of interest. By performing several training trials on our dataset, we learned that the problem is challenging. We found that the CNN model is highly sensitive to the optimization algorithm and hyper-parameters, and that treating the problem as binary classification may prevent the joint optimization of precision and recall. To encourage further research, we publicly share our dataset, annotations and all models and detail which issues we came across during our preliminary experiments.Comment: Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]

    Collecting ground truth annotations for drum detection in polyphonic music

    Get PDF
    In order to train and test algorithms that can automatically detect drum events in polyphonic music, ground truth data is needed. This paper describes a setup used for gathering manual annotations for 49 real-world music fragments containing different drum event types. Apart from the drum events, the beat was also annotated. The annotators were experienced drummers or percussionists. This paper is primarily aimed towards other drum detection researchers, but might also be of interest to others dealing with automatic music analysis, manual annotation and data gathering. Its purpose is threefold: providing annotation data for algorithm training and evaluation, describing a practical way of setting up a drum annotation task, and reporting issues that came up during the annotation sessions while at the same time providing some thoughts on important points that could be taken into account when setting up similar tasks in the future
    • …
    corecore