9,565 research outputs found
Weakly-Supervised Temporal Localization via Occurrence Count Learning
We propose a novel model for temporal detection and localization which allows
the training of deep neural networks using only counts of event occurrences as
training labels. This powerful weakly-supervised framework alleviates the
burden of the imprecise and time-consuming process of annotating event
locations in temporal data. Unlike existing methods, in which localization is
explicitly achieved by design, our model learns localization implicitly as a
byproduct of learning to count instances. This unique feature is a direct
consequence of the model's theoretical properties. We validate the
effectiveness of our approach in a number of experiments (drum hit and piano
onset detection in audio, digit detection in images) and demonstrate
performance comparable to that of fully-supervised state-of-the-art methods,
despite much weaker training requirements.Comment: Accepted at ICML 201
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Acoustic events often have a visual counterpart. Knowledge of visual
information can aid the understanding of complex auditory scenes, even when
only a stereo mixdown is available in the audio domain, \eg identifying which
musicians are playing in large musical ensembles. In this paper, we consider a
vision-based approach to note onset detection. As a case study we focus on
challenging, real-world clarinetist videos and carry out preliminary
experiments on a 3D convolutional neural network based on multiple streams and
purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5
hours of clarinetist videos together with cleaned annotations which include
about 36,000 onsets and the coordinates for a number of salient points and
regions of interest. By performing several training trials on our dataset, we
learned that the problem is challenging. We found that the CNN model is highly
sensitive to the optimization algorithm and hyper-parameters, and that treating
the problem as binary classification may prevent the joint optimization of
precision and recall. To encourage further research, we publicly share our
dataset, annotations and all models and detail which issues we came across
during our preliminary experiments.Comment: Proceedings of the First International Conference on Deep Learning
and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]
Collecting ground truth annotations for drum detection in polyphonic music
In order to train and test algorithms that can automatically detect drum events in polyphonic music, ground truth data is needed. This paper describes a setup used for gathering manual annotations for 49 real-world music fragments containing different drum event types. Apart from the drum events, the beat was also annotated. The annotators were experienced drummers or percussionists. This paper is primarily aimed towards other drum detection researchers, but might also be of interest to others dealing with automatic music analysis, manual annotation and data gathering. Its purpose is threefold: providing annotation data for algorithm training and evaluation, describing a practical way of setting up a drum annotation task, and reporting issues that came up during the annotation sessions while at the same time providing some thoughts on important points that could be taken into account when setting up similar tasks in the future
- …