Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations
The increasing accuracy of automatic chord estimation systems, the
availability of vast amounts of heterogeneous reference annotations, and
insights from annotator subjectivity research make chord label personalization
increasingly important. Nevertheless, automatic chord estimation systems have
historically been trained and evaluated exclusively on a single reference
annotation. We introduce a first approach to automatic chord label
personalization by modeling subjectivity through deep learning of a harmonic
interval-based chord label representation. After integrating these
representations from multiple annotators, we can accurately personalize chord
labels for individual annotators from a single model and the annotators' chord
label vocabulary. Furthermore, we show that chord personalization using
multiple reference annotations outperforms using a single reference annotation.
Comment: Proceedings of the First International Conference on Deep Learning
and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])
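The abstract above describes three steps: representing chord labels by their harmonic intervals, integrating the representations of several annotators, and recovering a label from one annotator's vocabulary. The following is only a minimal sketch of that idea, not the authors' implementation. The 24-dimensional layout (root one-hot plus interval presence), the small quality table, the averaging step, and the nearest-label matching are all assumptions made for illustration, as are the function names `encode`, `integrate`, and `personalize`.

```python
# Hypothetical sketch: the encoding layout and all names are assumptions,
# not the representation used in the paper.
NOTE_TO_PC = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
              "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

# Semitone intervals above the root for a few chord qualities (illustrative subset).
QUALITY_INTERVALS = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "maj7": (0, 4, 7, 11),
    "min7": (0, 3, 7, 10),
}


def encode(label):
    """Encode a label such as 'C:maj' as a 12-dim root one-hot
    followed by a 12-dim interval-presence vector."""
    root, quality = label.split(":")
    vec = [0.0] * 24
    vec[NOTE_TO_PC[root]] = 1.0
    for interval in QUALITY_INTERVALS[quality]:
        vec[12 + interval] = 1.0
    return vec


def integrate(labels):
    """Average the encodings of several annotators' labels for one segment."""
    encodings = [encode(label) for label in labels]
    return [sum(column) / len(encodings) for column in zip(*encodings)]


def personalize(target, vocabulary):
    """Return the label in an annotator's vocabulary whose encoding has the
    smallest squared Euclidean distance to the target vector."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(encode(label), target))
    return min(vocabulary, key=distance)


# Four hypothetical annotators label the same segment.
integrated = integrate(["C:maj", "C:maj7", "C:maj", "C:7"])
print(personalize(integrated, ["C:maj", "A:min"]))
print(personalize(integrated, ["C:maj7", "A:min"]))
```

In this sketch the integrated vector stands in for what a trained model would predict from audio; an annotator with a triad-only vocabulary and one who uses seventh chords would receive different labels from the same vector.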
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Acoustic events often have a visual counterpart. Knowledge of visual
information can aid the understanding of complex auditory scenes, even when
only a stereo mixdown is available in the audio domain, e.g., identifying which
musicians are playing in large musical ensembles. In this paper, we consider a
vision-based approach to note onset detection. As a case study we focus on
challenging, real-world clarinetist videos and carry out preliminary
experiments on a multi-stream 3D convolutional neural network that purposely
avoids temporal pooling. We release an audiovisual dataset with 4.5
hours of clarinetist videos together with cleaned annotations which include
about 36,000 onsets and the coordinates for a number of salient points and
regions of interest. By performing several training trials on our dataset, we
learned that the problem is challenging. We found that the CNN model is highly
sensitive to the optimization algorithm and hyper-parameters, and that treating
the problem as binary classification may prevent the joint optimization of
precision and recall. To encourage further research, we publicly share our
dataset, annotations and all models and detail which issues we came across
during our preliminary experiments.
Comment: Proceedings of the First International Conference on Deep Learning
and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])
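The remark about precision and recall can be made concrete with a small scoring routine. This is a sketch under stated assumptions, not the evaluation protocol of the paper, which the abstract does not specify: the function name `onset_prf`, the 50 ms tolerance, and the greedy one-to-one matching are all hypothetical choices.

```python
def onset_prf(reference, detected, tolerance=0.05):
    """Score detected onset times (seconds) against reference onsets.

    A detection counts as a true positive if it lies within `tolerance`
    seconds of a not-yet-matched reference onset. The tolerance value and
    the greedy matching are assumptions for illustration.
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = true_positives = 0
    while i < len(ref) and j < len(det):
        if abs(ref[i] - det[j]) <= tolerance:
            true_positives += 1
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1  # spurious detection before the next reference onset
        else:
            i += 1  # reference onset with no detection nearby
    precision = true_positives / len(det) if det else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: three reference onsets, four detections.
print(onset_prf([1.0, 2.0, 3.0], [1.02, 2.5, 3.01, 3.6]))
```

Because onsets are sparse relative to video frames, a per-frame binary classifier can score well on accuracy while one of these two quantities stays low, which is one way to read the difficulty the abstract reports.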