1,007 research outputs found
Invariances and Data Augmentation for Supervised Music Transcription
This paper explores a variety of models for frame-based music transcription,
with an emphasis on the methods needed to reach state-of-the-art on human
recordings. The translation-invariant network discussed in this paper, which
combines a traditional filterbank with a convolutional neural network, was the
top-performing model in the 2017 MIREX Multiple Fundamental Frequency
Estimation evaluation. This class of models shares parameters in the
log-frequency domain, which exploits the frequency invariance of music to
reduce the number of model parameters and avoid overfitting to the training
data. All models in this paper were trained with supervision by labeled data
from the MusicNet dataset, augmented by random label-preserving pitch-shift
transformations.Comment: 6 page
Graph kernels between point clouds
Point clouds are sets of points in two or three dimensions. Most kernel
methods for learning on sets of points have not yet dealt with the specific
geometrical invariances and practical constraints associated with point clouds
in computer vision and graphics. In this paper, we present extensions of graph
kernels for point clouds, which allow to use kernel methods for such ob jects
as shapes, line drawings, or any three-dimensional point clouds. In order to
design rich and numerically efficient kernels with as few free parameters as
possible, we use kernels between covariance matrices and their factorizations
on graphical models. We derive polynomial time dynamic programming recursions
and present applications to recognition of handwritten digits and Chinese
characters from few training examples
- …