9,630 research outputs found
Manifold Parzen Windows
The similarity between objects is a fundamental element of many learning algorithms. Most non-parametric methods take this similarity to be fixed, but much recent work has shown the advantages of learning it, in particular to exploit the local invariances in the data or to capture the possibly non-linear manifold on which most of the data lies. We propose a new non-parametric kernel density estimation method which captures the local structure of an underlying manifold through the leading eigenvectors of regularized local covariance matrices. Experiments in density estimation show significant improvements with respect to Parzen density estimators. The density estimators can also be used within Bayes classifiers, yielding classification rates similar to SVMs and much superior to the Parzen classifier. La similarité entre objets est un élément fondamental de plusieurs algorithmes d'apprentissage. La plupart des méthodes non paramétriques supposent cette similarité constante, mais des travaux récents ont montré les avantages de les apprendre, en particulier pour exploiter les invariances locales dans les données ou pour capturer la variété possiblement non linéaire sur laquelle reposent la plupart des données. Nous proposons une nouvelle méthode d'estimation de densité à noyau non paramétrique qui capture la structure locale d'une variété sous-jacente en utilisant les vecteurs propres principaux de matrices de covariance locales régularisées. Les expériences d'estimation de densité montrent une amélioration significative sur les estimateurs de densité de Parzen. Les estimateurs de densité peuvent aussi être utilisés à l'intérieur de classificateurs de Bayes, menant à des taux de classification similaires à ceux des SVMs, et très supérieurs au classificateur de Parzen.density estimation, non-parametric models, manifold models, probabilistic classifiers, estimation de densité, modèles non paramétriques, modèles de variétés, classification probabiliste
Estimator selection: a new method with applications to kernel density estimation
Estimator selection has become a crucial issue in non parametric estimation.
Two widely used methods are penalized empirical risk minimization (such as
penalized log-likelihood estimation) or pairwise comparison (such as Lepski's
method). Our aim in this paper is twofold. First we explain some general ideas
about the calibration issue of estimator selection methods. We review some
known results, putting the emphasis on the concept of minimal penalty which is
helpful to design data-driven selection criteria. Secondly we present a new
method for bandwidth selection within the framework of kernel density density
estimation which is in some sense intermediate between these two main methods
mentioned above. We provide some theoretical results which lead to some fully
data-driven selection strategy
Beyond KernelBoost
In this Technical Report we propose a set of improvements with respect to the
KernelBoost classifier presented in [Becker et al., MICCAI 2013]. We start with
a scheme inspired by Auto-Context, but that is suitable in situations where the
lack of large training sets poses a potential problem of overfitting. The aim
is to capture the interactions between neighboring image pixels to better
regularize the boundaries of segmented regions. As in Auto-Context [Tu et al.,
PAMI 2009] the segmentation process is iterative and, at each iteration, the
segmentation results for the previous iterations are taken into account in
conjunction with the image itself. However, unlike in [Tu et al., PAMI 2009],
we organize our recursion so that the classifiers can progressively focus on
difficult-to-classify locations. This lets us exploit the power of the
decision-tree paradigm while avoiding over-fitting. In the context of this
architecture, KernelBoost represents a powerful building block due to its
ability to learn on the score maps coming from previous iterations. We first
introduce two important mechanisms to empower the KernelBoost classifier,
namely pooling and the clustering of positive samples based on the appearance
of the corresponding ground-truth. These operations significantly contribute to
increase the effectiveness of the system on biomedical images, where texture
plays a major role in the recognition of the different image components. We
then present some other techniques that can be easily integrated in the
KernelBoost framework to further improve the accuracy of the final
segmentation. We show extensive results on different medical image datasets,
including some multi-label tasks, on which our method is shown to outperform
state-of-the-art approaches. The resulting segmentations display high accuracy,
neat contours, and reduced noise
On Rendering Synthetic Images for Training an Object Detector
We propose a novel approach to synthesizing images that are effective for
training object detectors. Starting from a small set of real images, our
algorithm estimates the rendering parameters required to synthesize similar
images given a coarse 3D model of the target object. These parameters can then
be reused to generate an unlimited number of training images of the object of
interest in arbitrary 3D poses, which can then be used to increase
classification performances.
A key insight of our approach is that the synthetically generated images
should be similar to real images, not in terms of image quality, but rather in
terms of features used during the detector training. We show in the context of
drone, plane, and car detection that using such synthetically generated images
yields significantly better performances than simply perturbing real images or
even synthesizing images in such way that they look very realistic, as is often
done when only limited amounts of training data are available
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
High-dimensional sequence transduction
We investigate the problem of transforming an input sequence into a
high-dimensional output sequence in order to transcribe polyphonic audio music
into symbolic notation. We introduce a probabilistic model based on a recurrent
neural network that is able to learn realistic output distributions given the
input and we devise an efficient algorithm to search for the global mode of
that distribution. The resulting method produces musically plausible
transcriptions even under high levels of noise and drastically outperforms
previous state-of-the-art approaches on five datasets of synthesized sounds and
real recordings, approximately halving the test error rate
- …