On Rendering Synthetic Images for Training an Object Detector
We propose a novel approach to synthesizing images that are effective for
training object detectors. Starting from a small set of real images, our
algorithm estimates the rendering parameters required to synthesize similar
images given a coarse 3D model of the target object. These parameters can then
be reused to generate an unlimited number of training images of the object of
interest in arbitrary 3D poses, thereby improving classification performance.
A key insight of our approach is that the synthetically generated images
should be similar to real images, not in terms of image quality, but rather in
terms of features used during the detector training. We show in the context of
drone, plane, and car detection that using such synthetically generated images
yields significantly better performance than simply perturbing real images or
even synthesizing images in such a way that they look very realistic, as is
often done when only limited amounts of training data are available.
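The key insight above, that synthetic images should match real ones in feature space rather than pixel space, can be illustrated with a toy check. The gradient-orientation histogram below is only a stand-in for whatever features an actual detector uses; none of this reproduces the paper's method:

```python
import numpy as np

def gradient_orientation_histogram(img, bins=9):
    """Coarse HOG-like descriptor: histogram of gradient orientations,
    weighted by gradient magnitude (a stand-in for detector features)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi            # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist

def feature_distance(img_a, img_b):
    """Chi-square distance between the two histograms; small values mean
    the images look alike *to the features*, even if pixels differ."""
    ha = gradient_orientation_histogram(img_a)
    hb = gradient_orientation_histogram(img_b)
    return 0.5 * np.sum((ha - hb) ** 2 / (ha + hb + 1e-12))

# A real image and a brightness-rescaled "render" of it: pixel values differ
# substantially, but the gradient-based features are nearly identical.
rng = np.random.default_rng(0)
real = rng.random((32, 32))
render = real * 0.5 + 0.25        # same structure, different photometry
assert feature_distance(real, render) < 0.01
assert np.abs(real - render).mean() > 0.05
```

Rescaling intensities scales every gradient magnitude uniformly, so the normalized orientation histogram is unchanged: the two images are "similar" in exactly the sense the abstract argues matters for training.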
Beyond KernelBoost
In this Technical Report we propose a set of improvements with respect to the
KernelBoost classifier presented in [Becker et al., MICCAI 2013]. We start with
a scheme inspired by Auto-Context, but that is suitable in situations where the
lack of large training sets poses a potential problem of overfitting. The aim
is to capture the interactions between neighboring image pixels to better
regularize the boundaries of segmented regions. As in Auto-Context [Tu et al.,
PAMI 2009] the segmentation process is iterative and, at each iteration, the
segmentation results for the previous iterations are taken into account in
conjunction with the image itself. However, unlike in [Tu et al., PAMI 2009],
we organize our recursion so that the classifiers can progressively focus on
difficult-to-classify locations. This lets us exploit the power of the
decision-tree paradigm while avoiding over-fitting. In the context of this
architecture, KernelBoost represents a powerful building block due to its
ability to learn on the score maps coming from previous iterations. We first
introduce two important mechanisms to empower the KernelBoost classifier,
namely pooling and the clustering of positive samples based on the appearance
of the corresponding ground-truth. These operations significantly contribute to
increase the effectiveness of the system on biomedical images, where texture
plays a major role in the recognition of the different image components. We
then present some other techniques that can be easily integrated in the
KernelBoost framework to further improve the accuracy of the final
segmentation. We show extensive results on different medical image datasets,
including some multi-label tasks, on which our method is shown to outperform
state-of-the-art approaches. The resulting segmentations display high accuracy,
neat contours, and reduced noise.
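The iterative scheme described above can be sketched on a 1D toy problem. The sketch below substitutes a tiny weighted logistic regression for the KernelBoost building block; it only illustrates the control flow (feed the previous score map back in as a feature, then reweight toward hard locations), not the authors' actual classifier:

```python
import numpy as np

def train_logistic(X, y, w_sample, steps=200, lr=0.5):
    """Tiny weighted logistic regression, standing in for the KernelBoost
    building block (which is not reimplemented here)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (w_sample * (p - y)) / len(y)
        w -= lr * grad
    return w

def auto_context_segment(intensity, labels, n_iters=3):
    """Auto-Context-style loop: each round sees the previous score map as a
    feature and upweights currently misclassified locations."""
    score = np.full_like(intensity, 0.5)
    w_sample = np.ones_like(intensity)
    for _ in range(n_iters):
        # features: raw intensity, previous score, and a smoothed score that
        # captures interactions between neighboring pixels
        smooth = np.convolve(score, np.ones(5) / 5, mode="same")
        X = np.stack([intensity, score, smooth, np.ones_like(intensity)], axis=1)
        w = train_logistic(X, labels, w_sample)
        score = 1.0 / (1.0 + np.exp(-X @ w))
        # focus the next round on difficult-to-classify locations
        w_sample = 1.0 + 4.0 * np.abs(score - labels)
    return score

rng = np.random.default_rng(1)
labels = (np.arange(400) % 100 < 50).astype(float)   # stripes of fg/bg
intensity = labels + rng.normal(0, 0.5, size=400)    # noisy observation
score = auto_context_segment(intensity, labels)
err = np.mean((score > 0.5) != (labels > 0.5))
assert err < 0.15
```

The smoothed-score feature is what lets later rounds regularize region boundaries: a pixel whose neighbors were confidently labeled foreground is pulled toward foreground even when its own intensity is ambiguous.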
Cylinder morphology of a stretched and twisted ribbon
A rich zoology of shapes emerges from a simple stretched and twisted elastic
ribbon. Despite considerable interest, not all of these shapes are understood,
in particular the one that prevails at large tension and twist and emerges
from a transverse instability of the helicoid. Here, we propose a simple
description for this cylindrical shape. By comparing its energy to the energy
of other configurations, we are able to determine its location on the phase
diagram. The theoretical predictions are in good agreement with our
experimental results.
High-dimensional sequence transduction
We investigate the problem of transforming an input sequence into a
high-dimensional output sequence in order to transcribe polyphonic audio music
into symbolic notation. We introduce a probabilistic model based on a recurrent
neural network that is able to learn realistic output distributions given the
input and we devise an efficient algorithm to search for the global mode of
that distribution. The resulting method produces musically plausible
transcriptions even under high levels of noise and drastically outperforms
previous state-of-the-art approaches on five datasets of synthesized sounds and
real recordings, approximately halving the test error rate.
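A minimal illustration of searching for the global mode of a sequence-level output distribution: the toy model below replaces the RNN with a one-step Markov dependence between binary "piano-roll" frames, so the mode can be found exactly by dynamic programming over previous-frame states. The paper's actual model conditions on the full RNN state, and its search algorithm is not reproduced here:

```python
import numpy as np
from itertools import product

def frame_log_prob(frame, prev_frame, logits):
    """Toy conditional p(frame_t | frame_{t-1}): independent Bernoulli notes
    whose logits get a bonus for notes sustained from the previous frame."""
    z = logits + 1.5 * prev_frame              # sustained notes are likelier
    return float((frame * z - np.log1p(np.exp(z))).sum())

def mode_log_prob(logit_seq, n_notes):
    """Exact mode search via dynamic programming over the 2^n_notes possible
    previous frames (one frame of history stands in for the RNN state)."""
    states = [np.array(b, dtype=float) for b in product([0, 1], repeat=n_notes)]
    zero = np.zeros(n_notes)
    score = {i: frame_log_prob(s, zero, logit_seq[0]) for i, s in enumerate(states)}
    for logits in logit_seq[1:]:
        score = {j: max(score[i] + frame_log_prob(states[j], states[i], logits)
                        for i in score)
                 for j in range(len(states))}
    return max(score.values())

# tiny instance: 3 notes, 4 frames
rng = np.random.default_rng(2)
logit_seq = rng.normal(0, 2, size=(4, 3))
lp_dp = mode_log_prob(logit_seq, n_notes=3)

# brute force over all 2^(3*4) transcriptions confirms the DP found the mode
best = -np.inf
for bits in product([0, 1], repeat=12):
    seq = np.array(bits, dtype=float).reshape(4, 3)
    prev, lp = np.zeros(3), 0.0
    for t in range(4):
        lp += frame_log_prob(seq[t], prev, logit_seq[t])
        prev = seq[t]
    best = max(best, lp)
assert np.isclose(lp_dp, best)
```

The point of the exercise: because notes within a frame interact through the sequence model, the joint mode is generally not the concatenation of per-frame argmaxes, which is why a dedicated search procedure is needed at all.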
Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation
Kernel density estimation is a well known method involving a smoothing
parameter (the bandwidth) that needs to be tuned by the user. Although the
method has been widely used, bandwidth selection remains a challenging issue
in terms of balancing algorithmic performance and statistical relevance. The
purpose of this paper is to compare a recently developed bandwidth selection
method for kernel density estimation to those commonly used today (at least
those implemented in standard R packages). This new method is
called Penalized Comparison to Overfitting (PCO). It has been proposed by some
of the authors of this paper in a previous work devoted to its statistical
relevance from a purely theoretical perspective. It is compared here to other
usual bandwidth selection methods for univariate and also multivariate kernel
density estimation on the basis of intensive simulation studies. In particular,
cross-validation and plug-in criteria are numerically investigated and compared
to PCO. The take-home message is that PCO can outperform the classical methods
without additional algorithmic cost.
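As a concrete example of one of the classical baselines mentioned above, a minimal least-squares cross-validation (LSCV) bandwidth selector for a 1D Gaussian-kernel KDE might look like the sketch below; PCO itself is not implemented here, and the grid and assertion bounds are illustrative:

```python
import numpy as np

def lscv_score(h, data):
    """LSCV criterion for a Gaussian-kernel KDE: estimates the integrated
    squared error up to a constant that does not depend on h."""
    n = len(data)
    d = data[:, None] - data[None, :]
    # closed form for the integral of f_hat^2 with Gaussian kernels:
    # a sum of N(0, 2h^2) densities over all pairs of points
    term1 = np.exp(-d**2 / (4 * h**2)).sum() / (n**2 * 2 * h * np.sqrt(np.pi))
    # leave-one-out term: (2/n) * sum_i f_hat_{-i}(x_i)
    k = np.exp(-d**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
    loo = (k.sum(axis=1) - k.diagonal()) / (n - 1)
    return term1 - 2 * loo.mean()

def select_bandwidth(data, grid):
    """Pick the grid bandwidth minimizing the LSCV criterion."""
    return grid[np.argmin([lscv_score(h, data) for h in grid])]

rng = np.random.default_rng(3)
x = rng.normal(size=500)
grid = np.linspace(0.05, 1.5, 60)
h = select_bandwidth(x, grid)
# for a standard normal sample with n = 500, rule-of-thumb bandwidths are
# around 0.3, so the selected h should land in a moderate range
assert 0.08 < h < 0.9
```

The criterion diverges as h shrinks toward zero (the diagonal terms of the squared-density integral blow up), which is exactly the overfitting regime PCO penalizes differently.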
Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation
Deep neural networks with alternating convolutional, max-pooling and
decimation layers are widely used in state-of-the-art architectures for
computer vision. Max-pooling purposefully discards precise spatial information
in order to create features that are more robust, and typically organized as
lower resolution spatial feature maps. On some tasks, such as whole-image
classification, max-pooling derived features are well suited; however, for
tasks requiring precise localization, such as pixel level prediction and
segmentation, max-pooling destroys exactly the information required to perform
well. Precise localization may be preserved by shallow convnets without pooling
but at the expense of robustness. Can we have our max-pooled multi-layered cake
and eat it too? Several papers have proposed summation and concatenation based
methods for combining upsampled coarse, abstract features with finer features
to produce robust pixel level predictions. Here we introduce another model ---
dubbed Recombinator Networks --- where coarse features inform finer features
early in their formation such that finer features can make use of several
layers of computation in deciding how to use coarse features. The model is
trained once, end-to-end and performs better than summation-based
architectures, reducing the error from the previous state of the art on two
facial keypoint datasets, AFW and AFLW, by 30% and beating the current
state-of-the-art on 300W without using extra data. We improve performance even
further by adding a denoising prediction model based on a novel convnet
formulation.

Comment: accepted in CVPR 201
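The concatenation-based merge described above can be sketched in a few lines of numpy. Channel counts and shapes below are arbitrary, and a 1x1 matrix multiply stands in for the learned convolution that follows the merge:

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def recombine(coarse, fine, mix):
    """Recombinator-style merge: upsample the coarse map to the fine
    resolution, concatenate along channels, and mix with a 1x1 convolution
    (matrix `mix`) so the fine branch can *compute with* coarse features
    rather than just having them added in at the end."""
    factor = fine.shape[1] // coarse.shape[1]
    merged = np.concatenate([upsample_nn(coarse, factor), fine], axis=0)
    c, h, w = merged.shape
    return (mix @ merged.reshape(c, h * w)).reshape(-1, h, w)

rng = np.random.default_rng(4)
coarse = rng.normal(size=(8, 4, 4))     # low-res, abstract features
fine = rng.normal(size=(4, 16, 16))     # high-res, local features
mix = rng.normal(size=(16, 12))         # 1x1 conv: 8 + 4 -> 16 channels
out = recombine(coarse, fine, mix)
assert out.shape == (16, 16, 16)
```

The contrast with summation-based merging: summation forces equal channel counts and a fixed, element-wise combination, whereas concatenation followed by a learned mixing layer lets subsequent computation decide how to use the coarse information, which is the design choice the abstract argues for.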
