13,574 research outputs found

    On Rendering Synthetic Images for Training an Object Detector

    Get PDF
    We propose a novel approach to synthesizing images that are effective for training object detectors. Starting from a small set of real images, our algorithm estimates the rendering parameters required to synthesize similar images given a coarse 3D model of the target object. These parameters can then be reused to generate an unlimited number of training images of the object of interest in arbitrary 3D poses, which can then be used to increase classification performances. A key insight of our approach is that the synthetically generated images should be similar to real images, not in terms of image quality, but rather in terms of features used during the detector training. We show in the context of drone, plane, and car detection that using such synthetically generated images yields significantly better performances than simply perturbing real images or even synthesizing images in such way that they look very realistic, as is often done when only limited amounts of training data are available

    Beyond KernelBoost

    Get PDF
    In this Technical Report we propose a set of improvements with respect to the KernelBoost classifier presented in [Becker et al., MICCAI 2013]. We start with a scheme inspired by Auto-Context, but that is suitable in situations where the lack of large training sets poses a potential problem of overfitting. The aim is to capture the interactions between neighboring image pixels to better regularize the boundaries of segmented regions. As in Auto-Context [Tu et al., PAMI 2009] the segmentation process is iterative and, at each iteration, the segmentation results for the previous iterations are taken into account in conjunction with the image itself. However, unlike in [Tu et al., PAMI 2009], we organize our recursion so that the classifiers can progressively focus on difficult-to-classify locations. This lets us exploit the power of the decision-tree paradigm while avoiding over-fitting. In the context of this architecture, KernelBoost represents a powerful building block due to its ability to learn on the score maps coming from previous iterations. We first introduce two important mechanisms to empower the KernelBoost classifier, namely pooling and the clustering of positive samples based on the appearance of the corresponding ground-truth. These operations significantly contribute to increase the effectiveness of the system on biomedical images, where texture plays a major role in the recognition of the different image components. We then present some other techniques that can be easily integrated in the KernelBoost framework to further improve the accuracy of the final segmentation. We show extensive results on different medical image datasets, including some multi-label tasks, on which our method is shown to outperform state-of-the-art approaches. The resulting segmentations display high accuracy, neat contours, and reduced noise

    Cylinder morphology of a stretched and twisted ribbon

    Get PDF
    A rich zoology of shapes emerges from a simple stretched and twisted elastic ribbon. Despite a lot of interest, all these shape are not understood, in particular the shape that prevails at large tension and twist and that emerges from a transverse instability of the helicoid. Here, we propose a simple description for this cylindrical shape. By comparing its energy to the energy of other configurations, we are able to determine its location on the phase diagram. The theoretical predictions are in good agreement with our experimental results

    High-dimensional sequence transduction

    Full text link
    We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate

    Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation

    Full text link
    Kernel density estimation is a well known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used the bandwidth selection remains a challenging issue in terms of balancing algorithmic performance and statistical relevance. The purpose of this paper is to compare a recently developped bandwidth selection method for kernel density estimation to those which are commonly used by now (at least those which are implemented in the R-package). This new method is called Penalized Comparison to Overfitting (PCO). It has been proposed by some of the authors of this paper in a previous work devoted to its statistical relevance from a purely theoretical perspective. It is compared here to other usual bandwidth selection methods for univariate and also multivariate kernel density estimation on the basis of intensive simulation studies. In particular, cross-validation and plug-in criteria are numerically investigated and compared to PCO. The take home message is that PCO can outperform the classical methods without algorithmic additionnal cost

    Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation

    Full text link
    Deep neural networks with alternating convolutional, max-pooling and decimation layers are widely used in state of the art architectures for computer vision. Max-pooling purposefully discards precise spatial information in order to create features that are more robust, and typically organized as lower resolution spatial feature maps. On some tasks, such as whole-image classification, max-pooling derived features are well suited; however, for tasks requiring precise localization, such as pixel level prediction and segmentation, max-pooling destroys exactly the information required to perform well. Precise localization may be preserved by shallow convnets without pooling but at the expense of robustness. Can we have our max-pooled multi-layered cake and eat it too? Several papers have proposed summation and concatenation based methods for combining upsampled coarse, abstract features with finer features to produce robust pixel level predictions. Here we introduce another model --- dubbed Recombinator Networks --- where coarse features inform finer features early in their formation such that finer features can make use of several layers of computation in deciding how to use coarse features. The model is trained once, end-to-end and performs better than summation-based architectures, reducing the error from the previous state of the art on two facial keypoint datasets, AFW and AFLW, by 30\% and beating the current state-of-the-art on 300W without using extra data. We improve performance even further by adding a denoising prediction model based on a novel convnet formulation.Comment: accepted in CVPR 201
    corecore