2 research outputs found

    Sample Distillation for Object Detection and Image Classification

    We propose a novel approach to efficiently select informative samples for large-scale learning. Instead of directly feeding a learning algorithm with a very large number of samples, as is usually done to reach state-of-the-art performance, we have developed a "distillation" procedure that recursively reduces the size of an initial training set using a criterion designed to maximize the information content of the selected subset. We demonstrate the performance of this procedure on two different computer vision problems. First, we show that distillation can be used to improve the traditional bootstrapping approach to object detection. Second, we apply distillation to a classification problem with artificial distortions. We show that in both cases, using the result of a distillation process instead of a random subset drawn uniformly from the original sample set improves performance significantly.
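    The abstract does not state the selection criterion, so the following is only a minimal sketch of the recursive-reduction idea under loudly stated assumptions. It assumes a halving schedule (`keep_fraction=0.5`) and a hypothetical informativeness score, `boundary_score`, that favours samples nearest the midpoint between two class means on a 1-D toy problem. This is not the criterion used in the paper. The key point it illustrates is that each round re-scores the current subset rather than ranking the full set once. `distill`, `boundary_score`, `target_size` and `keep_fraction` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # hypothetical (feature, label) pair in a 1-D toy problem


def distill(samples: Sequence[Sample],
            score: Callable[[List[Sample]], List[float]],
            target_size: int,
            keep_fraction: float = 0.5) -> List[Sample]:
    """Recursively shrink `samples` to `target_size` items. Each round
    re-scores the *current* subset and keeps its best `keep_fraction`."""
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must lie strictly between 0 and 1")
    current = list(samples)
    while len(current) > target_size:
        scores = score(current)
        n_keep = max(target_size, int(len(current) * keep_fraction))
        # Rank by descending score; Python's sort is stable, so ties keep input order.
        ranked = sorted(range(len(current)), key=lambda i: scores[i], reverse=True)
        # Keep the top-ranked samples, preserving their original relative order.
        current = [current[i] for i in sorted(ranked[:n_keep])]
    return current


def boundary_score(subset: List[Sample]) -> List[float]:
    """Hypothetical informativeness score (an assumption, not the paper's):
    samples nearest the midpoint between the two class means, estimated on
    `subset` itself, score highest."""
    xs0 = [x for x, y in subset if y == 0]
    xs1 = [x for x, y in subset if y == 1]
    if not xs0 or not xs1:
        return [0.0] * len(subset)  # one class is missing: no boundary to measure against
    boundary = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return [-abs(x - boundary) for x, _ in subset]


if __name__ == "__main__":
    data = [(float(i), 0) for i in range(10)] + [(float(i), 1) for i in range(10, 20)]
    print(distill(data, boundary_score, target_size=4))
    # → [(8.0, 0), (9.0, 0), (10.0, 1), (11.0, 1)]
```

    On this toy set the boundary estimate shifts from 9.5 to 9.25 in the last round because it is recomputed on the shrinking subset. The uniform baseline the abstract compares against would correspond to something like `random.sample(data, 4)`.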

    Object Detection with Active Sample Harvesting

    The work presented in this dissertation lies in the domains of image classification, object detection, and machine learning. Whether training image classifiers or object detectors, the learning phase consists of finding an optimal boundary between populations of samples. In practice, not all samples are equally important: some examples are trivially classified and contribute little to training, while others, close to the boundary or misclassified, are the ones that truly matter. Similarly, the images from which the samples originate are not all equally rich in informative samples. However, most training procedures select samples and images uniformly or weight them equally. The common thread of this dissertation is how to efficiently find the informative samples or images for training. Although we never consider all the possible samples "in the world", our purpose is to select samples in a smarter manner, without looking at all the available ones. The framework adopted in this work consists of organising the data (samples or images) in a tree that reflects the statistical regularities of the training samples, by putting "similar" samples in the same branch. Each leaf carries a sample and a weight related to the "importance" of that sample, and each internal node carries statistics about the weights below it. The tree is used to select the next sample or image for training by applying a sampling policy, and the "importance" weights are then updated accordingly to bias sampling towards informative samples or images in future iterations. Our experiments show that, across the various applications, properly focusing on informative images or samples improves the learning phase, either by reaching better performance faster or by reducing the training loss faster.
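    The weighted tree described above behaves much like a sum tree. The following is a minimal sketch under several assumptions. Leaves are kept in a fixed array order; the dissertation instead arranges the tree so that similar samples share a branch, which is not reproduced here. Each internal node stores the sum of the weights below it. The sampling policy is the hypothetical choice of drawing a leaf with probability proportional to its weight. `WeightTree`, `update` and `sample` are illustrative names, not the dissertation's API.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class WeightTree(Generic[T]):
    """Hypothetical sum tree: leaves hold (sample, weight) pairs and every
    internal node holds the sum of the leaf weights below it."""

    def __init__(self, samples: Sequence[T], weights: Sequence[float]) -> None:
        if not samples or len(samples) != len(weights):
            raise ValueError("need one weight per sample, and at least one sample")
        self.samples: List[T] = list(samples)
        # Round the leaf count up to a power of two; padding leaves keep weight 0.
        self.size = 1
        while self.size < len(self.samples):
            self.size *= 2
        # Implicit array layout: the root is node 1, node i has children 2i and
        # 2i + 1, and leaf j lives at index self.size + j.
        self.sums = [0.0] * (2 * self.size)
        for j, w in enumerate(weights):
            self.update(j, w)

    def update(self, j: int, weight: float) -> None:
        """Set leaf j's importance weight and refresh the sums up to the root."""
        if not 0 <= j < len(self.samples):
            raise IndexError("leaf index out of range")
        if weight < 0:
            raise ValueError("weights must be non-negative")
        node = self.size + j
        self.sums[node] = float(weight)
        node //= 2
        while node >= 1:
            self.sums[node] = self.sums[2 * node] + self.sums[2 * node + 1]
            node //= 2

    def sample(self, rng: random.Random) -> int:
        """Draw a leaf index with probability proportional to its weight by
        descending from the root and comparing against subtree sums."""
        total = self.sums[1]
        if total <= 0:
            raise ValueError("all weights are zero")
        u = rng.random() * total
        node = 1
        while node < self.size:
            left, right = 2 * node, 2 * node + 1
            # Go left if u falls inside the left subtree's mass; the second test
            # guards against rounding pushing u into an empty right subtree.
            if u < self.sums[left] or self.sums[right] <= 0:
                node = left
            else:
                u -= self.sums[left]
                node = right
        return node - self.size


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy importances: an easy sample, a boundary sample, a misclassified one.
    tree = WeightTree(["easy", "boundary", "misclassified"], [0.1, 1.0, 2.0])
    counts = [0, 0, 0]
    for _ in range(3000):
        counts[tree.sample(rng)] += 1
    print(counts)  # roughly proportional to 0.1 : 1.0 : 2.0
```

    After training on a drawn sample, a caller would pass a new importance value to `update`. For instance, it could use that sample's current loss, which is again an assumption. Both `sample` and `update` touch a single root-to-leaf path, so each costs O(log n) in the number of samples.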