
    Learning object categories from Google's image search

    Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand-prepared datasets.
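    The model above builds on pLSA applied to visual words. As a point of reference only, the sketch below implements the flat pLSA base model with EM on an images-by-visual-words count matrix; the toy data, function name and topic count are illustrative, and the spatial (translation and scale invariant) extension that defines TSI-pLSA is deliberately omitted.

```python
import numpy as np

def plsa(counts, n_topics, n_iters=100, seed=0):
    """Minimal pLSA via EM on an (images x visual words) count matrix.

    counts[d, w] = occurrences of visual word w in image d.
    Returns P(w|z) and P(z|d). This is only the flat pLSA base model;
    TSI-pLSA additionally models object position/scale, omitted here.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_z_given_d = rng.random((n_docs, n_topics))
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) P(z|d)
        joint = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        expected = counts[:, None, :] * joint                        # (d, z, w)
        p_w_given_z = expected.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_given_d = expected.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_z, p_z_given_d

# Toy usage: 6 "images" over a 20-word visual vocabulary, 2 latent topics.
counts = np.random.default_rng(1).integers(0, 5, size=(6, 20))
p_w_given_z, p_z_given_d = plsa(counts, n_topics=2)
```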

    A Comparison of Affine Region Detectors


    Shape description and matching using integral invariants on eccentricity transformed images

    Matching occluded and noisy shapes is a problem frequently encountered in medical image analysis and more generally in computer vision. To keep track of changes inside the breast, for example, it is important for a computer-aided detection system to establish correspondences between regions of interest. Shape transformations, computed both with integral invariants (II) and with geodesic distance, yield signatures that are invariant to isometric deformations, such as bending and articulations. Integral invariants describe the boundaries of planar shapes. However, they provide no information about where a particular feature lies on the boundary with regard to the overall shape structure. Conversely, eccentricity transforms (Ecc) can match shapes by signatures of geodesic distance histograms based on information from inside the shape, but they ignore the boundary information. We describe a method that combines the boundary signature of a shape obtained from II and structural information from the Ecc to yield results that improve on them separately.
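    The fusion step described above can be pictured with a minimal sketch: two per-shape signatures, one boundary-based (II) and one interior-based (Ecc histogram), are compared separately and the two dissimilarities are blended. The chi-square comparison, the fixed weight alpha and the random stand-in signatures are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-square distance between two L1-normalised histograms."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def combined_dissimilarity(ii_sig_a, ii_sig_b, ecc_hist_a, ecc_hist_b, alpha=0.5):
    """Fuse a boundary-based and an interior-based shape cue.

    ii_sig_*  : histogram summarising an integral-invariant boundary signature
    ecc_hist_*: histogram of the eccentricity transform (geodesic distances)
    alpha     : weight balancing the two cues (hypothetical fusion rule)
    """
    d_boundary = chi2(ii_sig_a, ii_sig_b)
    d_interior = chi2(ecc_hist_a, ecc_hist_b)
    return alpha * d_boundary + (1.0 - alpha) * d_interior

# Toy usage with random 32-bin "signatures" standing in for real ones.
rng = np.random.default_rng(0)
a_ii, b_ii = rng.random(32), rng.random(32)
a_ecc, b_ecc = rng.random(32), rng.random(32)
print(combined_dissimilarity(a_ii, b_ii, a_ecc, b_ecc))
```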

    The PASCAL Visual Object Classes (VOC) Challenge

    The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state of the art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension. © 2009 Springer Science+Business Media, LLC.
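    For concreteness, the detection metric used by the challenge is average precision over a ranked list of detections. Below is a minimal sketch of the 11-point interpolated AP used in the early VOC years, assuming the IoU-based matching of detections to ground truth has already been performed; later editions switched to the area under the full precision/recall curve, which this sketch does not cover.

```python
import numpy as np

def voc_ap_11pt(scores, is_true_positive, n_ground_truth):
    """11-point interpolated average precision (early PASCAL VOC style).

    scores           : confidence of each detection
    is_true_positive : 1 if the detection matched an unclaimed ground-truth
                       box (IoU >= 0.5), else 0 -- matching is assumed done
    n_ground_truth   : total number of ground-truth objects for the class
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(n_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Average the best achievable precision at 11 evenly spaced recall levels.
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0

# Toy usage: 5 detections, 3 ground-truth objects.
print(voc_ap_11pt([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0], 3))
```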

    2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images

    We present a technique for estimating the spatial layout of humans in still images: the position of the head, torso and arms. The theme we explore is that once a person is localized using an upper body detector, the search for their body parts can be considerably simplified using weak constraints on position and appearance arising from that detection. Our approach is capable of estimating upper body pose in highly challenging uncontrolled images, without prior knowledge of background, clothing, lighting, or the location and scale of the person in the image. People are only required to be upright and seen from the front or the back (not side). We evaluate the stages of our approach experimentally using ground truth layout annotation on a variety of challenging material, such as images from the PASCAL VOC 2008 challenge and video frames from TV shows and feature films. We also propose and evaluate techniques for searching a video dataset for people in a specific pose. To this end, we develop three new pose descriptors and compare their classification and retrieval performance.
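    The core idea that an upper-body detection weakly constrains where the parts are searched for can be pictured with a small, purely hypothetical sketch: the fixed fractions below are illustrative stand-ins for the learned location priors, and the real method combines such constraints with appearance models during part estimation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float      # left
    y: float      # top
    w: float      # width
    h: float      # height

def part_search_regions(upper_body: Box, margin: float = 0.25):
    """Derive coarse part-search windows from an upper-body detection.

    The proportions below are illustrative stand-ins for the weak position
    priors the detection provides; the actual model uses learned appearance
    and location terms, not fixed fractions.
    """
    m_w, m_h = margin * upper_body.w, margin * upper_body.h
    head = Box(upper_body.x + 0.25 * upper_body.w,
               upper_body.y - m_h,
               0.5 * upper_body.w,
               0.5 * upper_body.h)
    torso = Box(upper_body.x, upper_body.y + 0.3 * upper_body.h,
                upper_body.w, 0.7 * upper_body.h + m_h)
    arms = Box(upper_body.x - m_w, upper_body.y + 0.2 * upper_body.h,
               upper_body.w + 2 * m_w, upper_body.h + m_h)
    return {"head": head, "torso": torso, "arms": arms}

print(part_search_regions(Box(100, 50, 80, 120)))
```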

    Smooth loss functions for deep top-k classification

    The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require O(n choose k) operations, where n is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of O(kn). Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k = 5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy.
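    The O(n choose k) blow-up mentioned above comes from summing over all k-element subsets of class scores; that sum can be organised through elementary symmetric polynomials of the exponentiated, temperature-scaled scores, which admit a simple O(kn) recursion. The sketch below shows that recursion in log space and the resulting smoothed top-k log-sum-exp term; it is a simplified stand-in, not the paper's divide-and-conquer GPU algorithm or its full margin-based loss.

```python
import numpy as np

def log_elementary_symmetric(log_x, k):
    """log e_j(exp(log_x)) for j = 0..k via the O(k n) recursion
    e_j^(i) = e_j^(i-1) + x_i * e_{j-1}^(i-1), carried out in log space
    for numerical stability.
    """
    n = len(log_x)
    log_e = np.full(k + 1, -np.inf)
    log_e[0] = 0.0                       # e_0 = 1
    for i in range(n):
        # update j from high to low so log_e[j-1] is still the previous row
        for j in range(min(i + 1, k), 0, -1):
            log_e[j] = np.logaddexp(log_e[j], log_x[i] + log_e[j - 1])
    return log_e

def smooth_topk_logsumexp(scores, k, tau=1.0):
    """A smoothed 'max over k-subsets' term:
    tau * log( sum over size-k subsets of exp(sum of scores / tau) ),
    i.e. tau * log e_k(exp(scores / tau)).  This is the kind of quantity
    the smooth top-k losses are built from; it is not the full loss.
    """
    log_e = log_elementary_symmetric(np.asarray(scores) / tau, k)
    return tau * log_e[k]

scores = np.array([2.0, 1.0, 0.5, -1.0, 3.0])
print(smooth_topk_logsumexp(scores, k=2, tau=0.1))  # approaches the sum of the 2 largest scores as tau -> 0
```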

    Relaxed softmax: efficient confidence auto-calibration for safe pedestrian detection

    As machine learning moves from the lab into the real world, reliability is often of paramount importance. The clearest examples are safety-critical applications such as pedestrian detection in autonomous driving. Since algorithms can never be expected to be perfect in all cases, managing reliability becomes crucial. To this end, in this paper we investigate the problem of learning, in an end-to-end manner, object detectors that are accurate while providing an unbiased estimate of the reliability of their own predictions. We do so by proposing a modification of the standard softmax layer where a probabilistic confidence score is explicitly pre-multiplied into the incoming activations to modulate confidence. We adopt a rigorous assessment protocol based on reliability diagrams to evaluate the quality of the resulting calibration and show excellent results in pedestrian detection on two challenging public benchmarks.
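    The layer modification described above is easy to picture: a per-sample confidence scalar multiplies the logits before the softmax, sharpening or flattening the predicted distribution. The sketch below assumes the confidence is simply passed in (in the detector it would be predicted by the network), and the function name is illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # stabilise against overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relaxed_softmax(logits, confidence):
    """Softmax with a per-sample confidence pre-multiplied into the logits.

    confidence > 1 sharpens the distribution (more certain),
    confidence < 1 flattens it (less certain); confidence would normally
    be predicted by the network, here it is just passed in.
    """
    confidence = np.asarray(confidence).reshape(-1, 1)
    return softmax(confidence * logits)

logits = np.array([[2.0, 1.0, 0.1],
                   [2.0, 1.0, 0.1]])
print(relaxed_softmax(logits, confidence=[2.0, 0.5]))
```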

    Action recognition from weak alignment of body parts

    We propose a method for human action recognition from still images that uses the silhouette and the upper body as a proxy for the pose of the person, and also to guide alignment between samples for the purpose of computing registered feature descriptors. Our contributions include an efficient algorithm, formulated as an energy minimization, for using the silhouette to align body parts between imaged human samples. The descriptors computed over the aligned body parts are incorporated, via a multiple kernel framework, together with other standard features (such as a deformable part model (DPM) and dense SIFT), to learn a classifier for each action class. Experiments on the challenging PASCAL VOC 2012 dataset show that our method exceeds the state of the art on the majority of action classes.
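    The multiple kernel combination can be sketched in its simplest fixed-weight form: one precomputed kernel per feature channel, summed with weights and fed to a kernel SVM. The equal weights, linear kernels and random stand-in features below are placeholders; a per-class learned weighting, as a multiple kernel framework would normally use, is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    """Weighted sum of precomputed kernel matrices (fixed-weight combination)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Toy usage: two feature channels (stand-ins for aligned-part descriptors
# and dense SIFT), linear kernels, one binary action class.
rng = np.random.default_rng(0)
X_parts, X_sift = rng.random((40, 64)), rng.random((40, 128))
y = rng.integers(0, 2, size=40)

K_parts = X_parts @ X_parts.T
K_sift = X_sift @ X_sift.T
K = combine_kernels([K_parts, K_sift], weights=[0.5, 0.5])

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:5]))        # predictions for the first 5 training samples
```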

    Trusting SVM for Piecewise Linear CNNs

    We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities, such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. This allows us to design an optimization algorithm with an optimal learning rate that does not require any tuning. Using the MNIST, CIFAR and ImageNet data sets, we show that our approach always improves over state-of-the-art variants of backpropagation and scales to large data and large network settings.
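    The concave-convex procedure (CCCP) at the heart of the optimization can be illustrated on a toy one-dimensional difference-of-convex function: linearise the concave part at the current iterate, minimise the resulting convex surrogate, and repeat. The objective below is an arbitrary example chosen so the surrogate has a closed-form minimiser; it stands in for, and is much simpler than, the latent structured SVM problems solved per layer in the paper.

```python
import numpy as np

# Toy DC objective f(x) = u(x) - v(x) with u(x) = x**4 and v(x) = (x - 1)**2,
# both convex.  CCCP repeatedly linearises the concave part (-v) at the
# current iterate and exactly minimises the resulting convex surrogate:
#     x_{t+1} = argmin_x  u(x) - v'(x_t) * x
# For this toy choice the surrogate minimiser has the closed form
#     4 x^3 = v'(x_t)  =>  x = cbrt(v'(x_t) / 4) = cbrt((x_t - 1) / 2).

def f(x):
    return x**4 - (x - 1.0)**2

def cccp(x0=0.0, n_iters=30):
    x = x0
    for _ in range(n_iters):
        grad_v = 2.0 * (x - 1.0)          # gradient of the convex part v at x_t
        x = np.cbrt(grad_v / 4.0)         # exact minimiser of the convex surrogate
    return x

x_star = cccp()
print(x_star, f(x_star))   # converges to x = -1, the global minimiser here
```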