
    Boosted Random Ferns for Object Detection

    In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from the instance to the category level while retaining efficiency. First, we define binary features in the histogram-of-oriented-gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, neither the positions where ferns are evaluated within the sliding window nor the locations of the binary features within each fern are chosen completely at random; instead, we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution: adapting the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at low computational cost. Finally, we show that training can be performed online, on sequentially arriving images. Overall, the resulting classifier can be trained very efficiently, densely evaluated at all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing. We demonstrate the effectiveness of our approach through thorough experimentation on publicly available datasets, comparing against the state of the art on tasks of both 2D detection and 3D multi-view estimation.
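
    A minimal, illustrative sketch of this idea in Python, assuming a toy setup in which each sample is a small grid of HOG cells: a fern is an ordered set of binary comparisons between HOG bins, and a boosting loop greedily picks each fern from a random pool while reweighting the training samples. The class names, pool size, and the real-AdaBoost-style leaf scores are assumptions made for this example, not the authors' implementation (which additionally shares features among ferns and supports online training).

    import numpy as np

    class Fern:
        """An ordered set of M binary tests; each test compares two HOG bins,
        and the M outcomes index one of 2**M leaves."""
        def __init__(self, tests):
            self.tests = tests                       # [((cell_a, bin_a), (cell_b, bin_b)), ...]
            self.ratios = np.zeros(2 ** len(tests))  # per-leaf log-odds, learned in fit()

        def leaf(self, hog):
            idx = 0
            for (ca, ba), (cb, bb) in self.tests:
                idx = (idx << 1) | int(hog[ca, ba] > hog[cb, bb])
            return idx

        def fit(self, hogs, labels, weights, eps=1.0):
            pos = np.full(len(self.ratios), eps)     # smoothed, weighted leaf counts
            neg = np.full(len(self.ratios), eps)
            for h, y, w in zip(hogs, labels, weights):
                (pos if y == 1 else neg)[self.leaf(h)] += w
            self.ratios = 0.5 * np.log(pos / neg)    # real-AdaBoost-style leaf score

        def score(self, hog):
            return self.ratios[self.leaf(hog)]

    def boost_ferns(hogs, labels, n_ferns=10, n_tests=6, pool=100, seed=0):
        """Greedily pick each fern from a random pool, then reweight samples so
        later ferns concentrate on examples the ensemble still gets wrong."""
        rng = np.random.default_rng(seed)
        n_cells, n_bins = hogs[0].shape
        signs = 2 * np.asarray(labels) - 1           # {0,1} -> {-1,+1}
        w = np.ones(len(labels)) / len(labels)
        ensemble = []
        for _ in range(n_ferns):
            best, best_loss = None, np.inf
            for _ in range(pool):
                tests = [((rng.integers(n_cells), rng.integers(n_bins)),
                          (rng.integers(n_cells), rng.integers(n_bins)))
                         for _ in range(n_tests)]
                fern = Fern(tests)
                fern.fit(hogs, labels, w)
                margins = signs * np.array([fern.score(h) for h in hogs])
                loss = np.sum(w * np.exp(-margins))  # weighted exponential loss
                if loss < best_loss:
                    best, best_loss = fern, loss
            ensemble.append(best)
            margins = signs * np.array([best.score(h) for h in hogs])
            w *= np.exp(-margins)                    # upweight misclassified samples
            w /= w.sum()
        return ensemble

    # Toy usage: 8x9 "HOG" grids with a planted difference for the positive class.
    rng = np.random.default_rng(1)
    hogs = [rng.random((8, 9)) for _ in range(200)]
    labels = np.array([i % 2 for i in range(200)])
    for h, y in zip(hogs, labels):
        h[0, 0] += 2.0 * y
    ensemble = boost_ferns(hogs, labels)
    print(sum(f.score(hogs[1]) for f in ensemble))   # positive sum => detection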

    A survey of visual preprocessing and shape representation techniques

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and, most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention).

    Research on Symbolic Inference in Computational Vision

    This paper provides an overview of ongoing research in the GRASP laboratory, which focuses on the general problem of symbolic inference in computational vision. In this report we describe a conceptual framework for this research and outline our current research programs in the component areas that support this work.

    Current theories on the structure of the visual system


    Part Description and Segmentation Using Contour, Surface and Volumetric Primitives

    The problem of part definition, description, and decomposition is central to shape recognition systems. The ultimate goal of segmenting range images into meaningful parts and objects has proved very difficult to realize, mainly because the segmentation problem has been isolated from the issue of representation. We propose a paradigm for part description and segmentation by integration of contour, surface, and volumetric primitives. Unlike previous approaches, we use geometric properties derived from both boundary-based (surface contours and occluding contours) and primitive-based (quadric patches and superquadric models) representations to define and recover part-whole relationships, without a priori knowledge about the objects or object domain. The object shape is described at three levels of complexity, each contributing to the overall shape. Our approach can be summarized as answering the following question: given three different modules for extracting volume, surface, and boundary properties, how should they be invoked, evaluated, and integrated? Volume and boundary fitting and surface description are performed in parallel to combine the best of coarse-to-fine and fine-to-coarse segmentation strategies. The process involves feedback between the segmentor (the control module) and the individual shape description modules. The control module evaluates the intermediate descriptions and formulates hypotheses about parts. Hypotheses are further tested by the segmentor and the descriptors. The descriptions thus obtained are independent of position, orientation, scale, domain, and domain properties, and are based purely on geometric considerations. They are extremely useful for high-level, domain-dependent symbolic reasoning processes, which need not deal with a tremendous amount of data, but only with a rich description of the data in terms of primitives recovered at various levels of complexity.
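
    The feedback between the control module and the description modules can be pictured as a simple hypothesise-evaluate loop. The sketch below is a hypothetical rendering of that control flow, not the authors' system: the three describe_* functions are stubs standing in for the volumetric, surface, and boundary modules, and the agreement score, acceptance threshold, and split() refinement step are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        region: object          # candidate part, e.g. a set of range-image pixels
        support: float = 0.0    # agreement score across the three modules

    def describe_volume(region):    # superquadric/quadric fitting (stub)
        return {"residual": 0.10}

    def describe_surface(region):   # surface-patch description (stub)
        return {"residual": 0.20}

    def describe_boundary(region):  # surface/occluding contour description (stub)
        return {"residual": 0.15}

    def split(region):              # segmentor refinement step (stub)
        return [region]

    def control_loop(initial_regions, max_iters=5, accept=0.8):
        """Evaluate part hypotheses; accept those the three descriptions agree
        on, and feed the rest back to the segmentor for refinement."""
        parts = []
        pending = [Hypothesis(r) for r in initial_regions]
        for _ in range(max_iters):
            if not pending:
                break
            retry = []
            for hyp in pending:
                # Volume/boundary fitting and surface description run in
                # parallel in the paper; here we simply evaluate all three.
                residuals = [describe_volume(hyp.region)["residual"],
                             describe_surface(hyp.region)["residual"],
                             describe_boundary(hyp.region)["residual"]]
                hyp.support = 1.0 - sum(residuals) / len(residuals)
                if hyp.support >= accept:
                    parts.append(hyp)   # description accepted as a part
                else:
                    # Feedback: refine the hypothesis and test it again.
                    retry.extend(Hypothesis(r) for r in split(hyp.region))
            pending = retry
        return parts

    print(control_loop([{"pixels": None}]))  # one region, accepted first pass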

    The effect of transparency on recognition of overlapping objects

    Are overlapping objects easier to recognize when the objects are transparent or opaque? It is important to know whether the transparency of X-ray images of luggage contributes to the difficulty in searching those images for targets. Transparency provides extra information about objects that would normally be occluded but creates potentially ambiguous depth relations at the region of overlap. Two experiments investigated the threshold durations at which adult participants could accurately name pairs of overlapping objects that were opaque or transparent. In Experiment 1, the transparent displays included monocular cues to relative depth. Recognition of the back object was possible at shorter durations for transparent displays than for opaque displays. In Experiment 2, the transparent displays had no monocular depth cues. There was no difference in the duration at which the back object was recognized across transparent and opaque displays. The results of the two experiments suggest that transparent displays, even though less familiar than opaque displays, do not make object recognition more difficult and may even confer a benefit. These findings call into question the importance of edge junctions in object recognition.

    Recognition-by-components: A theory of human image understanding.


    From 3D Point Clouds to Pose-Normalised Depth Maps

    We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of four stages: (i) data filtering; (ii) nose tip identification and sub-vertex localisation; (iii) computation of the (relative) face orientation; (iv) generation of either a pose-aligned or a pose-normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface, which is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher-quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data).
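
    The SSR idea rests on evaluating the implicit surface model at points on concentric spheres around a candidate vertex and summarizing the result per radius, which yields a rotation-invariant local signature. Below is a minimal sketch of that sampling scheme, with a toy implicit unit ball standing in for the fitted facial RBF; the radii, sample counts, and the inside-fraction summary are illustrative assumptions, not the descriptor's exact definition.

    import numpy as np

    def fibonacci_sphere(n):
        """Roughly uniform unit directions via the Fibonacci spiral."""
        i = np.arange(n)
        phi = np.pi * (3.0 - np.sqrt(5.0)) * i
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = np.sqrt(1.0 - z * z)
        return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

    def ssr_histogram(f, center, radii, n_dirs=256):
        """For each radius, record the fraction of sphere samples falling
        inside the surface (f < 0); the vector of fractions is invariant to
        rotations about `center`, hence usable as a local shape signature."""
        dirs = fibonacci_sphere(n_dirs)
        hist = np.empty(len(radii))
        for k, rad in enumerate(radii):
            pts = center + rad * dirs      # samples on the sphere of radius rad
            hist[k] = np.mean(f(pts) < 0.0)
        return hist

    # Toy implicit surface: signed distance to a unit ball at the origin,
    # standing in for the fitted facial RBF (f<0 inside, f>0 outside).
    f_ball = lambda pts: np.linalg.norm(pts, axis=1) - 1.0

    sig = ssr_histogram(f_ball, center=np.array([0.0, 0.0, 1.0]),
                        radii=np.linspace(0.2, 1.0, 8))
    print(sig)  # inside fraction shrinks as spheres grow past the surface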