39,570 research outputs found
Object Class Recognition Using Discriminative Local Features
In this paper, we introduce a scale-invariant feature selection method that learns to recognize and detect object classes from images of natural scenes. The first step of our method consists of clustering local scale-invariant descriptors to characterize object class appearance. Next, we train on the groups, and perform feature selection to determine the most discriminative parts. We use local regions to realize robust and sparse part and texture selection invariant to changes in scale, orientation and affine deformation and, as a result, we avoid image normalization in both training and prediction phases. We train our object models without requiring image parts to be labeled or objects to be separated from the background. Moreover, our method continues to work well when images have cluttered background and occluded objects. We evaluate our method on seven recently proposed datasets, and quantitatively compare the effect of different types of local regions and feature selection criteria on object recognition. Our experiments show that local invariant descriptors are an appropriate representation for many different object classes. Our results also confirm the importance of appearance-based discriminative feature selection.Cet article présente une méthode originale de sélection de caractéristiques invariantes à l'échelle pour la reconnaissance et la détection de classes d'objets. La premiÚre étape de notre méthode consiste à regrouper en clusters des descripteurs locaux invariants en échelle afin de caractériser l'apparence d'une classe d'objets. Un classifieur est ensuite appris pour chaque primitive et les plus discriminantes sont sélectionnées. Nous utilisons des descripteurs locaux afin de réaliser une sélection robuste et éparse de parties d'objets et de textures, invariantes aux changements d'échelle, d'orientation et aux transformations affines. De ce fait, nous évitons tout type de normalisation d'images lors des phases d'apprentissage et d'évaluation. Lors de l'apprentissage, il n'est pas nécessaire d'identifier manuellement les parties d'objets, ou de séparer manuellement un objet du fond dans une image. De plus, notre méthode donne de bons résultats quand le fond est chargé et que les objets sont partiellement occultés. Nous évaluons nos méthodes sur sept bases de données récentes, et comparons quantitativement l'influence des descripteurs locaux et des méthodes de sélection de primitives. Nos expériences montrent que les descripteurs basés sur des invariants locaux permettent une représentation appropriée de classes d'objets et souligne l'importance de la sélection de caractéristiques
SIFTing the relevant from the irrelevant: Automatically detecting objects in training images
Many state-of-the-art object recognition systems rely on identifying the location of objects in images, in order to better learn its visual attributes. In this paper, we propose four simple yet powerful hybrid ROI detection methods (combining both local and global features), based on frequently occurring keypoints. We show that our methods demonstrate competitive performance in two different types of datasets, the Caltech101 dataset and the GRAZ-02 dataset, where the pairs of keypoint bounding box method achieved the best accuracies overall
A Review of Codebook Models in Patch-Based Visual Object Recognition
The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performances on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution and it follows that the resulting codebook need not have discriminant properties. This is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook, to constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade with their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and datasets that were used in evaluating the proposed methods
From 3D Point Clouds to Pose-Normalised Depth Maps
We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in a relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of the following four stages: (i) data filtering, (ii) nose tip identification and sub-vertex localisation, (iii) computation of the (relative) face orientation, (iv) generation of either a pose aligned or a pose normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface and this is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data)
Perceptually Motivated Shape Context Which Uses Shape Interiors
In this paper, we identify some of the limitations of current-day shape
matching techniques. We provide examples of how contour-based shape matching
techniques cannot provide a good match for certain visually similar shapes. To
overcome this limitation, we propose a perceptually motivated variant of the
well-known shape context descriptor. We identify that the interior properties
of the shape play an important role in object recognition and develop a
descriptor that captures these interior properties. We show that our method can
easily be augmented with any other shape matching algorithm. We also show from
our experiments that the use of our descriptor can significantly improve the
retrieval rates
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher-accuracy rate and lowercomputational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.Comment: Published 201
3D ShapeNets: A Deep Representation for Volumetric Shapes
3D shape is a crucial but heavily underutilized cue in today's computer
vision systems, mostly due to the lack of a good generic shape representation.
With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft
Kinect), it is becoming increasingly important to have a powerful 3D shape
representation in the loop. Apart from category recognition, recovering full 3D
shapes from view-based 2.5D depth maps is also a critical part of visual
understanding. To this end, we propose to represent a geometric 3D shape as a
probability distribution of binary variables on a 3D voxel grid, using a
Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the
distribution of complex 3D shapes across different object categories and
arbitrary poses from raw CAD data, and discovers hierarchical compositional
part representations automatically. It naturally supports joint object
recognition and shape completion from 2.5D depth maps, and it enables active
object recognition through view planning. To train our 3D deep learning model,
we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive
experiments show that our 3D deep representation enables significant
performance improvement over the-state-of-the-arts in a variety of tasks.Comment: to be appeared in CVPR 201
- âŠ