19,154 research outputs found
Invariant surface characteristics for 3D object recognition in range images
In recent years there has been a tremendous increase in computer vision research using range images (or depth maps) as sensor input data. The most attractive feature of range images is the explicitness of the surface information. Many industrial and navigational robotic tasks will be more easily accomplished if such explicit depth information can be efficiently obtained and interpreted. Intensity image understanding research has shown that the early processing of sensor data should be data-driven. The goal of early processing is to generate a rich description for later processing. Classical differential geometry provides a complete local description of smooth surfaces. The first and second fundamental forms of surfaces provide a set of differential-geometric shape descriptors that capture domain-independent surface information. Mean curvature and Gaussian curvature are the fundamental second-order surface characteristics that possess desirable invariance properties and represent extrinsic and intrinsic surface geometry respectively. The signs of these surface curvatures are used to classify range image regions into one of eight basic viewpoint-independent surface types. Experimental results for real and synthetic range images show the properties, usefulness, and importance of differential-geometric surface characteristics.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/26326/1/0000413.pd
On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities
Current object recognition methods fail on object sets that include both
diffuse, reflective and transparent materials, although they are very common in
domestic scenarios. We show that a combination of cues from multiple sensor
modalities, including specular reflectance and unavailable depth information,
allows us to capture a larger subset of household objects by extending a state
of the art object recognition method. This leads to a significant increase in
robustness of recognition over a larger set of commonly used objects.Comment: 12 page
On the Design and Analysis of Multiple View Descriptors
We propose an extension of popular descriptors based on gradient orientation
histograms (HOG, computed in a single image) to multiple views. It hinges on
interpreting HOG as a conditional density in the space of sampled images, where
the effects of nuisance factors such as viewpoint and illumination are
marginalized. However, such marginalization is performed with respect to a very
coarse approximation of the underlying distribution. Our extension leverages on
the fact that multiple views of the same scene allow separating intrinsic from
nuisance variability, and thus afford better marginalization of the latter. The
result is a descriptor that has the same complexity of single-view HOG, and can
be compared in the same manner, but exploits multiple views to better trade off
insensitivity to nuisance variability with specificity to intrinsic
variability. We also introduce a novel multi-view wide-baseline matching
dataset, consisting of a mixture of real and synthetic objects with ground
truthed camera motion and dense three-dimensional geometry
Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks
A detailed environment perception is a crucial component of automated
vehicles. However, to deal with the amount of perceived information, we also
require segmentation strategies. Based on a grid map environment
representation, well-suited for sensor fusion, free-space estimation and
machine learning, we detect and classify objects using deep convolutional
neural networks. As input for our networks we use a multi-layer grid map
efficiently encoding 3D range sensor information. The inference output consists
of a list of rotated bounding boxes with associated semantic classes. We
conduct extensive ablation studies, highlight important design considerations
when using grid maps and evaluate our models on the KITTI Bird's Eye View
benchmark. Qualitative and quantitative benchmark results show that we achieve
robust detection and state of the art accuracy solely using top-view grid maps
from range sensor data.Comment: 6 pages, 4 tables, 4 figure
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition, one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to stateof-the-art methods to demonstrate their efficacy.Comment: 14 pages, 12 figures, 4 table
Variational Autoencoders for Deforming 3D Mesh Models
3D geometric contents are becoming increasingly popular. In this paper, we
study the problem of analyzing deforming 3D meshes using deep neural networks.
Deforming 3D meshes are flexible to represent 3D animation sequences as well as
collections of objects of the same category, allowing diverse shapes with
large-scale non-linear deformations. We propose a novel framework which we call
mesh variational autoencoders (mesh VAE), to explore the probabilistic latent
space of 3D surfaces. The framework is easy to train, and requires very few
training examples. We also propose an extended model which allows flexibly
adjusting the significance of different latent variables by altering the prior
distribution. Extensive experiments demonstrate that our general framework is
able to learn a reasonable representation for a collection of deformable
shapes, and produce competitive results for a variety of applications,
including shape generation, shape interpolation, shape space embedding and
shape exploration, outperforming state-of-the-art methods.Comment: CVPR 201
The application of range imaging for improved local feature representations
This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer vision-based applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding the interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that resultant descriptor remains stable across a range of common image transformations. Recently the availability of low cost, high quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images represent differing characteristics from those represented within intensity images which are frequently used, independently from range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features.
Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established. By using co-aligned range and intensity images this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage in the extraction process. To further improve the quality of feature descriptors, a calculation of the surface normals and a use of the 3D structure from the range image are applied to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations.
The matching performance of local features is evaluated using an experimental setup comprising a turntable and stereo pair of cameras. This experimental setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, creating a database of 360 object observations. The use of a calibrated turntable in combination with the 3D object surface coordiantes, supplied by the range image allow location correspondences between object observations to be established; and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features show that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under ROC curve at a False Positive Rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT.
Furthermore,this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views in a new model local feature. The canonical views for a interest point are stored in a view compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere. Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation, this confidence measure is then used to match similar canonical views of model and query interest points thereby achieving a pose invariant interest point description. This approach does not produce a statistically significant performance increase. However, does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint
- …