19,154 research outputs found

    Invariant surface characteristics for 3D object recognition in range images

    In recent years there has been a tremendous increase in computer vision research using range images (or depth maps) as sensor input data. The most attractive feature of range images is the explicitness of the surface information. Many industrial and navigational robotic tasks will be more easily accomplished if such explicit depth information can be efficiently obtained and interpreted. Intensity image understanding research has shown that the early processing of sensor data should be data-driven. The goal of early processing is to generate a rich description for later processing. Classical differential geometry provides a complete local description of smooth surfaces. The first and second fundamental forms of surfaces provide a set of differential-geometric shape descriptors that capture domain-independent surface information. Mean curvature and Gaussian curvature are the fundamental second-order surface characteristics that possess desirable invariance properties and represent extrinsic and intrinsic surface geometry respectively. The signs of these surface curvatures are used to classify range image regions into one of eight basic viewpoint-independent surface types. Experimental results for real and synthetic range images show the properties, usefulness, and importance of differential-geometric surface characteristics. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/26326/1/0000413.pd
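
The sign-based classification described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the range image is a height map z(x, y) on a regular grid, estimates derivatives with `numpy.gradient`, uses the standard Monge-patch formulas for Gaussian curvature K and mean curvature H, and adopts the common sign convention (normal pointing toward the viewer, so a peak has H < 0). The type names, the tolerance `eps`, and the function names are assumptions introduced for this sketch.

```python
import numpy as np

# (sign of K, sign of H) -> surface type; the names follow a common convention
# and are an assumption, since the abstract does not list them.
SURFACE_TYPES = {
    (1, -1): "peak", (1, 1): "pit",
    (0, -1): "ridge", (0, 1): "valley", (0, 0): "flat",
    (-1, 0): "minimal", (-1, -1): "saddle ridge", (-1, 1): "saddle valley",
}

def curvature_signs(z, spacing=1.0, eps=1e-6):
    """Return integer sign maps (sK, sH) for a height map z[row, col]."""
    zy, zx = np.gradient(z, spacing)      # derivatives along rows (y) and columns (x)
    zxy, zxx = np.gradient(zx, spacing)   # second derivatives of zx
    zyy, _ = np.gradient(zy, spacing)     # second derivative of zy along y
    g = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy) / (2 * g**1.5)
    # Treat values within eps of zero as zero (hypothetical tolerance).
    sK = np.where(np.abs(K) < eps, 0, np.sign(K)).astype(int)
    sH = np.where(np.abs(H) < eps, 0, np.sign(H)).astype(int)
    return sK, sH

def surface_type_at(z, row, col, spacing=1.0, eps=1e-6):
    """Look up the surface type label at one pixel."""
    sK, sH = curvature_signs(z, spacing, eps)
    key = (int(sK[row, col]), int(sH[row, col]))
    # K > 0 with H = 0 cannot occur for a smooth surface; report it if noise produces it.
    return SURFACE_TYPES.get(key, "undefined")
```

On real range data the derivative estimates are noisy, so smoothing before differentiation and a tuned zero tolerance would likely be needed; both are left out of this sketch.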

    On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities

    Current object recognition methods fail on object sets that include diffuse, reflective, and transparent materials, although such materials are very common in domestic scenarios. We show that a combination of cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state-of-the-art object recognition method. This leads to a significant increase in robustness of recognition over a larger set of commonly used objects. Comment: 12 pages

    On the Design and Analysis of Multiple View Descriptors

    We propose an extension of popular descriptors based on gradient orientation histograms (HOG, computed in a single image) to multiple views. It hinges on interpreting HOG as a conditional density in the space of sampled images, where the effects of nuisance factors such as viewpoint and illumination are marginalized. However, such marginalization is performed with respect to a very coarse approximation of the underlying distribution. Our extension leverages the fact that multiple views of the same scene allow separating intrinsic from nuisance variability, and thus afford better marginalization of the latter. The result is a descriptor that has the same complexity as single-view HOG, and can be compared in the same manner, but exploits multiple views to better trade off insensitivity to nuisance variability with specificity to intrinsic variability. We also introduce a novel multi-view wide-baseline matching dataset, consisting of a mixture of real and synthetic objects with ground-truthed camera motion and dense three-dimensional geometry.

    Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks

    A detailed environment perception is a crucial component of automated vehicles. However, to deal with the amount of perceived information, we also require segmentation strategies. Based on a grid map environment representation, well-suited for sensor fusion, free-space estimation and machine learning, we detect and classify objects using deep convolutional neural networks. As input for our networks we use a multi-layer grid map efficiently encoding 3D range sensor information. The inference output consists of a list of rotated bounding boxes with associated semantic classes. We conduct extensive ablation studies, highlight important design considerations when using grid maps and evaluate our models on the KITTI Bird's Eye View benchmark. Qualitative and quantitative benchmark results show that we achieve robust detection and state-of-the-art accuracy solely using top-view grid maps from range sensor data. Comment: 6 pages, 4 tables, 4 figures
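
To make the idea of a multi-layer top-view grid map concrete, the following sketch rasterizes a 3D point cloud into a small stack of 2D layers. The abstract does not say which layers the authors use, so the three layers here (hit count, maximum height, minimum height), the grid extent, the cell size, and the function name `points_to_grid` are all assumptions chosen for illustration.

```python
import numpy as np

def points_to_grid(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Rasterize an (N, 3) array of x, y, z points into a (3, nx, ny) grid.

    Layers (hypothetical choice): 0 = hit count, 1 = max height, 2 = min height.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))

    # Cell index of every point along x and y.
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)

    # Discard points that fall outside the grid.
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]

    count = np.zeros((nx, ny))
    z_max = np.full((nx, ny), -np.inf)
    z_min = np.full((nx, ny), np.inf)

    # Unbuffered accumulation so repeated cell indices are all counted.
    np.add.at(count, (ix, iy), 1)
    np.maximum.at(z_max, (ix, iy), z)
    np.minimum.at(z_min, (ix, iy), z)

    # Cells with no hits get a neutral height of zero.
    empty = count == 0
    z_max[empty] = 0.0
    z_min[empty] = 0.0
    return np.stack([count, z_max, z_min])
```

The resulting array has the same layout as a multi-channel image, which is what allows a convolutional network to consume it directly.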

    Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition

    Two approaches are proposed for cross-pose face recognition: one is based on the 3D reconstruction of facial components and the other is based on the deep Convolutional Neural Network (CNN). Unlike most 3D approaches that consider holistic faces, the proposed approach considers 3D facial components. It segments a 2D gallery face into components, reconstructs the 3D surface for each component, and recognizes a probe face by component features. The segmentation is based on the landmarks located by a hierarchical algorithm that combines the Faster R-CNN for face detection and the Reduced Tree Structured Model for landmark localization. The core part of the CNN-based approach is a revised VGG network. We study the performances with different settings on the training set, including the synthesized data from 3D reconstruction, the real-life data from an in-the-wild database, and both types of data combined. We investigate the performances of the network when it is employed as a classifier or designed as a feature extractor. The two recognition approaches and the fast landmark localization are evaluated in extensive experiments, and compared to state-of-the-art methods to demonstrate their efficacy. Comment: 14 pages, 12 figures, 4 tables

    Variational Autoencoders for Deforming 3D Mesh Models

    3D geometric content is becoming increasingly popular. In this paper, we study the problem of analyzing deforming 3D meshes using deep neural networks. Deforming 3D meshes are a flexible way to represent 3D animation sequences as well as collections of objects of the same category, allowing diverse shapes with large-scale non-linear deformations. We propose a novel framework, which we call mesh variational autoencoders (mesh VAE), to explore the probabilistic latent space of 3D surfaces. The framework is easy to train, and requires very few training examples. We also propose an extended model which allows flexibly adjusting the significance of different latent variables by altering the prior distribution. Extensive experiments demonstrate that our general framework is able to learn a reasonable representation for a collection of deformable shapes, and produce competitive results for a variety of applications, including shape generation, shape interpolation, shape space embedding and shape exploration, outperforming state-of-the-art methods. Comment: CVPR 201

    The application of range imaging for improved local feature representations

    This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer vision-based applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding the interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that the resultant descriptor remains stable across a range of common image transformations. Recently, the availability of low-cost, high-quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images represent differing characteristics from those represented within intensity images which are frequently used, independently from range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features. Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established.
    By using co-aligned range and intensity images, this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage in the extraction process. To further improve the quality of feature descriptors, a calculation of the surface normals and a use of the 3D structure from the range image are applied to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations. The matching performance of local features is evaluated using an experimental setup comprising a turntable and stereo pair of cameras. This experimental setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, creating a database of 360 object observations. The use of a calibrated turntable in combination with the 3D object surface coordinates supplied by the range image allows location correspondences between object observations to be established, and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features shows that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under ROC curve at a False Positive Rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT. Furthermore, this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views in a new model local feature. The canonical views for an interest point are stored in a view-compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere.
    Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation; this confidence measure is then used to match similar canonical views of model and query interest points, thereby achieving a pose-invariant interest point description. This approach does not produce a statistically significant performance increase. However, it does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint.
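
The evaluation metric quoted in this abstract (area under the ROC curve at a False Positive Rate of 10%) can be read as a partial AUC, integrated from FPR 0 up to 0.1. The sketch below shows that reading only; the thesis may define the metric differently. It assumes each candidate descriptor match has a similarity score and a true/false label, ignores tied scores for simplicity, and the function names are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays for matches ranked by descending score.

    labels: 1 for a true match, 0 for a false match. Tied scores are not
    grouped, which is a simplification made for this sketch.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels)[order]
    tp = np.cumsum(ranked)
    fp = np.cumsum(1 - ranked)
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

def partial_auc(fpr, tpr, max_fpr=0.1):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    stop = np.searchsorted(fpr, max_fpr, side="right")
    x, y = fpr[:stop], tpr[:stop]
    if stop < len(fpr):
        # Linearly interpolate the curve at the cut-off FPR.
        x0, x1 = fpr[stop - 1], fpr[stop]
        y0, y1 = tpr[stop - 1], tpr[stop]
        y_cut = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
        x = np.append(x, max_fpr)
        y = np.append(y, y_cut)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

Under this reading, a relative improvement over a baseline descriptor would be computed as the difference between the two partial AUC values divided by the baseline value.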