Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling
We study the problem of 3D object generation. We propose a novel framework, the 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic criteria, enables the generator to capture object structure implicitly and to synthesize high-quality 3D objects; second, the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so that we can sample objects without a reference image or CAD models and explore the 3D object manifold; third, the adversarial discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition. Experiments demonstrate that our method generates high-quality 3D objects, and our unsupervisedly learned features achieve impressive performance on 3D object recognition, comparable with that of supervised learning methods.
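The latent-space sampling idea behind the abstract can be sketched in a few lines. This is a toy stand-in, not the paper's architecture: the real generator uses volumetric transposed convolutions, whereas here a fixed random linear map plays its role, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the 3D-GAN generator: a fixed linear map followed by a
# sigmoid, taking a latent code z to an 8x8x8 voxel occupancy grid.
Z_DIM, VOX = 16, 8
W = rng.normal(scale=0.1, size=(Z_DIM, VOX ** 3))

def generate(z):
    logits = z @ W
    occupancy = 1.0 / (1.0 + np.exp(-logits))  # per-voxel occupancy probability
    return occupancy.reshape(VOX, VOX, VOX)

# Sampling an object needs no reference image or CAD model: just draw z.
z_a, z_b = rng.normal(size=Z_DIM), rng.normal(size=Z_DIM)
shape_a = generate(z_a)

# Exploring the object manifold: interpolate between two latent codes.
shape_mid = generate(0.5 * (z_a + z_b))

print(shape_a.shape)  # (8, 8, 8)
```

The key property the abstract highlights is visible even in this sketch: every point of the low-dimensional probabilistic space maps to a 3D shape, so sampling and interpolation come for free.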
Active object recognition for 2D and 3D applications
Includes bibliographical references.
Active object recognition provides a mechanism for selecting informative viewpoints to complete recognition tasks as quickly and accurately as possible. One can manipulate the position of the camera or the object of interest to obtain more useful information. This approach can improve the computational efficiency of the recognition task by processing only viewpoints selected for the amount of relevant information they contain. Active object recognition methods centre on how to select the next best viewpoint and how to integrate the extracted information. Most active recognition methods do not use local interest points, which have been shown to work well in other recognition tasks, and are tested on images containing a single object with no occlusions or clutter. In this thesis we investigate using local interest points (SIFT) in probabilistic and non-probabilistic settings for active single- and multiple-object and viewpoint/pose recognition. The test images used contain objects that are occluded and occur in significant clutter; visually similar objects are also included in our dataset. Initially we introduce a non-probabilistic 3D active object recognition system consisting of a mechanism for selecting the next best viewpoint and an integration strategy that provides feedback to the system. A novel approach to weighting the uniqueness of extracted features is presented, using a vocabulary tree data structure. This process is then used to determine the next best viewpoint by selecting the one with the highest number of unique features. A Bayesian framework uses the modified statistics from the vocabulary structure to update the system's confidence in the identity of the object. New test images are captured only when the belief hypothesis is below a predefined threshold.
This vocabulary tree method is tested against randomly selecting the next viewpoint and against a state-of-the-art active object recognition method by Kootstra et al. Our approach outperforms both methods, correctly recognizing more objects with less computational expense. The vocabulary tree method is then extended for use in a probabilistic setting to improve recognition accuracy. We introduce Bayesian approaches for object recognition and for joint object and pose recognition. Three likelihood models are introduced, incorporating various parameters and levels of complexity. The occlusion model, which includes geometric information and variables that account for the background distribution and occlusion, correctly recognizes all objects in our challenging database. This probabilistic approach is further extended to recognizing multiple objects and poses in a test image. We show through experiments that this model can recognize multiple objects that occur in close proximity to distractor objects. Our viewpoint selection strategy is also extended to the multiple-object application and performs well compared to randomly selecting the next viewpoint, the activation model, and mutual information. We also study the impact of using active vision for shape recognition. Fourier descriptors are used as input to our shape recognition system, with mutual information as the active vision component. We build multinomial and Gaussian distributions from this information, which correctly recognize a sequence of objects. We demonstrate the effectiveness of active vision in object recognition systems: even in different recognition applications using different low-level inputs, incorporating active vision improves the overall accuracy and decreases the computational expense of object recognition systems.
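The recognition loop the thesis describes, selecting the viewpoint with the most unique features and then performing a Bayesian belief update until a confidence threshold is reached, can be sketched as follows. The uniqueness counts, likelihoods, and object set are all invented for illustration; in the thesis the counts come from a vocabulary tree over SIFT features.

```python
import numpy as np

# Hypothetical per-viewpoint counts of unique features (in the thesis these
# come from a vocabulary tree over SIFT descriptors).
unique_feature_count = {"view0": 12, "view1": 30, "view2": 7}

def next_best_view(counts, visited):
    # Next best viewpoint = unvisited view with the most unique features.
    candidates = {v: c for v, c in counts.items() if v not in visited}
    return max(candidates, key=candidates.get)

# Illustrative likelihoods P(observation from view | object), one entry per
# candidate object identity.
likelihood = {
    "view0": np.array([0.5, 0.3, 0.2]),
    "view1": np.array([0.7, 0.2, 0.1]),
    "view2": np.array([0.4, 0.4, 0.2]),
}

belief = np.full(3, 1 / 3)  # uniform prior over 3 object identities
threshold, visited = 0.9, set()

# Capture new views only while the belief is below the threshold.
while belief.max() < threshold and len(visited) < len(unique_feature_count):
    view = next_best_view(unique_feature_count, visited)
    visited.add(view)
    belief = belief * likelihood[view]
    belief /= belief.sum()  # Bayes update, then renormalise

print(int(belief.argmax()))  # index of the recognized object
```

The stopping rule mirrors the thesis: image capture and feature extraction happen only while the belief hypothesis remains below the predefined threshold, which is where the computational savings come from.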
Active Classification: Theory and Application to Underwater Inspection
We discuss the problem in which an autonomous vehicle must classify an object
based on multiple views. We focus on the active classification setting, where
the vehicle controls which views to select to best perform the classification.
The problem is formulated as an extension to Bayesian active learning, and we
show connections to recent theoretical guarantees in this area. We formally
analyze the benefit of acting adaptively as new information becomes available.
The analysis leads to a probabilistic algorithm for determining the best views
to observe based on information theoretic costs. We validate our approach in
two ways, both related to underwater inspection: 3D polyhedra recognition in
synthetic depth maps and ship hull inspection with imaging sonar. These tasks
encompass both the planning and recognition aspects of the active
classification problem. The results demonstrate that actively planning for
informative views can reduce the number of necessary views by up to 80% when
compared to passive methods. Comment: 16 pages
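The core of information-theoretic view selection is choosing the view whose observation is expected to most reduce the entropy of the class belief, per unit cost. A minimal sketch, with made-up observation models and costs (the paper's adaptive algorithm and guarantees go well beyond this greedy step):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

prior = np.array([0.5, 0.5])  # belief over 2 object classes

# Illustrative observation models: views[v][c, o] = P(observation o | class c).
views = {
    "head_on": np.array([[0.9, 0.1], [0.1, 0.9]]),  # discriminative view
    "oblique": np.array([[0.6, 0.4], [0.5, 0.5]]),  # weakly informative view
}
cost = {"head_on": 2.0, "oblique": 1.0}  # e.g. travel time to the viewpoint

def expected_info_gain(prior, obs_model):
    # Mutual information between class and observation under the prior.
    p_obs = prior @ obs_model            # marginal over observations
    gain = entropy(prior)
    for o, po in enumerate(p_obs):
        post = prior * obs_model[:, o] / po
        gain -= po * entropy(post)       # subtract expected posterior entropy
    return gain

# Greedy choice: maximise expected information gain per unit cost.
best = max(views, key=lambda v: expected_info_gain(prior, views[v]) / cost[v])
print(best)
```

Even at twice the cost, the discriminative view wins here, which is the intuition behind the reported reduction in the number of views needed versus passive strategies.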
Local wavelet features for statistical object classification and localisation
This article presents a system for texture-based
probabilistic classification and localisation of 3D objects in 2D digital images and discusses selected applications. The objects are described by local feature vectors computed using the wavelet transform. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, a maximisation algorithm compares the learned density functions
with the feature vectors extracted from a real scene and yields the classes and poses of the objects found in it. Experiments carried out on a real dataset of over 40,000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important application scenarios are discussed, namely the classification of museum artefacts and the classification of
metallography images.
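The statistical recognition step described above, modelling each class as a normal density over local feature vectors and assigning a new vector to the class with the highest learned density, can be sketched with synthetic 2-D features standing in for the wavelet features (the class names, diagonal covariances, and data here are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "wavelet" feature vectors for two classes (training phase).
train = {
    "artefact": rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    "metal":    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2)),
}

# Statistically model each class as a normal density (diagonal covariance).
params = {c: (x.mean(axis=0), x.var(axis=0)) for c, x in train.items()}

def log_density(x, mean, var):
    # Log of a diagonal Gaussian density.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x):
    # Recognition phase: maximise the learned density over classes.
    return max(params, key=lambda c: log_density(x, *params[c]))

print(classify(np.array([0.2, -0.1])))  # → artefact
print(classify(np.array([2.8, 3.1])))   # → metal
```

In the actual system the maximisation also runs over pose parameters, which is how localisation falls out of the same density comparison.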
Mobile Robot Object Recognition through the Synergy of Probabilistic Graphical Models and Semantic Knowledge
J.R. Ruiz-Sarmiento, C. Galindo, and J. Gonzalez-Jimenez, "Mobile Robot Object Recognition through the Synergy of Probabilistic Graphical Models and Semantic Knowledge," in European Conf. on Artificial Intelligence, CogRob workshop, 2014.
Mobile robots intended to perform high-level tasks have to recognize objects in their workspace. To increase the success of the recognition process, recent works have studied the use of contextual information. Probabilistic Graphical Models (PGMs) and Semantic Knowledge (SK) are two well-known approaches for dealing with contextual information, although each has drawbacks: the complexity of PGMs increases exponentially with the number of objects in the scene, while SK is unable to handle uncertainty. In this work we combine both approaches to address the object recognition problem. We propose exploiting SK to reduce the complexity of the probabilistic inference, while relying on PGMs to give SK a mechanism for managing uncertainty. The suitability of our method is validated through a set of experiments in which a mobile robot endowed with a Kinect-like sensor captured 3D data from 25 real environments, achieving a promising success rate of ~94%.
Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Spanish grant program FPU-MICINN 2010 and the Spanish project "TAROTH: New developments toward a robot at home".
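The synergy the paper proposes can be illustrated with a tiny example: semantic knowledge prunes the label space before joint inference, so the exponential search runs over far fewer label combinations. The room ontology, unary/pairwise scores, and exhaustive MAP search below are all invented for illustration, not the paper's actual models.

```python
from itertools import product

ALL_LABELS = ["monitor", "keyboard", "bed", "pillow", "mug", "lamp"]

# Semantic knowledge: which objects are plausible in which room type.
ROOM_ONTOLOGY = {"office": {"monitor", "keyboard", "mug", "lamp"},
                 "bedroom": {"bed", "pillow", "lamp"}}

# Unary scores from a per-object classifier (two detected objects), plus a
# pairwise bonus encoding that monitors and keyboards co-occur.
unary = {
    "obj1": {"monitor": 0.6, "keyboard": 0.3, "bed": 0.55,
             "pillow": 0.1, "mug": 0.1, "lamp": 0.1},
    "obj2": {"monitor": 0.2, "keyboard": 0.7, "bed": 0.1,
             "pillow": 0.6, "mug": 0.2, "lamp": 0.1},
}
pairwise = {("monitor", "keyboard"): 2.0}

def map_assignment(room):
    # SK step: prune labels incompatible with the room type.
    labels = [l for l in ALL_LABELS if l in ROOM_ONTOLOGY[room]]
    # PGM step: exhaustive MAP over the reduced joint label space.
    best, best_score = None, float("-inf")
    for l1, l2 in product(labels, repeat=2):
        score = unary["obj1"][l1] + unary["obj2"][l2]
        score += pairwise.get((l1, l2), 1.0)
        if score > best_score:
            best, best_score = (l1, l2), score
    return best

print(map_assignment("office"))  # → ('monitor', 'keyboard')
```

With the full label set the joint space has 6² assignments; the ontology cuts it to 4² (or 3²), which is the complexity reduction the paper scales up to real scenes.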
Deep Directional Statistics: Pose Estimation with Uncertainty Quantification
Modern deep learning systems successfully solve many perception tasks such as
object pose estimation when the input image is of high quality. However, in
challenging imaging conditions such as on low-resolution images or when the
image is corrupted by imaging artifacts, current systems degrade considerably
in accuracy. While a loss in performance is unavoidable, we would like our
models to quantify their uncertainty in order to achieve robustness against
images of varying quality. Probabilistic deep learning models combine the
expressive power of deep learning with uncertainty quantification. In this
paper, we propose a novel probabilistic deep learning model for the task of
angular regression. Our model uses von Mises distributions to predict a
distribution over the object pose angle. Since a single von Mises distribution
makes strong assumptions about the shape of the distribution, we extend the
basic model to predict a mixture of von Mises distributions. We show how to
learn a mixture model with either a finite or an infinite number of mixture components.
Our model allows for likelihood-based training and efficient inference at test
time. We demonstrate on a number of challenging pose estimation datasets that
our model produces calibrated probability predictions and competitive or
superior point estimates compared to the current state of the art.
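The density the model predicts is a mixture of von Mises distributions over an angle; a minimal sketch of evaluating it is below. The component parameters here are fixed by hand for illustration, whereas in the model they are outputs of a CNN.

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    # np.i0 is the modified Bessel function of the first kind, order 0,
    # which normalises the von Mises density on the circle.
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def mixture_pdf(theta, weights, mus, kappas):
    return sum(w * von_mises_pdf(theta, m, k)
               for w, m, k in zip(weights, mus, kappas))

weights = [0.7, 0.3]    # mixture weights, summing to 1
mus     = [0.0, np.pi]  # two modes, e.g. front-facing and back-facing pose
kappas  = [4.0, 4.0]    # concentration (inverse spread) of each component

# The density integrates to 1 over the circle (checked numerically here),
# and the negative log-likelihood of an observed angle is the training loss.
grid = np.linspace(-np.pi, np.pi, 10000, endpoint=False)
mass = float(mixture_pdf(grid, weights, mus, kappas).sum() * (2 * np.pi / 10000))
nll = -np.log(mixture_pdf(0.1, weights, mus, kappas))
print(round(mass, 3))  # ≈ 1.0
```

A multimodal mixture like this can express, for example, the front/back ambiguity of a symmetric object, which a single von Mises component cannot.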
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Despite rapid progress in visual question answering (VQA), existing datasets
and models mainly focus on testing reasoning in 2D. However, it is important
that VQA models also understand the 3D structure of visual scenes, for example
to support tasks like navigation or manipulation. This includes an
understanding of objects' 3D poses, their parts, and occlusions. In this work,
we introduce the task of 3D-aware VQA, which focuses on challenging questions
that require compositional reasoning over the 3D structure of visual scenes.
We address 3D-aware VQA from both the dataset and the model perspective. First,
we introduce Super-CLEVR-3D, a compositional reasoning dataset that contains
questions about object parts, their 3D poses, and occlusions. Second, we
propose PO3D-VQA, a 3D-aware VQA model that marries two powerful ideas:
probabilistic neural symbolic program execution for reasoning and deep neural
networks with 3D generative representations of objects for robust visual
recognition. Our experimental results show that our model, PO3D-VQA,
significantly outperforms existing methods, but a considerable performance gap
with respect to 2D VQA benchmarks remains, indicating that 3D-aware VQA is an
important open research area. Comment: Accepted by NeurIPS202
Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images
Analysis-by-synthesis has been a successful approach for many tasks in
computer vision, such as 6D pose estimation of an object in an RGB-D image
which is the topic of this work. The idea is to compare the observation with
the output of a forward process, such as a rendered image of the object of
interest in a particular pose. Due to occlusion or complicated sensor noise, it
can be difficult to perform this comparison in a meaningful way. We propose an
approach that "learns to compare", while taking these difficulties into
account. This is done by describing the posterior density of a particular
object pose with a convolutional neural network (CNN) that compares an observed
and rendered image. The network is trained with the maximum likelihood
paradigm. We observe empirically that the CNN does not specialize to the
geometry or appearance of specific objects, and it can be used with objects of
vastly different shapes and appearances, and in different backgrounds. Compared
to state-of-the-art, we demonstrate a significant improvement on two different
datasets which include a total of eleven objects, cluttered background, and
heavy occlusion. Comment: 16 pages, 8 figures
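The analysis-by-synthesis loop itself is simple: render the object in candidate poses, compare each rendering with the observation, and keep the pose that scores best. In the paper the comparison is a learned CNN; in the sketch below a negative sum of squared differences stands in for it, and the "renderer" is a toy 2-D blob whose centre encodes the pose.

```python
import numpy as np

rng = np.random.default_rng(2)

def render(pose):
    # Toy renderer: a 2-D Gaussian blob whose centre encodes the pose.
    xs, ys = np.meshgrid(np.arange(16), np.arange(16))
    return np.exp(-((xs - pose[0]) ** 2 + (ys - pose[1]) ** 2) / 8.0)

true_pose = (5.0, 9.0)
observation = render(true_pose) + rng.normal(scale=0.05, size=(16, 16))

def score(obs, pose):
    # Stand-in for the learned CNN comparator: negative SSD between the
    # observation and the rendering of the candidate pose.
    return -np.sum((obs - render(pose)) ** 2)

# Evaluate the (unnormalised) posterior over a grid of candidate poses and
# keep the maximum.
candidates = [(x, y) for x in range(16) for y in range(16)]
best = max(candidates, key=lambda p: score(observation, p))
print(best)  # → (5, 9)
```

The paper's point is precisely that a fixed comparator like SSD breaks down under occlusion and sensor noise, which is why the comparison function is learned instead.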