Modeling uncertainties in performance of object recognition
Efficient probability modeling is indispensable for uncertainty quantification of recognition data. If the model assumptions do not reflect the intrinsic nature of the data and the associated random variables, then even a strong performance measure will most likely fail to produce a correct match for recognition. In this paper we propose probability models for two kinds of data, obtained with two distinct goals of recognition: identification and discovery. We consider both frequentist and Bayesian approaches for drawing inferences from the data.
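To give the Bayesian route a concrete shape, the sketch below scores a single comparison under hypothetical Gaussian likelihoods for genuine and impostor scores; the distributions, their parameters, and the prior are illustrative assumptions, not the models proposed in the paper.

```python
import numpy as np
from scipy.stats import norm

def posterior_match(score, prior_match=0.5):
    """Posterior probability that a comparison score comes from a true match.

    The Gaussian score models below are hypothetical placeholders for
    whatever genuine/impostor score distributions a system estimates.
    """
    like_match = norm.pdf(score, loc=0.8, scale=0.10)     # genuine-score model
    like_nonmatch = norm.pdf(score, loc=0.3, scale=0.15)  # impostor-score model
    evidence = like_match * prior_match + like_nonmatch * (1 - prior_match)
    return like_match * prior_match / evidence

print(posterior_match(0.75))  # a high score yields a posterior close to 1
```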
Deep Directional Statistics: Pose Estimation with Uncertainty Quantification
Modern deep learning systems successfully solve many perception tasks such as
object pose estimation when the input image is of high quality. However, in
challenging imaging conditions such as on low-resolution images or when the
image is corrupted by imaging artifacts, current systems degrade considerably
in accuracy. While a loss in performance is unavoidable, we would like our
models to quantify their uncertainty in order to achieve robustness against
images of varying quality. Probabilistic deep learning models combine the
expressive power of deep learning with uncertainty quantification. In this
paper, we propose a novel probabilistic deep learning model for the task of
angular regression. Our model uses von Mises distributions to predict a
distribution over the object pose angle. Because a single von Mises distribution
makes strong assumptions about the shape of the distribution, we extend the
basic model to predict a mixture of von Mises distributions. We show how to
learn such a mixture model with both finite and infinite numbers of mixture components.
Our model allows for likelihood-based training and efficient inference at test
time. We demonstrate on a number of challenging pose estimation datasets that
our model produces calibrated probability predictions and competitive or
superior point estimates compared to the current state of the art.
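As a minimal numpy sketch of the kind of objective a likelihood-trained angular regressor minimizes, the snippet below evaluates the negative log-likelihood of angles under a von Mises mixture. In the model described above the weights, means, and concentrations would be predicted by a network from the image; here they are fixed toy values.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function I_0

def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution over angles in radians."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))

def mixture_nll(theta, weights, mus, kappas):
    """Mean negative log-likelihood of angles under a von Mises mixture."""
    # component-wise weighted likelihoods: shape (N, K)
    comp = np.stack([w * von_mises_pdf(theta, m, k)
                     for w, m, k in zip(weights, mus, kappas)], axis=-1)
    return -np.log(comp.sum(axis=-1) + 1e-12).mean()

# toy usage: a bimodal pose belief with modes near 0 and pi
angles = np.array([0.1, -0.2, 3.0, 3.2])
print(mixture_nll(angles, weights=[0.5, 0.5], mus=[0.0, np.pi], kappas=[4.0, 4.0]))
```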
Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models
Deep learning has shown state-of-the-art classification performance on datasets
such as ImageNet, which contain a single object in each image. However,
multi-object classification is far more challenging. We present a unified
framework which leverages the strengths of multiple machine learning methods,
viz. deep learning, probabilistic models, and kernel methods, to obtain
state-of-the-art performance on Microsoft COCO, which consists of non-iconic images. We
incorporate contextual information in natural images through a conditional
latent tree probabilistic model (CLTM), where the object co-occurrences are
conditioned on fc7 features extracted from a pre-trained ImageNet CNN as
input. We learn the CLTM tree structure using conditional pairwise
probabilities for object co-occurrences, estimated through kernel methods, and
we learn its node and edge potentials by training a new 3-layer neural network,
which takes fc7 features as input. Object classification is carried out via
inference on the learnt conditional tree model, and we obtain significant gain
in precision-recall and F-measures on MS-COCO, especially for difficult object
categories. Moreover, the latent variables in the CLTM capture scene
information: the images with top activations for a latent node have common
themes, such as being a grassland or a food scene, and so on. In addition, we
show that a simple k-means clustering of the inferred latent nodes alone
significantly improves scene classification performance on the MIT-Indoor
dataset, without the need for any retraining, and without using scene labels
during training. Thus, we present a unified framework for multi-object
classification and unsupervised scene understanding.
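The tree-structure step can be illustrated with a Chow-Liu-style construction: estimate pairwise statistics of object co-occurrence and keep the maximum-weight spanning tree. The sketch below uses plain empirical mutual information on binary occurrence vectors; the model described above instead conditions these pairwise probabilities on fc7 features via kernel methods, which is omitted here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x, y, eps=1e-12):
    """Empirical mutual information between two binary occurrence vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b)) + eps
            px, py = np.mean(x == a) + eps, np.mean(y == b) + eps
            mi += pxy * np.log(pxy / (px * py))
    return mi

def occurrence_tree(occ):
    """Maximum-weight spanning tree over pairwise MI of object occurrences.
    occ: (N, D) binary matrix, one row per image, one column per object class."""
    D = occ.shape[1]
    W = np.zeros((D, D))
    for i in range(D):
        for j in range(i + 1, D):
            W[i, j] = mutual_information(occ[:, i], occ[:, j])
    # a max-weight spanning tree is a min spanning tree on negated weights
    mst = minimum_spanning_tree(-W)
    return sorted(zip(*mst.nonzero()))

occ = (np.random.default_rng(0).random((100, 4)) > 0.5).astype(int)
print(occurrence_tree(occ))  # three edges connecting the four object nodes
```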
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention, and several methods have been proposed to
learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over state-of-the-art in the
intuitive physics benchmark of IntPhys. We apply our method to a second dataset
with increasing levels of occlusions, showing it realistically predicts
segmentation masks up to 30 frames in the future. Finally, we also show results
on predicting motion of objects in real videos
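For intuition about the physics component, here is a minimal numpy sketch of a single interaction-network step: pairwise relation effects are computed between object states and aggregated into a per-object update. The tiny randomly initialized MLPs stand in for trained networks, and the recurrence over frames, the compositional renderer, and the linking of object proposals are all omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
S, H, E = 4, 16, 8  # assumed toy sizes: state, hidden, effect dimensions

def make_mlp(n_in, n_out):
    """Random two-layer MLP parameters (stand-in for a trained network)."""
    return (rng.normal(0, 0.1, (n_in, H)), np.zeros(H),
            rng.normal(0, 0.1, (H, n_out)), np.zeros(n_out))

def mlp(x, W1, b1, W2, b2):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

rel_net = make_mlp(2 * S, E)  # effect of object j on object i
obj_net = make_mlp(S + E, S)  # per-object state update

def interaction_step(states):
    """One interaction-network step over K object states of size S."""
    K = states.shape[0]
    effects = np.zeros((K, E))
    for i in range(K):
        for j in range(K):
            if i != j:
                effects[i] += mlp(np.concatenate([states[i], states[j]]), *rel_net)
    return np.stack([mlp(np.concatenate([states[i], effects[i]]), *obj_net)
                     for i in range(K)])

print(interaction_step(rng.normal(size=(3, S))).shape)  # (3, 4)
```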
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
We introduce a new loss function for the weakly-supervised training of
semantic image segmentation models based on three guiding principles: to seed
with weak localization cues, to expand objects based on the information about
which classes can occur in an image, and to constrain the segmentations to
coincide with object boundaries. We show experimentally that training a deep
convolutional neural network using the proposed loss function leads to
substantially better segmentations than previous state-of-the-art methods on
the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the
working mechanism of our method by a detailed experimental study that
illustrates how the segmentation quality is affected by each term of the
proposed loss function as well as their combinations.
Comment: ECCV 2016
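To make the three principles concrete, here is a minimal numpy sketch of a seed-expand-constrain style loss. The expansion term uses simple global max pooling as a stand-in for the paper's global weighted rank pooling, and the "refined" map stands in for a boundary-aware refinement such as a dense-CRF output; both simplifications are assumptions of this sketch.

```python
import numpy as np

def seed_loss(probs, seeds):
    """Cross-entropy only at pixels carrying weak localization cues.
    probs: (C, H, W) softmax output; seeds: (C, H, W) binary cue mask."""
    return -np.sum(seeds * np.log(probs + 1e-12)) / max(seeds.sum(), 1)

def expand_loss(probs, image_labels):
    """Encourage present classes to fire somewhere in the image.
    Max pooling here is a simple stand-in for weighted rank pooling."""
    scores = probs.max(axis=(1, 2))  # (C,) per-class image-level score
    return -np.mean(image_labels * np.log(scores + 1e-12)
                    + (1 - image_labels) * np.log(1 - scores + 1e-12))

def constrain_loss(probs, refined):
    """KL divergence to a boundary-aware refinement of the prediction."""
    return np.mean(np.sum(refined * np.log((refined + 1e-12) / (probs + 1e-12)),
                          axis=0))

def sec_loss(probs, seeds, image_labels, refined):
    return (seed_loss(probs, seeds) + expand_loss(probs, image_labels)
            + constrain_loss(probs, refined))

# toy usage: uniform predictions, one seed pixel, one present class
C, H, W = 3, 4, 4
probs = np.full((C, H, W), 1.0 / C)
seeds = np.zeros((C, H, W)); seeds[0, 0, 0] = 1
print(sec_loss(probs, seeds, np.array([1, 0, 0]), probs))  # constrain term is 0
```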
Learning Articulated Motions From Visual Demonstration
Many functional elements of human homes and workplaces consist of rigid
components which are connected through one or more sliding or rotating
linkages. Examples include doors and drawers of cabinets and appliances;
laptops; and swivel office chairs. A robotic mobile manipulator would benefit
from the ability to acquire kinematic models of such objects from observation.
This paper describes a method by which a robot can acquire an object model by
capturing depth imagery of the object as a human moves it through its range of
motion. We envision that, in the future, a machine newly introduced to an
environment could be shown by its human user the articulated objects particular
to that environment, inferring from these "visual demonstrations" enough
information to actuate each object independently of the user.
Our method employs sparse (markerless) feature tracking, motion segmentation,
component pose estimation, and articulation learning; it does not require prior
object models. Using the method, a robot can observe an object being exercised,
infer a kinematic model incorporating rigid, prismatic and revolute joints,
then use the model to predict the object's motion from a novel vantage point.
We evaluate the method's performance, and compare it to that of a previously
published technique, for a variety of household objects.
Comment: Published in Robotics: Science and Systems X, Berkeley, CA. ISBN: 978-0-9923747-0-
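The articulation-learning stage can be caricatured as deciding which one-degree-of-freedom model best explains a part's observed trajectory. The threshold test below is a hypothetical simplification: the method above fits rigid, prismatic, and revolute models to estimated component poses rather than thresholding motion ranges.

```python
import numpy as np

def classify_joint(rel_t, rel_theta, lin_tol=0.01, ang_tol=0.05):
    """Crude joint-type test from the relative pose of two rigid parts over T
    frames, assuming poses are expressed in a frame centred on the joint.
    rel_t: (T, 3) relative translations in metres; rel_theta: (T,) relative
    rotation angles in radians."""
    ang_range = rel_theta.max() - rel_theta.min()
    lin_range = np.linalg.norm(rel_t - rel_t.mean(axis=0), axis=1).max()
    if ang_range < ang_tol and lin_range < lin_tol:
        return "rigid"
    if ang_range >= ang_tol and lin_range < lin_tol:
        return "revolute"   # rotation varies, translation does not
    if ang_range < ang_tol:
        return "prismatic"  # translation varies, rotation stays fixed
    return "unmodelled"

# toy usage: a drawer sliding 30 cm along one axis
t = np.linspace(0, 1, 20)
drawer = np.stack([0.3 * t, np.zeros_like(t), np.zeros_like(t)], axis=1)
print(classify_joint(drawer, np.zeros_like(t)))  # prismatic
```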
3D ShapeNets: A Deep Representation for Volumetric Shapes
3D shape is a crucial but heavily underutilized cue in today's computer
vision systems, mostly due to the lack of a good generic shape representation.
With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft
Kinect), it is becoming increasingly important to have a powerful 3D shape
representation in the loop. Apart from category recognition, recovering full 3D
shapes from view-based 2.5D depth maps is also a critical part of visual
understanding. To this end, we propose to represent a geometric 3D shape as a
probability distribution of binary variables on a 3D voxel grid, using a
Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the
distribution of complex 3D shapes across different object categories and
arbitrary poses from raw CAD data, and discovers hierarchical compositional
part representations automatically. It naturally supports joint object
recognition and shape completion from 2.5D depth maps, and it enables active
object recognition through view planning. To train our 3D deep learning model,
we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive
experiments show that our 3D deep representation enables significant
performance improvement over the state of the art in a variety of tasks.
Comment: to appear in CVPR 2015
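The input side of such a volumetric model is easy to illustrate: a shape becomes a binary occupancy grid over voxels. The sketch below voxelizes a point cloud; the grid size and padding are illustrative choices, and the learned Convolutional Deep Belief Network itself is not reproduced here.

```python
import numpy as np

def voxelize(points, grid=30, pad=2):
    """Binary occupancy grid from a point cloud (N, 3): the kind of
    volumetric input a model like 3D ShapeNets consumes."""
    pts = points - points.min(axis=0)
    pts = pts / pts.max()  # uniform scale into the unit cube
    idx = (pts * (grid - 2 * pad - 1)).astype(int) + pad
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox

cloud = np.random.default_rng(0).random((500, 3))
print(voxelize(cloud).sum(), "occupied voxels out of", 30 ** 3)
```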
The structure and formation of natural categories
Categorization and concept formation are critical activities of intelligence. These processes, and the conceptual structures that support them, raise important issues at the interface of cognitive psychology and artificial intelligence. The work presumes that advances in these and other areas are best facilitated by research methodologies that reward interdisciplinary interaction. In particular, a computational model of concept formation and categorization is described that exploits Gluck and Corter's rational analysis of basic-level effects. Their work provides a clean prescription of human category preferences, which is adapted here to the task of concept learning. Their analysis is also extended to account for typicality and fan effects, and we speculate on how these concept formation strategies might be extended to other facets of intelligence, such as problem solving.
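The rational analysis referred to is Gluck and Corter's category utility, the quantity a COBWEB-style concept learner maximizes when deciding where an instance belongs. Here is a minimal sketch; the feature probabilities in the usage example are toy values.

```python
import numpy as np

def category_utility(p_c, p_f_given_c, p_f):
    """Gluck and Corter's category utility,
    CU(c) = P(c) * (sum_i P(f_i|c)^2 - sum_i P(f_i)^2):
    the expected gain in correctly guessed feature values from knowing c."""
    return p_c * (np.sum(np.square(p_f_given_c)) - np.sum(np.square(p_f)))

# toy usage: a category that sharpens two of three feature predictions
base = np.array([0.5, 0.5, 0.5])     # base-rate feature probabilities
within = np.array([0.9, 0.9, 0.5])   # feature probabilities within c
print(category_utility(0.25, within, base))  # 0.25 * (1.87 - 0.75) = 0.28
```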