Active Information Acquisition With Mobile Robots
The recent proliferation of sensors and robots has the potential to transform fields as diverse as environmental monitoring, security and surveillance, localization and mapping, and structure inspection. One of the great technical challenges in these scenarios is to control the sensors and robots autonomously so that they extract accurate information about various physical phenomena. The goal of this dissertation is to provide a unified approach for active information acquisition with a team of sensing robots. We formulate a decision problem for maximizing relevant information measures, constrained by the motion capabilities and sensing modalities of the robots, and focus on the design of a scalable control strategy for the robot team.
The first part of the dissertation studies the active information acquisition problem in the special case of linear Gaussian sensing and mobility models. We show that the classical principle of separation between estimation and control holds in this case. This separation enables us to reduce the original stochastic optimal control problem to a deterministic one and to provide an optimal centralized solution. Unfortunately, the complexity of obtaining the optimal solution scales exponentially with the length of the planning horizon and the number of robots. We develop approximation algorithms to manage the complexity in both of these factors and provide theoretical performance guarantees. Applications in gas concentration mapping, joint localization and vehicle tracking in sensor networks, and active multi-robot localization and mapping are presented. Coupled with linearization and model predictive control, our algorithms can even generate adaptive control policies for nonlinear sensing and mobility models.
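Because the posterior covariance in the linear Gaussian case does not depend on the measured values, a sensing plan can be computed before any data arrive. The following is a minimal sketch of that idea, not the dissertation's algorithm. It assumes a static 2D target, scalar linear measurements z = h . x + v with noise variance r, and a hypothetical list `candidates` of (h, r) sensors, and it uses a plain greedy choice rather than the optimal search. It works in information form: each measurement adds h h^T / r to J = Sigma^-1, so minimizing log det Sigma is the same as maximizing log det J.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Sensor = Tuple[Sequence[float], float]  # hypothetical: (measurement row h, noise variance r)


def add_measurement(info: Matrix, h: Sequence[float], r: float) -> Matrix:
    """Information-form update J' = J + h h^T / r for z = h . x + v, v ~ N(0, r)."""
    n = len(info)
    return [[info[i][j] + h[i] * h[j] / r for j in range(n)] for i in range(n)]


def logdet_2x2(m: Matrix) -> float:
    """Log-determinant of a 2x2 positive-definite matrix."""
    return math.log(m[0][0] * m[1][1] - m[0][1] * m[1][0])


def greedy_plan(info0: Matrix, candidates: List[Sensor], horizon: int) -> Tuple[List[int], Matrix]:
    """At each step pick the sensor that most increases log det J (ties go to the lower index).

    No measurement values are used: with linear Gaussian models the covariance
    evolution is deterministic, so the whole plan can be computed offline.
    """
    info, plan = info0, []
    for _ in range(horizon):
        best = max(range(len(candidates)),
                   key=lambda k: logdet_2x2(add_measurement(info, *candidates[k])))
        info = add_measurement(info, *candidates[best])
        plan.append(best)
    return plan, info


if __name__ == "__main__":
    prior = [[1.0, 0.0], [0.0, 1.0]]
    sensors = [([1.0, 0.0], 0.1), ([0.0, 1.0], 0.1), ([1.0, 1.0], 0.1)]
    plan, info = greedy_plan(prior, sensors, horizon=3)
    print(plan)  # [2, 0, 1]
```

Greedy selection sidesteps the exponential growth in the horizon at the cost of optimality; the approximation algorithms and guarantees described above are not reproduced here.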
Linear Gaussian information seeking, however, cannot be applied directly in the presence of sensing nuisances such as missed detections, false alarms, and ambiguous data association, or when some sensor observations are discrete (e.g., object classes, medical alarms), or, even worse, when the sensing and target models are entirely unknown. The second part of the dissertation considers these complications in the context of two applications: active localization from semantic observations (e.g., recognized objects) and radio signal source seeking. The complexity of the target inference problem forces us to resort to greedy planning of the sensor trajectories.
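Discrete, nuisance-prone observations can still be fused with a Bayes filter once missed detections and false alarms are folded into the likelihood. The sketch below uses a hypothetical setting, not the dissertation's observation model: the robot is at one of a few candidate poses, a semantic detector reports only whether an object of interest was seen, the detection probability `p_d` and false-alarm probability `p_fa` are assumed known, and `in_view` says whether the object would be visible from each pose.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], in_view: Dict[str, bool], detected: bool,
                 p_d: float = 0.9, p_fa: float = 0.05) -> Dict[str, float]:
    """Posterior over discrete pose hypotheses after one binary detection report.

    P(detected | object in view)     = p_d   (a missed detection has probability 1 - p_d)
    P(detected | object not in view) = p_fa  (false alarm)
    """
    unnormalized = {}
    for pose, p in prior.items():
        p_det = p_d if in_view[pose] else p_fa
        unnormalized[pose] = p * (p_det if detected else 1.0 - p_det)
    total = sum(unnormalized.values())
    return {pose: w / total for pose, w in unnormalized.items()}


if __name__ == "__main__":
    prior = {"pose_A": 0.5, "pose_B": 0.5}       # hypothetical robot poses
    in_view = {"pose_A": True, "pose_B": False}  # is the object visible from each pose?
    print(bayes_update(prior, in_view, detected=True))   # pose_A ~0.947
    print(bayes_update(prior, in_view, detected=False))  # pose_B ~0.905
```

A missed detection does not rule out a pose; it only shifts probability mass, which is what makes exact non-greedy planning over such beliefs expensive.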
Non-greedy closed-loop information acquisition with general discrete models is achieved in the final part of the dissertation via dynamic programming and Monte Carlo tree search algorithms. Applications in active object recognition and pose estimation are presented. The techniques developed in this thesis offer an effective and scalable approach for controlled information acquisition with multiple sensing robots and have broad applications to environmental monitoring, search and rescue, security and surveillance, localization and mapping, precision agriculture, and structure inspection.
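To contrast non-greedy planning with the greedy case, here is a minimal exhaustive dynamic program (expectimax) over discrete beliefs. It is an illustrative sketch, not the dissertation's method: it assumes a hypothetical table `obs_model[a][o][h] = P(o | h, a)` and scores a plan by the expected entropy of the final belief. Its cost grows as (actions x observations)^depth, which is why sampling-based search such as Monte Carlo tree search becomes attractive for longer horizons.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: obs_model[a][o][h] = P(observation o | hypothesis h, action a)


def entropy(b: Sequence[float]) -> float:
    return -sum(p * math.log2(p) for p in b if p > 0)


def plan(b: Tuple[float, ...], obs_model: List[List[List[float]]],
         depth: int) -> Tuple[float, Optional[int]]:
    """Minimal expected final entropy after `depth` actions, and the best first action."""
    if depth == 0:
        return entropy(b), None
    best_value, best_action = math.inf, None
    for a, per_obs in enumerate(obs_model):
        value = 0.0
        for lik in per_obs:  # lik[h] = P(o | h, a) for one observation o
            joint = [p * l for p, l in zip(b, lik)]
            p_o = sum(joint)
            if p_o > 0:
                posterior = tuple(j / p_o for j in joint)
                value += p_o * plan(posterior, obs_model, depth - 1)[0]
        if value < best_value:
            best_value, best_action = value, a
    return best_value, best_action


if __name__ == "__main__":
    uninformative = [[0.5, 0.5], [0.5, 0.5]]
    perfect = [[1.0, 0.0], [0.0, 1.0]]
    print(plan((0.5, 0.5), [uninformative, perfect], depth=1))  # (0.0, 1)
```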
3D ShapeNets: A Deep Representation for Volumetric Shapes
3D shape is a crucial but heavily underutilized cue in today's computer
vision systems, mostly due to the lack of a good generic shape representation.
With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft
Kinect), it is becoming increasingly important to have a powerful 3D shape
representation in the loop. Apart from category recognition, recovering full 3D
shapes from view-based 2.5D depth maps is also a critical part of visual
understanding. To this end, we propose to represent a geometric 3D shape as a
probability distribution of binary variables on a 3D voxel grid, using a
Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the
distribution of complex 3D shapes across different object categories and
arbitrary poses from raw CAD data, and discovers hierarchical compositional
part representations automatically. It naturally supports joint object
recognition and shape completion from 2.5D depth maps, and it enables active
object recognition through view planning. To train our 3D deep learning model,
we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive
experiments show that our 3D deep representation enables significant
performance improvements over the state of the art in a variety of tasks.
Comment: to appear in CVPR 201
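To make the step from a 2.5D depth map to a voxel grid concrete, here is a minimal sketch that labels each voxel as free, surface, or unknown. It assumes an orthographic camera looking along +z and depths already quantized to voxel indices, with `None` for missing returns. Both are simplifying assumptions for illustration, and the labels only sketch the kind of observed/unobserved distinction that shape completion from a single view has to reason about.

```python
from typing import List, Optional

FREE, SURFACE, UNKNOWN = 0, 1, 2


def depth_to_voxels(depth: List[List[Optional[int]]], size_z: int) -> List[List[List[int]]]:
    """Label grid[x][y][z] from a depth map depth[x][y] seen along +z (orthographic).

    Voxels in front of the measured surface are FREE, the hit voxel is SURFACE,
    and voxels behind it (occluded) or in columns with no return are UNKNOWN.
    """
    grid = []
    for row in depth:
        grid_row = []
        for d in row:
            if d is None:
                column = [UNKNOWN] * size_z
            else:
                column = [FREE if z < d else SURFACE if z == d else UNKNOWN
                          for z in range(size_z)]
            grid_row.append(column)
        grid.append(grid_row)
    return grid


if __name__ == "__main__":
    print(depth_to_voxels([[1, None]], size_z=3))  # [[[0, 1, 2], [2, 2, 2]]]
```

The UNKNOWN voxels behind the surface are exactly the region a learned shape prior would have to fill in.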
Active Classification: Theory and Application to Underwater Inspection
We discuss the problem in which an autonomous vehicle must classify an object
based on multiple views. We focus on the active classification setting, where
the vehicle controls which views to select to best perform the classification.
The problem is formulated as an extension of Bayesian active learning, and we
show connections to recent theoretical guarantees in this area. We formally
analyze the benefit of acting adaptively as new information becomes available.
The analysis leads to a probabilistic algorithm for determining the best views
to observe based on information-theoretic costs. We validate our approach in
two ways, both related to underwater inspection: 3D polyhedra recognition in
synthetic depth maps and ship hull inspection with imaging sonar. These tasks
encompass both the planning and recognition aspects of the active
classification problem. The results demonstrate that actively planning for
informative views can reduce the number of necessary views by up to 80% when
compared to passive methods. Comment: 16 pages
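A common information-theoretic criterion for choosing the next view is the expected reduction in class entropy, that is, the mutual information between the class and the observation from that view. The sketch below is a greedy one-step version of that criterion, not necessarily the paper's algorithm. It assumes a hypothetical discrete table `obs_model[v][c][o] = P(o | class c, view v)`.

```python
import math
from typing import List, Sequence

# Hypothetical layout: obs_model[v][c][o] = P(observation o | class c, view v)


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_info_gain(belief: Sequence[float], lik_v: List[List[float]]) -> float:
    """Mutual information I(C; O_v) = H(C) - E_o[H(C | o)] for one candidate view."""
    gain = entropy(belief)
    for o in range(len(lik_v[0])):
        joint = [belief[c] * lik_v[c][o] for c in range(len(belief))]
        p_o = sum(joint)
        if p_o > 0:
            gain -= p_o * entropy([j / p_o for j in joint])
    return gain


def best_view(belief: Sequence[float], obs_model: List[List[List[float]]]) -> int:
    return max(range(len(obs_model)), key=lambda v: expected_info_gain(belief, obs_model[v]))


if __name__ == "__main__":
    views = [[[0.5, 0.5], [0.5, 0.5]],   # view 0: tells the classes apart poorly
             [[0.9, 0.1], [0.1, 0.9]]]   # view 1: fairly discriminative
    print(best_view([0.5, 0.5], views))  # 1
```

After each real observation the belief would be updated with Bayes' rule and the selection repeated, which is what makes the policy adaptive.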
Active vision for dexterous grasping of novel objects
How should a robot direct active vision so as to ensure reliable grasping? We
answer this question for the case of dexterous grasping of unfamiliar objects.
By dexterous grasping we simply mean grasping by any hand with more than two
fingers, such that the robot has some choice about where to place each finger.
Such grasps typically fail in one of two ways: either unmodeled objects in the
scene cause collisions, or object reconstruction is insufficient to ensure that
the grasp points provide a stable force closure. These problems can be solved
more easily if active sensing is guided by the anticipated actions. Our
approach has three stages. First, we take a single view and generate candidate
grasps from the resulting partial object reconstruction. Second, we drive the
active vision approach to maximise surface reconstruction quality around the
planned contact points. During this phase, the anticipated grasp is continually
refined. Third, we direct gaze to improve the safety of the planned
reach-to-grasp trajectory. We show, on a dexterous manipulator with a camera on
the wrist, that our approach (80.4% success rate) outperforms a randomised
algorithm (64.3% success rate). Comment: IROS 2016. Supplementary video: https://youtu.be/uBSOO6tMzw
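The second stage above can be illustrated with a scoring rule that prefers views covering poorly reconstructed surface near the planned contacts. This is a hypothetical sketch, not the paper's objective. It assumes each candidate view comes with a precomputed set of voxel centres it would observe (for example from ray casting) and that the set of still-uncertain voxels is known; the radius and the plain count are illustrative choices.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def view_score(visible: Set[Point], uncertain: Set[Point],
               contacts: List[Point], radius: float) -> int:
    """Count uncertain voxels this view would observe within `radius` of any planned contact."""
    return sum(1 for v in visible & uncertain
               if any(math.dist(v, c) <= radius for c in contacts))


def next_view(views: Dict[str, Set[Point]], uncertain: Set[Point],
              contacts: List[Point], radius: float = 0.02) -> str:
    """Pick the candidate view (hypothetical names) with the highest contact-focused score."""
    return max(views, key=lambda name: view_score(views[name], uncertain, contacts, radius))


if __name__ == "__main__":
    contacts = [(0.0, 0.0, 0.0)]
    uncertain = {(0.01, 0.0, 0.0), (0.5, 0.0, 0.0)}
    views = {"left": {(0.5, 0.0, 0.0)}, "right": {(0.01, 0.0, 0.0)}}
    print(next_view(views, uncertain, contacts))  # right
```

Focusing the score on the neighbourhood of the contacts, rather than on the whole object, is what ties the sensing to the anticipated grasp.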
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we (a) propose unsupervised features learned from
depth-invariant patches using a Sparse Autoencoder and (b) offer an extensive
evaluation of various state-of-the-art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypothesis verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art on both public datasets and our own. Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
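Hough-Forest-style detectors rest on voting: local patches cast weighted votes for the object centre, and peaks of the vote accumulator become detection hypotheses. The sketch below shows only that accumulation step, with no forest. It assumes each patch has already been mapped (for example by a leaf node) to a list of weighted 2D offset votes on an integer pixel grid; this data layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Patch = Tuple[Pixel, List[Tuple[Pixel, float]]]  # (patch position, [(offset to centre, weight), ...])


def hough_vote(patches: List[Patch]) -> Tuple[Pixel, float]:
    """Accumulate weighted centre votes and return the strongest centre and its score."""
    acc: Dict[Pixel, float] = defaultdict(float)
    for (px, py), votes in patches:
        for (dx, dy), w in votes:
            acc[(px + dx, py + dy)] += w
    centre = max(acc, key=acc.get)
    return centre, acc[centre]


if __name__ == "__main__":
    patches = [((10, 10), [((5, 5), 1.0)]),
               ((20, 15), [((-5, 0), 1.0), ((0, 0), 0.2)])]
    print(hough_vote(patches))  # ((15, 15), 2.0)
```

In a full pipeline the spread of votes stored in a leaf is also a natural source for the uncertainty estimates that drive next-best-view selection.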
Policy Learning with Hypothesis based Local Action Selection
For robots to manipulate objects in unknown and unstructured environments,
they must be able to operate under partial observability of the
environment. Object occlusions and unmodeled environments are some of the
factors that result in partial observability. A common scenario where this is
encountered is manipulation in clutter. When the robot needs to locate and
manipulate an object of interest, it must perform a series of decluttering
actions to detect that object accurately. To perform
such a series of actions, the robot also needs to account for the dynamics of
objects in the environment and how they react to contact. This is a non-trivial
problem since one needs to reason not only about robot-object interactions but
also object-object interactions in the presence of contact. In the example
scenario of manipulation in clutter, the state vector would have to account for
the pose of the object of interest and the structure of the surrounding
environment. The process model would have to account for all the aforementioned
robot-object and object-object interactions. The complexity of the process model
grows exponentially with the number of objects in the scene, and scenes with
many objects are common in unstructured environments. Hence it is not
reasonable to attempt to model all object-object and robot-object interactions
explicitly.
In this setting, we propose a hypothesis-based action selection algorithm
in which we construct a hypothesis set of the possible poses of an object of
interest given the current evidence in the scene and select actions based on
our current set of hypotheses. This hypothesis set represents the
belief about the structure of the environment and the number of poses the
object of interest can take. The agent stops only when the uncertainty
about the pose of the object is fully resolved. Comment: RLDM abstract
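A toy version of hypothesis-based action selection keeps the set of object poses consistent with the evidence, acts, prunes, and stops once a single pose remains. The abstract does not specify the selection rule, so this sketch assumes one reasonable choice: pick the action whose worst-case observation leaves the fewest hypotheses. It also assumes a hypothetical deterministic `outcome(action, pose)` function that predicts what the robot would observe, and it does not model contact dynamics at all.

```python
from typing import Any, Callable, Iterable, List, Set, Tuple

Outcome = Callable[[Any, Any], Any]  # hypothetical: outcome(action, pose) -> predicted observation


def select_action(hypotheses: Set[Any], actions: List[Any], outcome: Outcome) -> Any:
    """Choose the action whose worst-case observation leaves the fewest pose hypotheses."""
    def worst_case(action: Any) -> int:
        groups: dict = {}
        for pose in hypotheses:
            groups.setdefault(outcome(action, pose), set()).add(pose)
        return max(len(g) for g in groups.values())
    return min(actions, key=worst_case)


def resolve_pose(hypotheses: Iterable[Any], actions: List[Any], outcome: Outcome,
                 true_pose: Any, max_steps: int = 20) -> Tuple[Set[Any], List[Any]]:
    """Act until one hypothesis remains (or max_steps), pruning poses inconsistent with each observation."""
    hyps, taken = set(hypotheses), []
    while len(hyps) > 1 and len(taken) < max_steps:
        action = select_action(hyps, actions, outcome)
        observed = outcome(action, true_pose)  # simulated sensor reading
        hyps = {pose for pose in hyps if outcome(action, pose) == observed}
        taken.append(action)
    return hyps, taken


if __name__ == "__main__":
    # Toy example: 8 candidate poses; action k reveals bit k of the pose index.
    print(resolve_pose(range(8), [0, 1, 2], lambda a, h: (h >> a) & 1, true_pose=5))
    # ({5}, [0, 1, 2])
```

The `max_steps` guard stops the loop if no available action can separate the remaining hypotheses.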