14 research outputs found
Shared Autonomy via Hindsight Optimization
In shared autonomy, user input and robot autonomy are combined to control a
robot to achieve a goal. Often, the robot does not know a priori which goal the
user wants to achieve, and must both predict the user's intended goal, and
assist in achieving that goal. We formulate the problem of shared autonomy as a
Partially Observable Markov Decision Process with uncertainty over the user's
goal. We utilize maximum entropy inverse optimal control to estimate a
distribution over the user's goal based on the history of inputs. Ideally, the
robot assists the user by solving for an action which minimizes the expected
cost-to-go for the (unknown) goal. As solving the POMDP to select the optimal
action is intractable, we use hindsight optimization to approximate the
solution. In a user study, we compare our method to a standard
predict-then-blend approach. We find that our method enables users to
accomplish tasks more quickly while utilizing less input. However, when asked
to rate each system, users were mixed in their assessment, citing a tradeoff
between maintaining control authority and accomplishing tasks quickly
Adaptive Information Gathering via Imitation Learning
In the adaptive information gathering problem, a policy is required to select
an informative sensing location using the history of measurements acquired thus
far. While there is an extensive amount of prior work investigating effective
practical approximations using variants of Shannon's entropy, the efficacy of
such policies heavily depends on the geometric distribution of objects in the
world. On the other hand, the principled approach of employing online POMDP
solvers is rendered impractical by the need to explicitly sample online from a
posterior distribution of world maps.
We present a novel data-driven imitation learning framework to efficiently
train information gathering policies. The policy imitates a clairvoyant oracle
- an oracle that at train time has full knowledge about the world map and can
compute maximally informative sensing locations. We analyze the learnt policy
by showing that offline imitation of a clairvoyant oracle is implicitly
equivalent to online oracle execution in conjunction with posterior sampling.
This observation allows us to obtain powerful near-optimality guarantees for
information gathering problems possessing an adaptive sub-modularity property.
As demonstrated on a spectrum of 2D and 3D exploration problems, the trained
policies enjoy the best of both worlds - they adapt to different world map
distributions while being computationally inexpensive to evaluate.Comment: Robotics Science and Systems, 201