Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model, and is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map.
Comment: RSS 2019 camera ready. Video is available at
https://youtu.be/4jcYlTSJF4
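The core idea of composing differentiable modules and training them jointly can be illustrated with a minimal sketch. Here two plain linear maps stand in for the differentiable algorithm modules (e.g. a state estimator feeding a planner); a single task loss is backpropagated through both so the modules adapt to one another. All names and dimensions are illustrative, not the paper's architecture.

```python
import numpy as np

# Two composed "modules" trained end-to-end from a single task loss,
# in the spirit of a Differentiable Algorithm Network. Each module is
# differentiable, so gradients flow through the whole pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))            # observations
A_true = rng.normal(size=(4, 3))        # unknown obs -> state map
B_true = rng.normal(size=(3, 2))        # unknown state -> action map
Y = X @ A_true @ B_true                 # target actions

W1 = rng.normal(size=(4, 3)) * 0.1      # "estimator" module parameters
W2 = rng.normal(size=(3, 2)) * 0.1      # "planner" module parameters

lr = 0.05
for _ in range(5000):
    S = X @ W1                          # intermediate "belief"
    P = S @ W2                          # predicted action
    G = 2.0 * (P - Y) / len(X)          # dLoss/dP for mean squared error
    gW2 = S.T @ G                       # gradient at the planner
    gW1 = X.T @ (G @ W2.T)              # gradient flows back through both
    W2 -= lr * gW2
    W1 -= lr * gW1

loss = float(np.mean((X @ W1 @ W2 - Y) ** 2))
print(f"final end-to-end loss: {loss:.6f}")
```

Because the intermediate "belief" is never supervised directly, the estimator is free to produce whatever representation best serves the downstream planner, which is the compensation effect the abstract describes.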
A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty
Accurate rotation estimation is at the heart of robot perception tasks such
as visual odometry and object pose estimation. Deep neural networks have
provided a new way to perform these tasks, and the choice of rotation
representation is an important part of network design. In this work, we present
a novel symmetric matrix representation of the 3D rotation group, SO(3), with
two important properties that make it particularly suitable for learned models:
(1) it satisfies a smoothness property that improves convergence and
generalization when regressing large rotation targets, and (2) it encodes a
symmetric Bingham belief over the space of unit quaternions, permitting the
training of uncertainty-aware models. We empirically validate the benefits of
our formulation by training deep neural rotation regressors on two data
modalities. First, we use synthetic point-cloud data to show that our
representation leads to superior predictive accuracy over existing
representations for arbitrary rotation targets. Second, we use image data
collected onboard ground and aerial vehicles to demonstrate that our
representation is amenable to an effective out-of-distribution (OOD) rejection
technique that significantly improves the robustness of rotation estimates to
unseen environmental effects and corrupted input images, without requiring the
use of an explicit likelihood loss, stochastic sampling, or an auxiliary
classifier. This capability is key for safety-critical applications where
detecting novel inputs can prevent catastrophic failure of learned models.
Comment: In Proceedings of Robotics: Science and Systems (RSS'20), Corvallis,
Oregon, USA, Jul. 12-16, 2020.
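A minimal sketch of the symmetric-matrix representation, as described in the abstract: a model outputs parameters that fill a symmetric 4x4 matrix, and the unit eigenvector associated with its smallest eigenvalue is taken as the quaternion estimate (the matrix itself parameterizing a Bingham belief over unit quaternions). The helper names and the 10-parameter upper-triangle filling are assumptions for illustration.

```python
import numpy as np

def theta_to_sym(theta):
    """Fill a symmetric 4x4 matrix from a 10-vector (upper triangle)."""
    A = np.zeros((4, 4))
    iu = np.triu_indices(4)
    A[iu] = theta
    return A + A.T - np.diag(np.diag(A))

def sym_to_quat(A):
    """Quaternion = unit eigenvector for the smallest eigenvalue of A."""
    w, V = np.linalg.eigh(A)      # eigenvalues in ascending order
    q = V[:, 0]                   # eigenvector of the smallest eigenvalue
    return q / np.linalg.norm(q)  # eigh already returns unit vectors

theta = np.arange(10, dtype=float)   # stand-in for a network's output
A = theta_to_sym(theta)
q = sym_to_quat(A)
print(np.allclose(A, A.T), np.isclose(np.linalg.norm(q), 1.0))
```

Note the smoothness property claimed in the abstract: the map from the symmetric matrix to the extracted eigenvector avoids the discontinuities that plague direct quaternion or Euler-angle regression targets.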
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning
In the predict-then-optimize framework, the objective is to train a
predictive model, mapping from environment features to parameters of an
optimization problem, which maximizes decision quality when the optimization is
subsequently solved. Recent work on decision-focused learning shows that
embedding the optimization problem in the training pipeline can improve
decision quality and help generalize better to unseen tasks compared to relying
on an intermediate loss function for evaluating prediction quality. We study
the predict-then-optimize framework in the context of sequential decision
problems (formulated as MDPs) that are solved via reinforcement learning. In
particular, we are given environment features and a set of trajectories from
training MDPs, which we use to train a predictive model that generalizes to
unseen test MDPs without trajectories. Two significant computational challenges
arise in applying decision-focused learning to MDPs: (i) large state and action
spaces make it infeasible for existing techniques to differentiate through MDP
problems, and (ii) the high-dimensional policy space, as parameterized by a
neural network, makes differentiating through a policy expensive. We resolve
the first challenge by sampling provably unbiased derivatives to approximate
and differentiate through optimality conditions, and the second challenge by
using a low-rank approximation to the high-dimensional sample-based
derivatives. We implement both Bellman-based and policy gradient-based
decision-focused learning on three different MDP problems with missing
parameters, and show that decision-focused learning generalizes better to
unseen tasks.
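The motivation for decision-focused learning can be seen in a toy one-step decision (the paper handles full MDPs solved by reinforcement learning). A model predicts per-action rewards from features; "decision quality" scores the action chosen under the predictions against the true rewards, and two predictors with identical prediction loss can yield very different decisions. All numbers here are illustrative.

```python
import numpy as np

true_rewards = np.array([1.0, 2.0, 3.0])

# Two hypothetical predictors with identical MSE but different decisions.
pred_a = np.array([1.5, 2.5, 2.4])   # argmax picks action 1 (suboptimal)
pred_b = np.array([0.5, 1.5, 3.6])   # argmax picks action 2 (optimal)

for name, pred in [("A", pred_a), ("B", pred_b)]:
    mse = float(np.mean((pred - true_rewards) ** 2))
    decision_quality = float(true_rewards[np.argmax(pred)])
    print(name, round(mse, 3), decision_quality)
# A 0.287 2.0
# B 0.287 3.0
```

An intermediate prediction loss cannot distinguish these two models, which is why embedding the downstream optimization in the training pipeline, as the abstract describes, can improve decision quality.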