Learning Generalized Reactive Policies using Deep Neural Networks
We present a new approach to learning for planning, where knowledge acquired
while solving a given set of planning problems is used to plan faster in
related, but new problem instances. We show that a deep neural network can be
used to learn and represent a generalized reactive policy (GRP) that
maps a problem instance and a state to an action, and that the learned GRPs
efficiently solve large classes of challenging problem instances. In contrast
to prior efforts in this direction, our approach significantly reduces the
dependence of learning on handcrafted domain knowledge or feature selection.
Instead, the GRP is trained from scratch using a set of successful execution
traces. We show that our approach can also be used to automatically learn a
heuristic function that can be used in directed search algorithms. We evaluate
our approach using an extensive suite of experiments on two challenging
planning problem domains and show that our approach facilitates learning
complex decision making policies and powerful heuristic functions with minimal
human input. Videos of our results are available at goo.gl/Hpy4e3
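As a rough illustration of how a heuristic function plugs into a directed search algorithm, the sketch below runs greedy best-first search on a toy grid. Everything here is an assumption made for illustration: the grid domain, the helper names (`greedy_best_first`, `grid_successors`, `manhattan`), and the hand-written Manhattan-distance heuristic, which merely stands in for a heuristic learned by a network. It is not the authors' implementation.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def greedy_best_first(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[State]],
    heuristic: Callable[[State], float],
) -> Optional[List[State]]:
    """Always expand the frontier state with the lowest heuristic value."""
    counter = 0  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start), counter, start)]
    parent: Dict[State, Optional[State]] = {start: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            # Walk parent pointers back to the start to recover the path.
            path = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # skip states that were already discovered
                parent[nxt] = state
                counter += 1
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt))
    return None  # frontier exhausted without reaching a goal


# Hypothetical toy domain: move on a 4x4 grid from (0, 0) to (3, 3).
SIZE = 4
GOAL: Tuple[int, int] = (3, 3)


def grid_successors(state: Tuple[int, int]) -> List[Tuple[int, int]]:
    x, y = state
    moves = [(x + 1, y), (x, y + 1), (x - 1, y), (x, y - 1)]
    return [(a, b) for a, b in moves if 0 <= a < SIZE and 0 <= b < SIZE]


def manhattan(state: Tuple[int, int]) -> float:
    # Placeholder heuristic; a learned model's prediction would go here.
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])


path = greedy_best_first((0, 0), lambda s: s == GOAL, grid_successors, manhattan)
```

Swapping `manhattan` for any other callable that scores a state leaves the search loop unchanged, which is the sense in which a learned heuristic can be dropped into an existing search procedure.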
Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Many modern nonlinear control methods aim to endow systems with guaranteed
properties, such as stability or safety, and have been successfully applied to
the domain of robotics. However, model uncertainty remains a persistent
challenge, weakening theoretical guarantees and causing implementation failures
on physical systems. This paper develops a machine learning framework centered
around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and
unmodeled dynamics in general robotic systems. Our proposed method proceeds by
iteratively updating estimates of Lyapunov function derivatives and improving
controllers, ultimately yielding a stabilizing quadratic program model-based
controller. We validate our approach on a planar Segway simulation,
demonstrating substantial performance improvements by iteratively refining on a
base model-free controller.
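To make the quadratic program step concrete, here is a minimal sketch of a min-norm CLF controller. It assumes the Lyapunov derivative terms are already known, whereas the paper's contribution is learning estimates of them from data. With a single CLF constraint and no input bounds, the QP "minimize ||u||^2 subject to LfV + LgV·u <= -alpha·V" has a closed-form solution, which is what the code uses. The function name, the scalar system x' = x + u, the choice V = x^2/2, and the rate `alpha` are all illustrative assumptions, not taken from the paper.

```python
from typing import List


def clf_min_norm_control(V: float, LfV: float, LgV: List[float], alpha: float) -> List[float]:
    """Closed-form solution of: min ||u||^2  s.t.  LfV + LgV.u <= -alpha * V."""
    a = LfV + alpha * V              # constraint violation when u = 0
    if a <= 0.0:
        return [0.0] * len(LgV)      # u = 0 already satisfies the constraint
    b_sq = sum(b * b for b in LgV)
    if b_sq == 0.0:
        raise ValueError("infeasible: LgV is zero but a decrease in V is required")
    # Project onto the constraint boundary along the direction of LgV.
    return [-a * b / b_sq for b in LgV]


# Hypothetical scalar system x' = x + u with V(x) = 0.5 * x**2,
# so LfV = x * x (drift term) and LgV = [x] (input term).
def simulate(x0: float, alpha: float = 2.0, dt: float = 0.01, steps: int = 500) -> float:
    x = x0
    for _ in range(steps):
        u = clf_min_norm_control(0.5 * x * x, x * x, [x], alpha)
        x += dt * (x + u[0])         # forward-Euler integration step
    return x


u_at_one = clf_min_norm_control(0.5, 1.0, [1.0], 2.0)
x_final = simulate(1.0)
```

In this toy case the controller reduces to u = -2x, so the open-loop-unstable system becomes x' = -x and the state decays toward the origin. With several constraints or input limits the closed form no longer applies and a numerical QP solver would be needed.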
An Empirical Comparison on Imitation Learning and Reinforcement Learning for Paraphrase Generation
Generating paraphrases from given sentences involves decoding words step by
step from a large vocabulary. To learn a decoder, supervised learning, which
maximizes the likelihood of tokens, always suffers from exposure bias.
Although both reinforcement learning (RL) and imitation learning (IL) have been
widely used to alleviate the bias, the lack of a direct comparison leaves only
a partial picture of their benefits. In this work, we present an empirical study
of how RL and IL can help boost the performance of generating paraphrases, with
the pointer-generator as a base model. Experiments on the benchmark datasets
show that (1) imitation learning is consistently better than reinforcement
learning; and (2) the pointer-generator models with imitation learning
outperform the state-of-the-art methods by a large margin.
Comment: 9 pages, 2 figures, EMNLP201