A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Sequential prediction problems such as imitation learning, where future
observations depend on previous predictions (actions), violate the common
i.i.d. assumptions made in statistical learning. This leads to poor performance
in theory and often in practice. Some recent approaches provide stronger
guarantees in this setting, but remain somewhat unsatisfactory as they train
either non-stationary or stochastic policies and require a large number of
iterations. In this paper, we propose a new iterative algorithm, which trains a
stationary deterministic policy, which can be seen as a no-regret algorithm in
an online learning setting. We show that any such no-regret algorithm, combined
with additional reduction assumptions, must find a policy with good performance
under the distribution of observations it induces in such sequential settings.
We demonstrate that this new approach outperforms previous approaches on two
challenging imitation learning problems and a benchmark sequence labeling
problem.
Comment: Appearing in the 14th International Conference on Artificial
Intelligence and Statistics (AISTATS 2011).
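The iterative scheme described above (dataset aggregation: roll out the current learner, have the expert label every visited state, retrain on everything collected so far) can be sketched on a toy problem. The chain environment, expert, and tabular learner below are illustrative assumptions, not the paper's experiments.

```python
# A minimal DAgger-style sketch: the learner's own rollouts determine which
# states get expert labels, and the policy is retrained on the aggregate.
from collections import defaultdict, Counter

N_STATES = 5

def expert(s):
    # Toy expert: move right (action 1) until the last state is reached.
    return 1 if s < N_STATES - 1 else 0

def rollout(policy, horizon=10):
    # Collect the states the current policy actually visits.
    s, visited = 0, []
    for _ in range(horizon):
        visited.append(s)
        s = min(max(s + (1 if policy(s) == 1 else -1), 0), N_STATES - 1)
    return visited

def train(dataset):
    # Tabular "classifier": majority expert action per state.
    counts = defaultdict(Counter)
    for s, a in dataset:
        counts[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda s: table.get(s, 0)

def dagger(n_iters=5):
    dataset, policy = [], (lambda s: 0)  # start from a deliberately bad policy
    for _ in range(n_iters):
        # Label the states induced by the *learner's* distribution, aggregate,
        # and retrain a single stationary policy.
        for s in rollout(policy):
            dataset.append((s, expert(s)))
        policy = train(dataset)
    return policy

pi = dagger()
```

Because each retrained policy drifts further along the chain, later iterations expose (and get labels for) states the initial policy never reaches, which is exactly the distribution-mismatch issue the abstract describes.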
Reinforcement and Imitation Learning via Interactive No-Regret Learning
Recent work has demonstrated that problems where a learner's predictions
influence the input distribution it is tested on, particularly imitation
learning and structured prediction, can be naturally addressed by an interactive
approach and analyzed using no-regret online learning. These approaches to
imitation learning, however, neither require nor benefit from information about
the cost of actions. We extend existing results in two directions: first, we
develop an interactive imitation learning approach that leverages cost
information; second, we extend the technique to address reinforcement learning.
The results provide theoretical support to the commonly observed successes of
online approximate policy iteration. Our approach suggests a broad new family
of algorithms and provides a unifying view of existing techniques for imitation
and reinforcement learning.
Comment: 14 pages. Under review for the NIPS 2014 conference.
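The key difference from cost-blind imitation is sketched below: at each visited state the learner is trained against the expert's cost-to-go for every action, reducing policy learning to cost-sensitive classification. The 5-state chain and the toy Q* are hypothetical stand-ins for the expert's actual cost information.

```python
# Cost-sensitive imitation sketch: pick the action minimising the expert's
# cost-to-go, rather than merely copying the expert's chosen action.

def expert_cost_to_go(s, a, goal=4):
    # Toy Q*: take one step of action a (0 = left, 1 = right) on a 5-state
    # chain, then count the expert's remaining distance to the goal.
    s_next = min(max(s + (1 if a == 1 else -1), 0), 4)
    return 1 + abs(goal - s_next)

def greedy_policy(s):
    # Cost-sensitive classification at state s over the action set {0, 1}.
    return min((0, 1), key=lambda a: expert_cost_to_go(s, a))

actions = [greedy_policy(s) for s in range(5)]
```

In the full interactive algorithm these cost-sensitive examples would be aggregated across iterations, as in the no-regret analysis the abstract refers to; the snippet isolates only the cost-sensitive step.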
Learning Reductions that Really Work
We provide a summary of the mathematical and computational techniques that
have enabled learning reductions to effectively address a wide class of
problems, and show that this approach to solving machine learning problems can
be broadly useful.
Inspiration Learning through Preferences
Current imitation learning techniques are too restrictive because they
require the agent and expert to share the same action space. However,
oftentimes agents that act differently from the expert can solve the task just
as well. For example, a person lifting a box can be imitated by a ceiling
mounted robot or a desktop-based robotic-arm. In both cases, the end goal of
lifting the box is achieved, perhaps using different strategies. We denote this
setup as \textit{Inspiration Learning} - knowledge transfer between agents that
operate in different action spaces. Since state-action expert demonstrations
can no longer be used, Inspiration Learning requires novel methods to guide the
agent towards the end goal. In this work, we rely on ideas from Preference-based
Reinforcement Learning (PbRL) to design Advantage Actor-Critic algorithms
for solving inspiration learning tasks. Unlike classic actor-critic
architectures, the critic we use consists of two parts: a) a state-value
estimation as in common actor-critic algorithms and b) a single step reward
function derived from an expert/agent classifier. We show that our method is
capable of extending the current imitation framework to new horizons. This
includes continuous-to-discrete action imitation, as well as primitive-to-macro
action imitation.
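The two-part critic described above can be sketched as follows. The expert/agent classifier, its state representation, and the reward shaping are illustrative assumptions (the log-probability form is a common GAIL-style choice, not necessarily the paper's exact one).

```python
# Sketch of a two-part critic: (a) a learned state-value estimate, plus
# (b) a single-step reward derived from an expert-vs-agent classifier.
import math

def classifier_prob_expert(state):
    # Stand-in for a trained expert/agent classifier: states closer to the
    # goal look more "expert-like" in this toy setup.
    return 1.0 / (1.0 + math.exp(-(state - 2)))

def step_reward(state):
    # Reward the agent for visiting states the classifier attributes to the
    # expert (GAIL-style log-probability shaping; an assumption here).
    return math.log(classifier_prob_expert(state) + 1e-8)

# Part (a): the usual state-value table/network of an actor-critic.
value_table = {s: 0.0 for s in range(5)}

def critic_target(state, next_state, gamma=0.99):
    # TD target combining the classifier-derived reward with the learned
    # state-value estimate.
    return step_reward(state) + gamma * value_table[next_state]
```

Because the reward comes from a state classifier rather than from state-action demonstrations, nothing here requires the agent and the expert to share an action space, which is the point of the setup above.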
Curriculum-Based Neighborhood Sampling For Sequence Prediction
The task of multi-step ahead prediction in language models is challenging
considering the discrepancy between training and testing. At test time, a
language model is required to make predictions given past predictions as input,
instead of the past targets that are provided during training. This difference,
known as exposure bias, can lead to the compounding of errors along a generated
sequence at test time.
In order to improve generalization in neural language models and address
compounding errors, we propose a curriculum learning based method that
gradually changes an initially deterministic teacher policy into an
increasingly stochastic policy, which we refer to as \textit{Nearest-Neighbor Replacement
Sampling}. A chosen input at a given timestep is replaced with a sampled
nearest neighbor of the past target with a truncated probability proportional
to the cosine similarity between the original word and its top most similar
words. This allows the teacher to explore alternatives when the teacher
provides a sub-optimal policy or when the initial policy is difficult for the
learner to model. The proposed strategy is straightforward, online, and
requires little additional memory. We report our main findings on two
language modelling benchmarks and find that the proposed approach performs
particularly well when used in conjunction with scheduled sampling, which also
attempts to mitigate compounding errors in language models.
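The replacement rule above can be sketched directly: with a probability that grows over training (the curriculum), a word is swapped for one of its top-k nearest neighbors, sampled proportionally to cosine similarity. The tiny vocabulary, embeddings, and linear schedule are toy assumptions.

```python
# Nearest-neighbour replacement sampling sketch: deterministic teacher early
# in training, increasingly stochastic teacher later.
import math, random

embeddings = {
    "cat": (1.0, 0.1), "dog": (0.9, 0.2),
    "car": (0.1, 1.0), "bus": (0.2, 0.9),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(a * a for a in v)))

def replace(word, epoch, max_epochs=10, top_k=2, rng=random):
    # Curriculum: replacement probability grows linearly from 0 towards 1.
    if rng.random() >= epoch / max_epochs:
        return word  # keep the original target word (deterministic teacher)
    # Rank the vocabulary by similarity and keep the top_k neighbours.
    sims = sorted(((cosine(embeddings[word], v), w)
                   for w, v in embeddings.items() if w != word),
                  reverse=True)[:top_k]
    # Sample a neighbour with probability proportional to its similarity.
    total = sum(s for s, _ in sims)
    r, acc = rng.random() * total, 0.0
    for s, w in sims:
        acc += s
        if r <= acc:
            return w
    return sims[-1][1]
```

At epoch 0 the input is never replaced; at the final epoch it is always replaced by a sampled neighbor, matching the deterministic-to-stochastic schedule described above.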
Learning Beam Search Policies via Imitation Learning
Beam search is widely used for approximate decoding in structured prediction
problems. Models often use a beam at test time but ignore its existence at
train time, and therefore do not explicitly learn how to use the beam. We
develop a unifying meta-algorithm for learning beam search policies using
imitation learning. In our setting, the beam is part of the model, and not just
an artifact of approximate decoding. Our meta-algorithm captures existing
learning algorithms and suggests new ones. It also lets us show novel no-regret
guarantees for learning beam search policies.
Comment: Published in NIPS 2018.
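For reference, the decoding procedure whose scoring function is being learned can be sketched as a standard beam search; treating the beam as part of the model means training the scorer so the best full sequence survives every pruning step. The toy expansion and scorer below are assumptions for illustration.

```python
# Minimal beam-search decoder: keep the beam_width highest-scoring
# one-token extensions at every step.
import heapq

def beam_search(start, expand, score, beam_width=2, steps=3):
    beam = [(start,)]
    for _ in range(steps):
        # Expand every hypothesis by every candidate token, then prune.
        candidates = [h + (t,) for h in beam for t in expand(h)]
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

# Toy instantiation: tokens are digits, the "learned" scorer prefers
# large sums (a stand-in for a trained scoring model).
seq = beam_search(0, expand=lambda h: [1, 2, 3], score=lambda h: sum(h))
```

When the scorer is learned by imitation, the training signal asks it to rank the reference sequence's prefixes above the competing hypotheses at each pruning step, which is where the no-regret analysis mentioned above applies.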
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Researchers have demonstrated state-of-the-art performance in sequential
decision making problems (e.g., robotics control, sequential prediction) with
deep neural network models. One often has access to near-optimal oracles that
achieve good performance on the task during training. We demonstrate that
AggreVaTeD, a policy gradient extension of the Imitation Learning (IL)
approach of Ross & Bagnell (2014), can leverage such an oracle to achieve
faster and better solutions with less training data than a less-informed
Reinforcement Learning (RL) technique. Using both feedforward and recurrent
neural network predictors, we present stochastic gradient procedures on a
sequential prediction task, dependency-parsing from raw image data, as well as
on various high dimensional robotics control problems. We also provide a
comprehensive theoretical study of IL that demonstrates we can expect up to
exponentially lower sample complexity for learning with AggreVaTeD than with RL
algorithms, which backs our empirical findings. Our results and theory indicate
that the proposed approach can achieve superior performance with respect to the
oracle when the demonstrator is sub-optimal.
Comment: 17 pages.
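The core update can be sketched as a policy-gradient step weighted by the oracle's advantage rather than an environment return. The softmax policy, the fixed advantage value, and the learning rate below are toy assumptions; the actual method differentiates through a neural policy.

```python
# AggreVaTeD-style update sketch: descend on expected oracle cost-to-go,
# so grad = A*(s, a) * grad log pi(a | s) for the sampled action.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def aggrevated_step(logits, sampled_action, oracle_advantage, lr=0.5):
    # For a softmax policy, grad log pi(a|s) w.r.t. logit i is
    # indicator(i == a) - pi(i|s). A positive cost-advantage (the action is
    # worse than the oracle's) pushes that action's logit down.
    probs = softmax(logits)
    return [l - lr * oracle_advantage *
            ((1.0 if i == sampled_action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

new_logits = aggrevated_step([0.0, 0.0], sampled_action=0, oracle_advantage=1.0)
```

Because the weight is the oracle's advantage, informative gradients are available from the first rollout, which is the intuition behind the sample-complexity gap over RL claimed above.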
Output Space Search for Structured Prediction
We consider a framework for structured prediction based on search in the
space of complete structured outputs. Given a structured input, an output is
produced by running a time-bounded search procedure guided by a learned cost
function, and then returning the least cost output uncovered during the search.
This framework can be instantiated for a wide range of search spaces and search
procedures, and easily incorporates arbitrary structured-prediction loss
functions. In this paper, we make two main technical contributions. First, we
define the limited-discrepancy search space over structured outputs, which is
able to leverage powerful classification learning algorithms to improve the
search space quality. Second, we give a generic cost function learning
approach, where the key idea is to learn a cost function that attempts to mimic
the behavior of conducting searches guided by the true loss function. Our
experiments on six benchmark domains demonstrate that using our framework with
only a small amount of search is sufficient for significantly improving on
state-of-the-art structured-prediction performance.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012).
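The framework above can be sketched as a time-bounded local search over complete outputs, guided by a cost function and returning the least-cost output uncovered. The single-label-change neighborhood and the Hamming-style cost below are toy assumptions standing in for the learned cost function and the limited-discrepancy space.

```python
# Output-space search sketch: start from a complete (possibly wrong)
# labelling, explore neighbouring complete outputs under a budget, and
# return the least-cost output seen.
def output_space_search(initial, labels, cost, budget=50):
    best = current = initial
    for _ in range(budget):
        # Neighbourhood: all complete outputs differing in one position.
        neighbors = [current[:i] + (l,) + current[i + 1:]
                     for i in range(len(current))
                     for l in labels if l != current[i]]
        current = min(neighbors, key=cost)
        if cost(current) < cost(best):
            best = current  # keep the least-cost output uncovered so far
    return best

# Toy instantiation: the "learned" cost is Hamming distance to a hidden
# target structure.
target = (1, 0, 1, 1)
result = output_space_search(
    (0, 0, 0, 0), (0, 1),
    cost=lambda y: sum(a != b for a, b in zip(y, target)))
```

The paper's cost-function learning step would train `cost` to mimic searches guided by the true loss; the snippet assumes that training has already happened.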
Uncertainty-Aware Data Aggregation for Deep Imitation Learning
Estimating statistical uncertainties allows autonomous agents to communicate
their confidence during task execution and is important for applications in
safety-critical domains such as autonomous driving. In this work, we present
the uncertainty-aware imitation learning (UAIL) algorithm for improving
end-to-end control systems via data aggregation. UAIL applies Monte Carlo
Dropout to estimate uncertainty in the control output of end-to-end systems,
using states where it is uncertain to selectively acquire new training data. In
contrast to prior data aggregation algorithms that force human experts to visit
sub-optimal states at random, UAIL can anticipate its own mistakes and switch
control to the expert in order to prevent visiting a series of sub-optimal
states. Our experimental results from simulated driving tasks demonstrate that
our proposed uncertainty estimation method can be leveraged to reliably predict
infractions. Our analysis shows that UAIL outperforms existing data aggregation
algorithms on a series of benchmark tasks.
Comment: Accepted to the International Conference on Robotics and Automation
2019.
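The uncertainty gate can be sketched as follows: run several stochastic (dropout-style) forward passes and hand control to the expert when the spread of the predicted controls exceeds a threshold. The noise model, threshold, and scalar control below are toy assumptions standing in for MC Dropout over a real network.

```python
# UAIL-style uncertainty gate sketch: high variance across stochastic
# forward passes triggers a switch to the expert (and a data request).
import random, statistics

def noisy_policy(state, rng):
    # Stand-in for one MC-Dropout forward pass of the control network:
    # predictions get noisier in unfamiliar states (larger |state| here).
    return 0.5 * state + rng.gauss(0.0, 0.05 * abs(state))

def act(state, expert, threshold=0.2, n_samples=20, seed=0):
    rng = random.Random(seed)
    samples = [noisy_policy(state, rng) for _ in range(n_samples)]
    if statistics.stdev(samples) > threshold:
        # Uncertain: switch control to the expert before a chain of
        # sub-optimal states accumulates; record the state for aggregation.
        return expert(state), True
    return statistics.mean(samples), False
```

The second return value flags whether the expert took over, which is the signal used to selectively acquire new training data rather than forcing the expert to visit random sub-optimal states.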
An Imitation Game for Learning Semantic Parsers from User Interaction
Despite widely successful applications, bootstrapping and fine-tuning
semantic parsers remains a tedious process, with challenges such as costly
data annotation and privacy risks. In this paper, we suggest an alternative,
human-in-the-loop methodology for learning semantic parsers directly from
users. A semantic parser should be introspective of its uncertainties and
prompt for user demonstration when uncertain. In doing so it also gets to
imitate the user behavior and continue improving itself autonomously with the
hope that eventually it may become as good as the user in interpreting their
questions. To combat the sparsity of demonstration, we propose a novel
annotation-efficient imitation learning algorithm, which iteratively collects
new datasets by mixing demonstrated states and confident predictions and
re-trains the semantic parser in a Dataset Aggregation fashion (Ross et al.,
2011). We provide a theoretical analysis of its cost bound and also empirically
demonstrate its promising performance on the text-to-SQL problem. Code will be
available at https://github.com/sunlab-osu/MISP.
Comment: Accepted to EMNLP 2020. 20 pages, 6 figures.
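One collection round of the annotation-efficient aggregation loop above can be sketched as follows: confident predictions are kept as-is, uncertain ones trigger a user demonstration, and both are merged into the growing dataset. The parser, confidence model, and user oracle below are toy stand-ins, not the MISP implementation.

```python
# One round of mixing confident predictions with user demonstrations,
# DAgger-style: the merged data would then be used to retrain the parser.
def collect_round(examples, parser, confidence, user, threshold=0.8):
    new_data = []
    for x in examples:
        y_hat = parser(x)
        if confidence(x, y_hat) >= threshold:
            new_data.append((x, y_hat))   # keep the confident prediction
        else:
            new_data.append((x, user(x)))  # prompt the user to demonstrate
    return new_data

# Toy instantiation: the "parser" uppercases its input; the user supplies
# gold labels only where the parser is unsure.
gold = {"a": "A", "b": "B!", "c": "C"}
data = collect_round(
    ["a", "b", "c"],
    parser=str.upper,
    confidence=lambda x, y: 0.9 if x != "b" else 0.1,
    user=gold.get,
)
```

Only the uncertain example costs a user interaction, which is how demonstration sparsity is kept manageable while the aggregated dataset still covers the parser's own state distribution.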