Continuous Online Learning and New Insights to Online Imitation Learning
Online learning is a powerful tool for analyzing iterative algorithms.
However, the classic adversarial setup sometimes fails to capture certain
regularity in online problems in practice. Motivated by this, we establish a
new setup, called Continuous Online Learning (COL), where the gradient of the
online loss function changes continuously across rounds with respect to the
learner's decisions. We show that COL covers and more appropriately describes
many interesting applications, from general equilibrium problems (EPs) to
optimization in episodic MDPs. Using this new setup, we revisit the difficulty
of achieving sublinear dynamic regret. We prove that there is a fundamental
equivalence between achieving sublinear dynamic regret in COL and solving
certain EPs, and we present a reduction from dynamic regret to both static
regret and convergence rate of the associated EP. Finally, we specialize
these new insights to online imitation learning and show an improved
understanding of its learning stability.
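To make the COL feedback structure concrete, here is a minimal Python sketch under assumed specifics: the quadratic bimap f, the name grad_f, and all constants are illustrative choices, not taken from the paper. It shows the defining property of COL, that the gradient the learner receives at round t depends continuously on its own played decision, and that a fixed point of the resulting online gradient descent solves the associated equilibrium problem.

```python
import numpy as np

def grad_f(x_played, x):
    # Hypothetical bimap f(x', x) = 0.5 * ||x - 0.5 * x'||^2; its gradient
    # in x, evaluated at the played decision, varies continuously in x'.
    return x - 0.5 * x_played

x = np.ones(3)            # initial decision
lr = 0.1                  # step size
for t in range(200):
    g = grad_f(x, x)      # round-t feedback: gradient at the played decision
    x = x - lr * g        # online gradient descent update

# A fixed point x* satisfies grad_f(x*, x*) = 0, i.e. it solves the
# associated equilibrium problem; here x* = 0 and the iterates contract to it.
print(x)                  # ~ [0. 0. 0.]
```

In this toy instance the EP solution is x* = 0, so the iterates contracting to zero illustrates, in the simplest possible case, how sublinear dynamic regret in COL connects to convergence of the associated EP.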
Learning from Imperfect Demonstrations from Agents with Varying Dynamics
Imitation learning enables robots to learn from demonstrations. Previous
imitation learning algorithms usually assume access to optimal expert
demonstrations. However, in many real-world applications, this assumption is
limiting. Most collected demonstrations are not optimal or are produced by an
agent with slightly different dynamics. We therefore address the problem of
imitation learning when the demonstrations may be sub-optimal or drawn from
agents with varying dynamics. We develop a metric composed of a feasibility
score and an optimality score to measure how useful a demonstration is for
imitation learning. The proposed score enables learning from more informative
demonstrations while disregarding the less relevant ones. Our
experiments on four environments in simulation and on a real robot show
improved learned policies with higher expected return.
Comment: Accepted by ICRA 2021
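The abstract does not spell out how the two scores are combined, so the following Python sketch is only one plausible reading: a multiplicative combination normalized into per-demonstration weights. The helpers feasibility_score, optimality_score, value_estimate, and the (s, a, s') transition format are all hypothetical, not the paper's formulation.

```python
import numpy as np

def feasibility_score(demo, learner_dynamics):
    # Hypothetical: a demo is more feasible if the learner's own dynamics
    # can reproduce its transitions (small one-step prediction error).
    errors = [np.linalg.norm(s_next - learner_dynamics(s, a))
              for (s, a, s_next) in demo]
    return float(np.exp(-np.mean(errors)))

def optimality_score(demo, value_estimate):
    # Hypothetical: score a demo by an estimate of its return.
    return float(value_estimate(demo))

def demo_weights(demos, learner_dynamics, value_estimate):
    # One plausible combination: multiply the two scores, then normalize,
    # so useful demos get large weight and irrelevant ones weight near zero.
    scores = np.array([feasibility_score(d, learner_dynamics)
                       * optimality_score(d, value_estimate)
                       for d in demos])
    return scores / scores.sum()

# Example usage with identity dynamics and a constant value estimate:
demos = [[(np.zeros(2), np.zeros(1), np.zeros(2))]]
print(demo_weights(demos, lambda s, a: s, lambda d: 1.0))  # -> [1.]
```

Weights of this form could then scale the per-demonstration terms of an imitation loss, which is one way to realize "learning from more informative demonstrations" while down-weighting the rest.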
Explaining Fast Improvement in Online Imitation Learning
Online imitation learning (IL) is an algorithmic framework that leverages
interactions with expert policies for efficient policy optimization. Here
policies are optimized by performing online learning on a sequence of loss
functions that encourage the learner to mimic expert actions, and if the online
learning has no regret, the agent can provably learn an expert-like policy.
Online IL has demonstrated empirical successes in many applications;
interestingly, its policy improvement speed observed in practice is usually
much faster than existing theory suggests. In this work, we provide an
explanation of this phenomenon. Let $\xi$ denote the policy class bias and
assume the online IL loss functions are convex, smooth, and non-negative. We
prove that, after $N$ rounds of online IL with stochastic feedback, the policy
improves in $\tilde{O}(1/N + \sqrt{\xi/N})$ in both expectation and high
probability. In other words, we show that adopting a sufficiently expressive
policy class in online IL has two benefits: the policy improvement speed
increases and the performance bias decreases.
Comment: 22 pages, 2 figures
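As a reading aid (an interpretation sketch, not text from the paper), the bound separates into two regimes depending on the policy class bias $\xi$:

```latex
% Interpretation of \tilde{O}(1/N + \sqrt{\xi/N}); requires amsmath.
\[
  \text{rate after } N \text{ rounds}
  \;=\; \tilde{O}\!\Big(\tfrac{1}{N} + \sqrt{\tfrac{\xi}{N}}\Big)
  \;=\;
  \begin{cases}
    \tilde{O}(1/N), & \xi = 0 \text{ (expressive class, no bias)},\\[2pt]
    \tilde{O}\big(\sqrt{\xi/N}\big), & \xi > 0 \text{ and } N \gtrsim 1/\xi .
  \end{cases}
\]
```

A more expressive policy class pushes $\xi$ toward zero, which simultaneously removes the slower $\sqrt{\xi/N}$ term and lowers the asymptotic bias, matching the abstract's closing claim.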