Continuous Online Learning and New Insights to Online Imitation Learning
Online learning is a powerful tool for analyzing iterative algorithms.
However, the classic adversarial setup sometimes fails to capture certain
regularity in online problems in practice. Motivated by this, we establish a
new setup, called Continuous Online Learning (COL), where the gradient of
the online loss function changes continuously across rounds with respect to the
learner's decisions. We show that COL covers and more appropriately describes
many interesting applications, from general equilibrium problems (EPs) to
optimization in episodic MDPs. Using this new setup, we revisit the difficulty
of achieving sublinear dynamic regret. We prove that there is a fundamental
equivalence between achieving sublinear dynamic regret in COL and solving
certain EPs, and we present a reduction from dynamic regret to both static
regret and convergence rate of the associated EP. Finally, we specialize
these new insights to online imitation learning and show an improved
understanding of its learning stability.
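To make the equivalence between sublinear dynamic regret and solving an EP concrete, here is a minimal sketch under assumed, hypothetical choices: a scalar loss family f_{x_n}(y) = 0.5 * (y - (a * x_n + b))^2, whose gradient changes continuously with the learner's decision x_n (a COL instance), and plain online gradient descent as the learner. The associated EP asks for x* with x* = a * x* + b, i.e. a point that minimizes its own loss f_{x*}. The function name `run_ogd` and all parameter values are illustrative, not taken from the paper.

```python
def run_ogd(a: float, b: float, x0: float, eta: float, rounds: int):
    """Online gradient descent on the hypothetical COL losses
    f_{x_n}(y) = 0.5 * (y - (a * x_n + b)) ** 2.

    Returns the final decision and the per-round dynamic regret
    f_{x_n}(x_n) - min_y f_{x_n}(y).
    """
    x = x0
    regrets = []
    for _ in range(rounds):
        target = a * x + b                      # minimizer of this round's loss; it moves with x
        regrets.append(0.5 * (x - target) ** 2)  # min_y f_{x_n}(y) = 0 for this family
        grad = x - target                       # gradient of f_{x_n} evaluated at x_n
        x = x - eta * grad                      # OGD step
    return x, regrets


# Contracting case: iterates approach the EP solution b / (1 - a) = 2.0.
x, regrets = run_ogd(a=0.5, b=1.0, x0=0.0, eta=0.5, rounds=100)
print(x, sum(regrets))

# Non-contracting case: iterates alternate between 0.0 and 0.5 forever.
x_bad, regrets_bad = run_ogd(a=-3.0, b=1.0, x0=0.0, eta=0.5, rounds=100)
print(sum(regrets_bad))
```

In the first run the per-round regret decays geometrically, so the cumulative dynamic regret stays bounded exactly because the iterates converge to the EP solution. In the second run the learner never settles, every round costs 0.5, and the dynamic regret grows linearly, even though each individual loss is a simple quadratic.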
Counterfactual Explanation and Causal Inference in Service of Robustness in Robot Control
We propose an architecture for training generative models of counterfactual
conditionals of the form, 'can we modify event A to cause B instead of C?',
motivated by applications in robot control. Using an 'adversarial training'
paradigm, an image-based deep neural network model is trained to produce small
and realistic modifications to an original image in order to cause user-defined
effects. These modifications can be used in the design process of image-based
robust control, to determine the ability of the controller to return to a
working regime by modifications in the input space, rather than by adaptation.
In contrast to conventional control design approaches, where robustness is
quantified in terms of the ability to reject noise, we explore the space of
counterfactuals that might cause a certain requirement to be violated, thus
proposing an alternative model that might be more expressive in certain
robotics applications. We therefore propose generating counterfactuals as an
approach to explaining black-box models and envisioning potential
movement paths in autonomous robotic control. First, we demonstrate this
approach in a set of classification tasks, using the well-known MNIST and
CelebFaces Attributes datasets. Then, addressing multi-dimensional regression,
we demonstrate our approach in a reaching task with a physical robot, and in a
navigation task with a robot in a digital twin simulation.
Comment: 8 pages, 11 figures. To be published in the 10th IEEE International Conference on Development and Learning (ICDL), Valparaiso, Chile.
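The core idea of a counterfactual, a small input modification that causes a user-chosen change in a model's output, can be illustrated without the adversarially trained generator described above. The sketch below swaps in a simpler technique: per-input gradient descent on a toy logistic classifier, minimizing -log sigma(w . (x + delta) + b) + lam * ||delta||^2 and stopping as soon as the prediction flips to the target class. The function `counterfactual`, the weights, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def counterfactual(x, w, b, lam=0.1, eta=0.1, max_iter=1000):
    """Hypothetical gradient-based counterfactual search (not the paper's
    generator): find a small delta so that the toy logistic classifier
    sigma(w . (x + delta) + b) predicts class 1."""
    delta = [0.0] * len(x)
    for _ in range(max_iter):
        s = sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta)) + b
        if s > 0:  # prediction has flipped to the target class 1
            break
        p = sigmoid(s)
        # Gradient of -log p + lam * ||delta||^2 with respect to delta
        # is -(1 - p) * w + 2 * lam * delta; take one descent step.
        delta = [di - eta * (-(1 - p) * wi + 2 * lam * di)
                 for wi, di in zip(w, delta)]
    return delta


w, x = [2.0, -1.0], [0.0, 1.0]   # original input is classified as class 0
d = counterfactual(x, w, 0.0)
print(d)
```

The L2 penalty keeps the modification small: the smallest change that reaches the decision boundary here has norm 1 / sqrt(5) (about 0.447), and the search stops slightly past it. The paper's approach additionally asks for realistic modifications in image space, which this toy does not attempt.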
Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning
On-policy imitation learning algorithms such as DAgger evolve a robot control
policy by executing it, measuring performance (loss), obtaining corrective
feedback from a supervisor, and generating the next policy. As the loss between
iterations can vary unpredictably, a fundamental question is under what
conditions this process will eventually achieve a converged policy. If one
assumes the underlying trajectory distribution is static (stationary), it is
possible to prove convergence for DAgger. However, in more realistic models for
robotics, the underlying trajectory distribution is dynamic because it is a
function of the policy. Recent results show it is possible to prove convergence
of DAgger when a regularity condition on the rate of change of the trajectory
distributions is satisfied. In this article, we reframe this result using
dynamic regret theory from the field of online optimization and show that
dynamic regret can be applied to any on-policy algorithm to analyze its
convergence and optimality. These results inspire a new algorithm, Adaptive
On-Policy Regularization (AOR), that ensures the conditions for convergence. We
present simulation results with cart-pole balancing and locomotion benchmarks
that suggest AOR can significantly decrease dynamic regret and chattering as
the robot learns. To our knowledge, this is the first application of dynamic
regret theory to imitation learning.
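The effect of regularization on an on-policy loop whose per-iteration loss depends on the current policy can be sketched with a one-parameter toy. Assumptions, all hypothetical: the loss at iteration n is f_{x_n}(y) = 0.5 * (y - (a * x_n + b))^2, the policy-fitting step is replaced by a best response to the latest loss plus a proximal term (lam / 2) * (y - x_n)^2, and lam is doubled-plus-one whenever the per-round dynamic regret grows. This adaptive rule is illustrative only and is not the published AOR update.

```python
def run_regularized(a, b, x0, rounds, adaptive=True, lam=0.0, tol=1e-12):
    """Proximal best-response updates on the hypothetical losses
    f_{x_n}(y) = 0.5 * (y - (a * x_n + b)) ** 2 with regularizer
    (lam / 2) * (y - x_n) ** 2. Hypothetical adaptive rule: increase lam
    whenever the per-round dynamic regret grows by more than tol."""
    x, prev, regrets = x0, None, []
    for _ in range(rounds):
        r = 0.5 * ((1 - a) * x - b) ** 2  # f_{x_n}(x_n) - min_y f_{x_n}(y)
        regrets.append(r)
        if adaptive and prev is not None and r > prev + tol:
            lam = 2 * lam + 1             # strengthen regularization
        prev = r
        # Closed-form argmin_y of f_{x_n}(y) + (lam / 2) * (y - x_n) ** 2
        x = ((a + lam) * x + b) / (1 + lam)
    return x, lam, regrets


# Unregularized updates oscillate with growing amplitude ("chattering").
x_plain, _, _ = run_regularized(-1.5, 1.0, 0.0, 60, adaptive=False)
# Adaptive regularization settles on the fixed point b / (1 - a) = 0.4.
x_reg, lam, _ = run_regularized(-1.5, 1.0, 0.0, 60)
print(x_plain, x_reg, lam)
```

With lam = 0 the update x_{n+1} = a * x_n + b has slope -1.5, so the policy flips sign and diverges. After one detected regret increase, lam becomes 1, the slope becomes (a + lam) / (1 + lam) = -0.25, and the iterates contract to the fixed point. The small tolerance `tol` keeps floating-point noise near convergence from triggering further increases.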
Online Learning with Continuous Variations: Dynamic Regret and Reductions
Online learning is a powerful tool for analyzing iterative algorithms.
However, the classic adversarial setup sometimes fails to capture certain
regularity in online problems in practice. Motivated by this, we establish a
new setup, called Continuous Online Learning (COL), where the gradient of
the online loss function changes continuously across rounds with respect to the
learner's decisions. We show that COL covers and more appropriately describes
many interesting applications, from general equilibrium problems (EPs) to
optimization in episodic MDPs. In particular, we show that monotone EPs admit a
reduction to achieving sublinear static regret in COL. Using this new setup, we
revisit the difficulty of achieving sublinear dynamic regret. We prove a fundamental
equivalence between achieving sublinear dynamic regret in COL and solving
certain EPs. With this insight, we offer conditions for efficient algorithms
that achieve sublinear dynamic regret, even when the losses are chosen
adaptively without any a priori variation budget. Furthermore, we show for COL
a reduction from dynamic regret to both static regret and convergence in the
associated EP, allowing us to analyze the dynamic regret of many existing
algorithms.
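The reduction from monotone EPs to sublinear static regret can be illustrated with the standard regret-to-gap argument; the paper's own construction may differ. In this sketch, the hypothetical monotone operator is F(x, y) = (y, -x), from the bilinear saddle point min_x max_y x * y on the box [-1, 1]^2. At round n the online loss is the linear function z -> <F(z_n), z>, whose gradient varies continuously with the decision z_n, so this is a COL problem. Projected online gradient descent with step 1 / sqrt(n) has static regret O(sqrt(N)), and monotonicity then bounds the gap of the averaged decision, which here equals |x_bar| + |y_bar|, by regret / N.

```python
import math


def project(v: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Euclidean projection of a scalar onto [lo, hi]."""
    return max(lo, min(hi, v))


def averaged_ogd(x0: float, y0: float, rounds: int):
    """Projected online gradient descent on the linear losses
    <F(z_n), z> with the hypothetical monotone F(x, y) = (y, -x).
    Returns the average of the played decisions z_1, ..., z_N."""
    x, y = x0, y0
    sx = sy = 0.0
    for n in range(1, rounds + 1):
        sx += x
        sy += y
        eta = 1.0 / math.sqrt(n)   # decreasing step gives O(sqrt(N)) regret
        gx, gy = y, -x             # F evaluated at the current decision
        x, y = project(x - eta * gx), project(y - eta * gy)
    return sx / rounds, sy / rounds


x_bar, y_bar = averaged_ogd(1.0, 0.5, 10000)
print(x_bar, y_bar)  # both close to the equilibrium (0, 0)
```

With N = 10000, the standard regret bound for this step size gives a gap of at most about 0.06, so the averaged decision is guaranteed to lie near the unique equilibrium (0, 0) even though the individual iterates keep circling it.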