Coherent Soft Imitation Learning
Imitation learning methods seek to learn from an expert either through
behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL)
of the reward. Such methods enable agents to learn complex tasks from humans
that are difficult to capture with hand-designed reward functions. Choosing BC
or IRL for imitation depends on the quality and state-action coverage of the
demonstrations, as well as on additional access to the Markov decision process.
Hybrid strategies that combine BC and IRL are not common, as initial policy
optimization against inaccurate rewards diminishes the benefit of pretraining
the policy with BC. This work derives an imitation method that captures the
strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement
learning setting, we show that the behaviour-cloned policy can be used as both
a shaped reward and a critic hypothesis space by inverting the regularized
policy update. This coherency facilitates fine-tuning cloned policies using the
reward estimate and additional interactions with the environment. This approach
conveniently achieves imitation learning through initial behaviour cloning,
followed by refinement via RL with online or offline data sources. The
simplicity of the approach enables graceful scaling to high-dimensional and
vision-based tasks, with stable learning and minimal hyperparameter tuning, in
contrast to adversarial approaches.
Comment: 51 pages, 47 figures. DeepMind internship report
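As a rough illustration of the inversion described above (a minimal sketch, not code from the paper; the Gaussian policies and the temperature value are assumptions made for the example): in the soft setting the policy improvement step is pi(a|s) ∝ pi_ref(a|s) exp(Q(s,a)/alpha), so inverting it lets the behaviour-cloned policy define a shaped reward alpha * log(pi_bc(a|s) / pi_ref(a|s)), up to a state-dependent constant.

    import torch
    from torch.distributions import Normal

    alpha = 1.0  # entropy temperature (assumed value for illustration)

    # Hypothetical stand-ins: a behaviour-cloned policy concentrated on
    # expert-like actions and a broad reference (prior) policy, both 1-D.
    pi_bc = Normal(loc=torch.tensor(0.5), scale=torch.tensor(0.2))
    pi_ref = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))

    def shaped_reward(action):
        # Inverting pi ∝ pi_ref * exp(Q / alpha) gives, up to a
        # state-dependent constant, Q(s, a) = alpha * log(pi / pi_ref),
        # so the cloned policy itself supplies a reward estimate.
        return alpha * (pi_bc.log_prob(action) - pi_ref.log_prob(action))

    print(shaped_reward(torch.tensor(0.5)))   # near the expert mode: high
    print(shaped_reward(torch.tensor(-2.0)))  # far from the expert mode: low

Such a reward estimate can then drive the RL refinement stage the abstract describes, with the cloned policy doubling as the initialization.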
Sample-efficient Adversarial Imitation Learning
Imitation learning, in which an agent learns from demonstrations, has been
studied and advanced for sequential decision-making tasks in which a reward
function is not predefined. However, imitation learning methods still require
numerous expert demonstration samples to successfully imitate an expert's
behavior. To improve sample efficiency, we utilize self-supervised
representation learning, which can generate abundant training signal from the
given data. In this study, we propose a self-supervised representation-based
adversarial imitation learning method to learn state and action representations
that are robust to diverse distortions and temporally predictive on non-image
control tasks. In particular, unlike existing self-supervised learning methods
for tabular data, we propose a different corruption method for state and action
representations. We theoretically and empirically show that learning an
informative feature manifold with lower sample complexity significantly
improves the performance of
imitation learning. The proposed method shows a 39% relative improvement over
existing adversarial imitation learning methods on MuJoCo in a setting limited
to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations
and additional experiments using demonstrations with varying optimality to
provide insights into a range of factors.
Comment: A preliminary version of this manuscript was presented at the Deep RL Workshop, NeurIPS 202
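A generic sketch of the corruption-based objective this abstract gestures at (illustrative only: the paper's exact corruption scheme, architecture, and loss are not given here): two independently corrupted views of the same state-action vector are embedded and matched with a contrastive loss, which pushes the encoder toward distortion-robust representations.

    import torch
    import torch.nn.functional as F

    def corrupt(x, rate=0.3):
        # Overwrite a random subset of features with values drawn from
        # other rows in the batch -- one common corruption scheme for
        # tabular data; the paper proposes its own variant.
        mask = torch.rand_like(x) < rate
        shuffled = x[torch.randperm(x.size(0))]
        return torch.where(mask, shuffled, x)

    encoder = torch.nn.Sequential(
        torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))

    batch = torch.randn(128, 16)  # concatenated (state, action) vectors
    z1 = F.normalize(encoder(corrupt(batch)), dim=-1)
    z2 = F.normalize(encoder(corrupt(batch)), dim=-1)

    # InfoNCE-style loss: corrupted views of the same transition should
    # land on nearby embeddings, yielding distortion-robust features.
    logits = z1 @ z2.t() / 0.1
    loss = F.cross_entropy(logits, torch.arange(batch.size(0)))
    loss.backward()

The trained encoder would then feed the adversarial imitation learner in place of raw state-action inputs.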
Stochastic Model Predictive Control Using Simplified Affine Disturbance Feedback for Chance-Constrained Systems
This letter addresses the model predictive control of linear discrete-time systems subject to stochastic additive disturbances and chance constraints on their state and control input. We propose a simplified control parameterization under the framework of affine disturbance feedback, and we show that our method is equivalent to parameterization over the family of state feedback policies. Using our method, the associated finite-horizon optimization problem can be solved efficiently, with a slight increase in conservativeness compared with conventional affine disturbance feedback parameterization.
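To make the parameterization concrete (a generic sketch of affine disturbance feedback; the letter's specific simplification is not spelled out in the abstract, so the structure below is the conventional one): the input at step k is an affine function of the disturbances observed so far, u_k = v_k + sum over j < k of M[k, j] w_j, with the gains strictly lower block-triangular so the policy stays causal. The classical result the abstract invokes is that this convex parameterization spans the same closed-loop behaviours as affine state feedback; a simplified variant reduces the number of free gains, trading some conservativeness for cheaper optimization.

    import numpy as np

    T, n, m = 5, 2, 1  # horizon, state dim, input dim (illustrative sizes)

    v = np.zeros((T, m))        # nominal input sequence (decision variables)
    M = np.zeros((T, T, m, n))  # feedback gains; M[k, j] = 0 for j >= k (causality)

    def adf_input(k, w_hist, v, M):
        # Affine disturbance feedback: u_k = v[k] + sum_{j<k} M[k, j] @ w_hist[j].
        # Only past disturbances enter, so the control law is causal.
        u = v[k].copy()
        for j in range(k):
            u = u + M[k, j] @ w_hist[j]
        return u

    rng = np.random.default_rng(0)
    w_hist = 0.1 * rng.standard_normal((T, n))  # realized disturbances
    print(adf_input(3, w_hist, v, M))           # input after three disturbances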