9,356 research outputs found
Hybrid Reinforcement Learning with Expert State Sequences
Existing imitation learning approaches often require that the complete
demonstration data, including sequences of actions and states, are available.
In this paper, we consider a more realistic and difficult scenario where a
reinforcement learning agent only has access to the state sequences of an
expert, while the expert actions are unobserved. We propose a novel
tensor-based model to infer the unobserved actions of the expert state
sequences. The policy of the agent is then optimized via a hybrid objective
combining reinforcement learning and imitation learning. We evaluated our
hybrid approach on an illustrative domain and Atari games. The empirical
results show that (1) the agents are able to leverage state expert sequences to
learn faster than pure reinforcement learning baselines, (2) our tensor-based
action inference model is advantageous compared to standard deep neural
networks in inferring expert actions, and (3) the hybrid policy optimization
objective is robust against noise in expert state sequences.Comment: AAAI 2019; https://github.com/XiaoxiaoGuo/tensor4r
Robust learning from observation with model misspecification
Imitation learning (IL) is a popular paradigm for training policies in
robotic systems when specifying the reward function is difficult. However,
despite the success of IL algorithms, they impose the somewhat unrealistic
requirement that the expert demonstrations must come from the same domain in
which a new imitator policy is to be learned. We consider a practical setting,
where (i) state-only expert demonstrations from the real (deployment)
environment are given to the learner, (ii) the imitation learner policy is
trained in a simulation (training) environment whose transition dynamics is
slightly different from the real environment, and (iii) the learner does not
have any access to the real environment during the training phase beyond the
batch of demonstrations given. Most of the current IL methods, such as
generative adversarial imitation learning and its state-only variants, fail to
imitate the optimal expert behavior under the above setting. By leveraging
insights from the Robust reinforcement learning (RL) literature and building on
recent adversarial imitation approaches, we propose a robust IL algorithm to
learn policies that can effectively transfer to the real environment without
fine-tuning. Furthermore, we empirically demonstrate on continuous-control
benchmarks that our method outperforms the state-of-the-art state-only IL
method in terms of the zero-shot transfer performance in the real environment
and robust performance under different testing conditions.Comment: accepted to AAMAS 2022 (camera-ready version
- …