Pretraining in Deep Reinforcement Learning: A Survey
The past few years have seen rapid progress in combining reinforcement
learning (RL) with deep learning. Various breakthroughs ranging from games to
robotics have spurred the interest in designing sophisticated RL algorithms and
systems. However, the prevailing workflow in RL is to learn tabula rasa, which
can be computationally inefficient. This precludes the continuous deployment of
RL algorithms and potentially excludes researchers without large-scale
computing resources. In many other areas of machine learning, the pretraining
paradigm has been shown to be effective in acquiring transferable knowledge,
which can be utilized for a variety of downstream tasks. Recently, there has
been a surge of interest in pretraining for deep RL, with promising results.
However, much of
the research has been based on different experimental settings. Due to the
nature of RL, pretraining in this field is faced with unique challenges and
hence requires new design principles. In this survey, we seek to systematically
review existing works in pretraining for deep reinforcement learning, provide a
taxonomy of these methods, discuss each sub-field, and bring attention to open
problems and future directions.
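To make the contrast with tabula-rasa learning concrete, here is a minimal sketch of the pretrain-then-transfer workflow the survey covers: an encoder is pretrained on reward-free transitions with a self-supervised objective, then frozen and reused for a downstream task. The latent-dynamics objective, module names, and sizes are illustrative assumptions, not a method from the survey itself.

```python
# Sketch of pretrain-then-transfer for RL. The self-supervised objective
# (latent next-state prediction) and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 8, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # pretraining head only

# Phase 1: reward-free pretraining on logged transitions (o, a, o').
opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=1e-3)
for _ in range(100):
    o = torch.randn(256, obs_dim)            # stand-in for logged observations
    a = torch.randn(256, act_dim)            # stand-in for logged actions
    o_next = o + 0.1 * torch.randn_like(o)   # stand-in for next observations
    pred = dynamics(torch.cat([encoder(o), a], dim=-1))
    loss = nn.functional.mse_loss(pred, encoder(o_next).detach())
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: transfer. Freeze the encoder and train only a task head with RL,
# instead of learning the representation from scratch for every new task.
for p in encoder.parameters():
    p.requires_grad_(False)
policy_head = nn.Linear(latent_dim, act_dim)  # optimized by any downstream RL algorithm
```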
Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models
In many sequential decision-making tasks, the agent is not able to model the
full complexity of the world, which consists of multitudes of relevant and
irrelevant information. For example, a person walking along a city street who
tries to model all aspects of the world would quickly be overwhelmed by a
multitude of shops, cars, and people moving in and out of view, each following
their own complex and inscrutable dynamics. Is it possible to turn the agent's
firehose of sensory information into a minimal latent state that is both
necessary and sufficient for an agent to successfully act in the world? We
formulate this question concretely, and propose the Agent Control-Endogenous
State Discovery algorithm (AC-State), which has theoretical guarantees and is
practically demonstrated to discover the minimal control-endogenous latent
state which contains all of the information necessary for controlling the
agent, while fully discarding all irrelevant information. This algorithm
consists of a multi-step inverse model (predicting actions from distant
observations) with an information bottleneck. AC-State enables localization,
exploration, and navigation without reward or demonstrations. We demonstrate
the discovery of the control-endogenous latent state in three domains:
localizing a robot arm with distractions (e.g., changing lighting conditions
and background), exploring a maze alongside other agents, and navigating in the
Matterport house simulator.
Project Website: https://controllable-latent-state.github.io
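As a concrete illustration of the two ingredients the abstract names, the sketch below trains a multi-step inverse model: an encoder maps observations k steps apart to latents, and a head predicts the first action taken between them, with a deliberately small latent dimension standing in for the information bottleneck. All names and sizes are illustrative assumptions, not the exact AC-State architecture; see the project website for details.

```python
# Sketch of a multi-step inverse model with a bottlenecked latent. The choice
# of bottleneck (a small latent_dim) and all sizes are assumptions.
import torch
import torch.nn as nn

obs_dim, n_actions, latent_dim, k = 16, 4, 8, 3  # small latent acts as the bottleneck

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
inverse_head = nn.Linear(2 * latent_dim, n_actions)  # logits for the first action

opt = torch.optim.Adam(list(encoder.parameters()) + list(inverse_head.parameters()), lr=1e-3)
for _ in range(100):
    o_t = torch.randn(128, obs_dim)             # stand-in observation at time t
    o_tk = torch.randn(128, obs_dim)            # stand-in observation at time t + k
    a_t = torch.randint(0, n_actions, (128,))   # action actually taken at time t
    logits = inverse_head(torch.cat([encoder(o_t), encoder(o_tk)], dim=-1))
    loss = nn.functional.cross_entropy(logits, a_t)  # multi-step inverse objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the latent only needs to retain what is useful for predicting the agent's own actions, information about uncontrollable distractors carries no gradient signal and is discarded, which is the intuition behind the control-endogenous state.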
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an
agent has access to an offline dataset and the ability to collect experience
via real-world online interaction. The framework mitigates the challenges that
arise in both pure offline and online RL settings, allowing for the design of
simple and highly effective algorithms, in both theory and practice. We
demonstrate these advantages by adapting the classical Q-learning/iteration
algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In
our theoretical results, we prove that the algorithm is both computationally
and statistically efficient whenever the offline dataset supports a
high-quality policy and the environment has bounded bilinear rank. Notably, we
require no assumptions on the coverage provided by the initial distribution, in
contrast with guarantees for policy gradient/iteration methods. In our
experimental results, we show that Hy-Q with neural network function
approximation outperforms state-of-the-art online, offline, and hybrid RL
baselines on challenging benchmarks, including Montezuma's Revenge.
42 pages, 6 figures. Published at ICLR 2023. Code available at
https://github.com/yudasong/Hy
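To illustrate the hybrid setting, here is a minimal sketch of Q-learning fit on batches that mix transitions from a fixed offline dataset with freshly collected online transitions. The 50/50 mixing ratio, network sizes, and random-transition stub are assumptions for illustration, not the paper's exact Hy-Q recipe; see the linked repository for the authors' code.

```python
# Sketch of hybrid Q-learning: each update uses half offline, half online data.
# The environment stub and all hyperparameters are illustrative assumptions.
import random
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def random_transition():
    # Stand-in for a real environment step, yielding (o, a, r, o').
    return (torch.randn(obs_dim), random.randrange(n_actions),
            random.random(), torch.randn(obs_dim))

offline_data = [random_transition() for _ in range(1000)]  # fixed logged dataset
online_buffer = [random_transition()]                      # grows during interaction

for step in range(200):
    online_buffer.append(random_transition())  # one step of online data collection
    # Hybrid batch: half sampled from the offline dataset, half from online data.
    batch = random.sample(offline_data, 16) + random.choices(online_buffer, k=16)
    o = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch])
    o2 = torch.stack([t[3] for t in batch])
    with torch.no_grad():
        target = r + gamma * q_net(o2).max(dim=1).values  # one-step Bellman target
    q = q_net(o).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

The online half of each batch supplies coverage where the logged data is thin, while the offline half anchors the updates to a dataset that already supports a high-quality policy, which is the mechanism behind the paper's efficiency claims.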