Pretraining in Deep Reinforcement Learning: A Survey
The past few years have seen rapid progress in combining reinforcement
learning (RL) with deep learning. Various breakthroughs ranging from games to
robotics have spurred the interest in designing sophisticated RL algorithms and
systems. However, the prevailing workflow in RL is to learn tabula rasa, which
can be computationally inefficient. This precludes the continuous deployment of
RL algorithms and potentially excludes researchers without large-scale
computing resources. In many other areas of machine learning, the pretraining
paradigm has been shown to be effective in acquiring transferable knowledge,
which can be utilized for a variety of downstream tasks. Recently, there has
been a surge of interest in pretraining for deep RL, with promising results.
However, much of
the research has been based on different experimental settings. Due to the
nature of RL, pretraining in this field is faced with unique challenges and
hence requires new design principles. In this survey, we seek to systematically
review existing works in pretraining for deep reinforcement learning, provide a
taxonomy of these methods, discuss each sub-field, and bring attention to open
problems and future directions.
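To make the contrast with tabula-rasa learning concrete, here is a minimal sketch of the pretrain-then-transfer workflow the survey covers: an encoder is pretrained on reward-free transitions with a self-supervised objective, then frozen and reused for a downstream task. The latent-dynamics objective, module names, and sizes are illustrative assumptions, not a method from the survey itself.

```python
# Sketch of pretrain-then-transfer for RL. The self-supervised objective
# (latent next-state prediction) and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 8, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # pretraining head only

# Phase 1: reward-free pretraining on logged transitions (o, a, o').
opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=1e-3)
for _ in range(100):
    o = torch.randn(256, obs_dim)            # stand-in for logged observations
    a = torch.randn(256, act_dim)            # stand-in for logged actions
    o_next = o + 0.1 * torch.randn_like(o)   # stand-in for next observations
    pred = dynamics(torch.cat([encoder(o), a], dim=-1))
    loss = nn.functional.mse_loss(pred, encoder(o_next).detach())
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: transfer. Freeze the encoder and train only a task head with RL,
# instead of learning the representation from scratch for every new task.
for p in encoder.parameters():
    p.requires_grad_(False)
policy_head = nn.Linear(latent_dim, act_dim)  # optimized by any downstream RL algorithm
```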
Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models
In many sequential decision-making tasks, the agent is not able to model the
full complexity of the world, which consists of multitudes of relevant and
irrelevant information. For example, a person walking along a city street who
tries to model all aspects of the world would quickly be overwhelmed by a
multitude of shops, cars, and people moving in and out of view, each following
their own complex and inscrutable dynamics. Is it possible to turn the agent's
firehose of sensory information into a minimal latent state that is both
necessary and sufficient for an agent to successfully act in the world? We
formulate this question concretely, and propose the Agent Control-Endogenous
State Discovery algorithm (AC-State), which has theoretical guarantees and is
practically demonstrated to discover the minimal control-endogenous latent
state which contains all of the information necessary for controlling the
agent, while fully discarding all irrelevant information. This algorithm
consists of a multi-step inverse model (predicting actions from distant
observations) with an information bottleneck. AC-State enables localization,
exploration, and navigation without reward or demonstrations. We demonstrate
the discovery of the control-endogenous latent state in three domains:
localizing a robot arm with distractions (e.g., changing lighting conditions
and background), exploring a maze alongside other agents, and navigating in the
Matterport house simulator.
Project Website: https://controllable-latent-state.github.io
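As a concrete illustration of the two ingredients the abstract names, the sketch below trains a multi-step inverse model: an encoder maps observations k steps apart to latents, and a head predicts the first action taken between them, with a deliberately small latent dimension standing in for the information bottleneck. All names and sizes are illustrative assumptions, not the exact AC-State architecture; see the project website for details.

```python
# Sketch of a multi-step inverse model with a bottlenecked latent. The choice
# of bottleneck (a small latent_dim) and all sizes are assumptions.
import torch
import torch.nn as nn

obs_dim, n_actions, latent_dim, k = 16, 4, 8, 3  # small latent acts as the bottleneck

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
inverse_head = nn.Linear(2 * latent_dim, n_actions)  # logits for the first action

opt = torch.optim.Adam(list(encoder.parameters()) + list(inverse_head.parameters()), lr=1e-3)
for _ in range(100):
    o_t = torch.randn(128, obs_dim)             # stand-in observation at time t
    o_tk = torch.randn(128, obs_dim)            # stand-in observation at time t + k
    a_t = torch.randint(0, n_actions, (128,))   # action actually taken at time t
    logits = inverse_head(torch.cat([encoder(o_t), encoder(o_tk)], dim=-1))
    loss = nn.functional.cross_entropy(logits, a_t)  # multi-step inverse objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the latent only needs to retain what is useful for predicting the agent's own actions, information about uncontrollable distractors carries no gradient signal and is discarded, which is the intuition behind the control-endogenous state.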
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an
agent has access to an offline dataset and the ability to collect experience
via real-world online interaction. The framework mitigates the challenges that
arise in both pure offline and online RL settings, allowing for the design of
simple and highly effective algorithms, in both theory and practice. We
demonstrate these advantages by adapting the classical Q-learning/iteration
algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In
our theoretical results, we prove that the algorithm is both computationally
and statistically efficient whenever the offline dataset supports a
high-quality policy and the environment has bounded bilinear rank. Notably, we
require no assumptions on the coverage provided by the initial distribution, in
contrast with guarantees for policy gradient/iteration methods. In our
experimental results, we show that Hy-Q with neural network function
approximation outperforms state-of-the-art online, offline, and hybrid RL
baselines on challenging benchmarks, including Montezuma's Revenge.
42 pages, 6 figures. Published at ICLR 2023. Code available at
https://github.com/yudasong/Hy
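To illustrate the hybrid setting, here is a minimal sketch of Q-learning fit on batches that mix transitions from a fixed offline dataset with freshly collected online transitions. The 50/50 mixing ratio, network sizes, and random-transition stub are assumptions for illustration, not the paper's exact Hy-Q recipe; see the linked repository for the authors' code.

```python
# Sketch of hybrid Q-learning: each update uses half offline, half online data.
# The environment stub and all hyperparameters are illustrative assumptions.
import random
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def random_transition():
    # Stand-in for a real environment step, yielding (o, a, r, o').
    return (torch.randn(obs_dim), random.randrange(n_actions),
            random.random(), torch.randn(obs_dim))

offline_data = [random_transition() for _ in range(1000)]  # fixed logged dataset
online_buffer = [random_transition()]                      # grows during interaction

for step in range(200):
    online_buffer.append(random_transition())  # one step of online data collection
    # Hybrid batch: half sampled from the offline dataset, half from online data.
    batch = random.sample(offline_data, 16) + random.choices(online_buffer, k=16)
    o = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch])
    o2 = torch.stack([t[3] for t in batch])
    with torch.no_grad():
        target = r + gamma * q_net(o2).max(dim=1).values  # one-step Bellman target
    q = q_net(o).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

The online half of each batch supplies coverage where the logged data is thin, while the offline half anchors the updates to a dataset that already supports a high-quality policy, which is the mechanism behind the paper's efficiency claims.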