Bayesian Reinforcement Learning via Deep, Sparse Sampling
We address the problem of Bayesian reinforcement learning using efficient
model-based online planning. We propose an optimism-free Bayes-adaptive
algorithm that induces deeper and sparser exploration, with a theoretical
bound on its performance relative to the Bayes-optimal policy at lower
computational complexity. The main novelty is the use of a candidate policy
generator to generate long-term options in the planning tree (over beliefs),
which allows us to create much sparser and deeper trees. Experimental results
on different environments show that, in comparison to the state of the art,
our algorithm is both computationally more efficient and obtains significantly
higher reward in discrete environments.
Comment: Published in AISTATS 2020
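
A minimal, hypothetical sketch of the planning scheme described above, using a
two-armed Bernoulli bandit with Beta posteriors as the simplest possible
belief state, might look as follows; the candidate-policy generator here is a
Thompson-style sampler, and all names and design choices are illustrative
assumptions rather than the authors' implementation.

import random

# A minimal, hypothetical sketch of optimism-free sparse sampling over
# beliefs; all names are illustrative assumptions, not the paper's code.

class BetaBelief:
    """Independent Beta(a, b) posterior over each arm's success rate."""
    def __init__(self, params=((1, 1), (1, 1))):
        self.params = params

    def sample_means(self):
        # One Thompson-style posterior sample of the arm means.
        return [random.betavariate(a, b) for a, b in self.params]

    def simulate(self, arm):
        # Draw a reward from the posterior predictive of `arm`.
        a, b = self.params[arm]
        return 1.0 if random.random() < a / (a + b) else 0.0

    def update(self, arm, reward):
        # Conjugate Beta-Bernoulli posterior update.
        new = list(self.params)
        a, b = new[arm]
        new[arm] = (a + reward, b + (1.0 - reward))
        return BetaBelief(tuple(new))

def candidate_policies(belief, k=3):
    # Hypothetical candidate-policy generator: each candidate commits to
    # the greedy arm of one posterior sample, acting as a long-term option.
    options = []
    for _ in range(k):
        means = belief.sample_means()
        options.append(max(range(len(means)), key=lambda i: means[i]))
    return options

def sparse_sample(belief, depth, gamma=0.95, width=2, option_len=3):
    # Branch only over a few candidate options (sparser tree) and roll
    # each out for several steps before recursing (deeper tree).
    if depth == 0:
        return 0.0
    best = float("-inf")
    for arm in candidate_policies(belief):
        total = 0.0
        for _ in range(width):  # sparse sampling of stochastic outcomes
            b, ret, disc = belief, 0.0, 1.0
            for _ in range(option_len):
                r = b.simulate(arm)
                b = b.update(arm, r)
                ret += disc * r
                disc *= gamma
            total += ret + (gamma ** option_len) * sparse_sample(
                b, depth - 1, gamma, width, option_len)
        best = max(best, total / width)
    return best

print(sparse_sample(BetaBelief(), depth=2))

The structural point is visible in sparse_sample: branching happens only over
a handful of candidate options rather than all actions, and each branch
advances option_len steps at a time, which is what makes the tree sparser
and deeper.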
Examining average and discounted reward optimality criteria in reinforcement learning
In reinforcement learning (RL), the goal is to obtain an optimal policy, for
which the optimality criterion is fundamentally important. Two major optimality
criteria are average and discounted rewards, where the latter is typically
considered an approximation of the former. While the discounted reward is
more popular, it is problematic to apply in environments that have no natural
notion of discounting. This motivates us to revisit a) the progression of
optimality criteria in dynamic programming, b) the justification for and
complications of an artificial discount factor, and c) the benefits of
directly maximizing the average reward. Our contributions include a thorough
examination of the relationship between average and discounted rewards, as
well as a discussion of their pros and cons in RL. We emphasize that
average-reward RL methods possess the ingredients and mechanisms for
developing the general discounting-free optimality criterion (Veinott, 1969)
in RL.
Comment: 14 pages, 3 figures, 10-page main content
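
For concreteness, the two criteria in question are standard; writing r_t for
the reward at step t under a policy \pi started from state s, a textbook
formulation (not notation taken from the paper) is:

% Discounted reward, with discount factor \gamma \in [0, 1):
v_\gamma^\pi(s) = \mathbb{E}^\pi\Big[ \sum_{t=0}^{\infty} \gamma^t r_t \,\Big|\, s_0 = s \Big]

% Average reward (gain):
g^\pi(s) = \lim_{T \to \infty} \frac{1}{T} \, \mathbb{E}^\pi\Big[ \sum_{t=0}^{T-1} r_t \,\Big|\, s_0 = s \Big]

% Standard connection between the two as discounting vanishes:
\lim_{\gamma \to 1^-} (1 - \gamma) \, v_\gamma^\pi(s) = g^\pi(s)

The last line, which holds for finite MDPs under stationary policies, is the
sense in which the discounted criterion approximates the average-reward
criterion as \gamma approaches 1.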
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL holds promise for developing better
incremental reinforcement learners that can function in increasingly
realistic applications where non-stationarity plays a vital role, such as
healthcare, education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figures
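
As a rough illustration of the common thread in these formulations, a
non-stationary setting can be modeled as an MDP whose transition and reward
functions are indexed by time; the following minimal sketch (the class and
names are assumptions, not taken from the review) makes that concrete.

from dataclasses import dataclass
from typing import Callable

# Minimal illustrative sketch: a non-stationary MDP is an MDP whose
# transition and reward functions depend on the timestep t.

@dataclass
class NonStationaryMDP:
    # transition(t, s, a) -> next state; reward(t, s, a) -> float
    transition: Callable[[int, int, int], int]
    reward: Callable[[int, int, int], float]
    t: int = 0

    def step(self, state, action):
        next_state = self.transition(self.t, state, action)
        r = self.reward(self.t, state, action)
        self.t += 1
        return next_state, r

# Example: the payoff of action 1 drifts downward over time, so a fixed
# policy that is optimal early becomes suboptimal later.
env = NonStationaryMDP(
    transition=lambda t, s, a: 0,
    reward=lambda t, s, a: 1.0 - t / 100.0 if a == 1 else 0.5,
)
state = 0
for _ in range(3):
    state, r = env.step(state, 1)
    print(r)  # 1.0, 0.99, 0.98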
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Reinforcement learning (RL) algorithms face two distinct challenges: learning
effective representations of past and present observations, and determining how
actions influence future returns. Both challenges involve modeling long-term
dependencies. The Transformer architecture has been very successful at
solving problems that involve long-term dependencies, including in the RL
domain.
However, the underlying reason for the strong performance of Transformer-based
RL methods remains unclear: is it because they learn effective memory, or
because they perform effective credit assignment? After introducing formal
definitions of memory length and credit assignment length, we design simple
configurable tasks to measure these distinct quantities. Our empirical results
reveal that Transformers can enhance the memory capacity of RL algorithms,
scaling up to tasks that require memorizing observations from many steps
earlier.
However, Transformers do not improve long-term credit assignment. In summary,
our results provide an explanation for the success of Transformers in RL, while
also highlighting an important area for future research and benchmark design
- …
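
The two quantities the paper defines can be made concrete with a pair of toy
tasks; the constructions below are illustrative assumptions for exposition,
not the paper's benchmark. In the first, reward immediately follows the only
action, but that action must recall an observation from many steps earlier
(long memory, short credit assignment); in the second, every state is fully
observed, but the first action determines a reward delivered much later
(short memory, long credit assignment).

import random

# Two illustrative toy tasks decoupling memory from credit assignment;
# names and constructions are assumptions for exposition only.

def memory_task(policy, memory_len):
    # Memory length ~ memory_len: the final (and only rewarded) action
    # must recall a cue observed memory_len steps earlier, but the reward
    # immediately follows that action, so credit assignment is short.
    cue = random.choice([0, 1])
    observations = [cue] + [2] * (memory_len - 1)  # 2 = blank observation
    action = policy(observations)  # the agent acts once, at the end
    return 1.0 if action == cue else 0.0

def credit_task(first_action, credit_len):
    # Credit assignment length ~ credit_len: the very first action fixes
    # the reward delivered credit_len steps later, while every state is
    # fully observed, so no memory of past observations is required.
    reward_schedule = [0.0] * (credit_len - 1) + [float(first_action)]
    return sum(reward_schedule)  # the learner must propagate this back

# A policy that simply recalls the first observation solves the memory
# task at any horizon:
recall = lambda obs: obs[0]
print(sum(memory_task(recall, 50) for _ in range(100)) / 100)  # -> 1.0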