Is the Bellman residual a bad proxy?
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual ||T_* v_pi - v_pi||_{1,nu} over policies. A theoretical analysis shows how good a proxy this is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed to study the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe this question is worth considering.
Comment: Final NIPS 2017 version (title, among other things, changed).
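To make the comparison concrete, here is a minimal numpy sketch (not the authors' code) of the two criteria on a small, randomly generated tabular MDP: the mean value nu . v_pi, to be maximized, and the residual ||T_* v_pi - v_pi||_{1,nu}, to be minimized. The MDP construction and all names (mean_value, bellman_residual) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Randomly generated tabular MDP, purely illustrative.
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s'] transition kernel
R = rng.uniform(size=(nS, nA))                 # R[s, a] expected reward
nu = np.full(nS, 1.0 / nS)                     # state distribution nu

def value_of(pi):
    # Exact v_pi from the linear Bellman equation (I - gamma * P_pi) v = r_pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

def mean_value(pi):
    # Criterion i): J(pi) = nu . v_pi, to be maximized.
    return nu @ value_of(pi)

def bellman_residual(pi):
    # Criterion ii): ||T_* v_pi - v_pi||_{1, nu}, to be minimized.
    v = value_of(pi)
    q = R + gamma * P @ v                      # state-action values under v
    return nu @ np.abs(q.max(axis=1) - v)      # T_* v is the row-wise max over actions

pi_uniform = np.full((nS, nA), 1.0 / nA)
print(mean_value(pi_uniform), bellman_residual(pi_uniform))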
Bootstrapped Representations in Reinforcement Learning
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al., 1999) and Mountain Car (Moore, 1990).
Comment: ICML 2023.
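As a rough illustration of why these representations and solutions can disagree, the following numpy sketch (not the paper's analysis, and with fixed random features rather than learnt ones) computes, for one transition structure, the linear value estimates given by the TD/LSTD fixed point, by Monte Carlo regression on exact values, and by residual-gradient (Bellman residual) minimization; the three generally differ. All quantities are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(1)
nS, k, gamma = 6, 3, 0.9

P = rng.dirichlet(np.ones(nS), size=nS)         # P[s, s'] under a fixed policy
r = rng.uniform(size=nS)                        # expected reward r(s)
Phi = rng.normal(size=(nS, k))                  # fixed linear features, k per state
D = np.diag(np.full(nS, 1.0 / nS))              # uniform state weighting
v = np.linalg.solve(np.eye(nS) - gamma * P, r)  # exact v_pi, used as the Monte Carlo target

A = Phi - gamma * P @ Phi
theta_td = np.linalg.solve(Phi.T @ D @ A, Phi.T @ D @ r)    # TD / LSTD fixed point
theta_mc = np.linalg.lstsq(Phi, v, rcond=None)[0]           # Monte Carlo regression on returns
theta_rg = np.linalg.solve(A.T @ D @ A, A.T @ D @ r)        # residual gradient minimizer

for name, th in [("TD", theta_td), ("MC", theta_mc), ("RG", theta_rg)]:
    print(name, np.round(Phi @ th, 3))          # the three value estimates generally differ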
l1-penalized projected Bellman residual
We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists in combining the Least-Squares Temporal Difference (LSTD) algorithm with l1-regularization, which has proven to be effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an l1-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an l1-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which lets us envision easy extensions to other penalties.
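As a minimal numpy sketch (my reading of the construction, not the paper's implementation) of why the l1-penalized projected Bellman residual is a supervised learning problem: with the empirical LSTD quantities A and b and the Gram matrix M, the projected Bellman residual equals ||M^{-1/2}(A theta - b)||^2, so the l1-penalized version is an ordinary Lasso, solved below by plain proximal gradient (ISTA). The batch, the regularization level, and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, k, gamma, lam = 200, 10, 0.9, 0.05

# A batch of transitions (s_i, r_i, s'_i), represented directly by their features.
Phi  = rng.normal(size=(n, k))                 # phi(s_i)
Phi2 = rng.normal(size=(n, k))                 # phi(s'_i)
r    = rng.normal(size=n)                      # rewards

# Empirical LSTD quantities with uniform weighting over the batch.
M = Phi.T @ Phi / n
A = Phi.T @ (Phi - gamma * Phi2) / n
b = Phi.T @ r / n

# Projected Bellman residual as a quadratic: ||M^{-1/2}(A theta - b)||^2,
# so the l1-penalized problem is a Lasso with design X and target y.
w, U = np.linalg.eigh(M)
M_isqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
X, y = M_isqrt @ A, M_isqrt @ b

# Plain ISTA on 0.5 * ||y - X theta||^2 + lam * ||theta||_1.
theta = np.zeros(k)
step = 1.0 / np.linalg.norm(X, 2) ** 2         # 1 / Lipschitz constant of the gradient
for _ in range(2000):
    z = theta - step * (X.T @ (X @ theta - y))
    theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print(np.round(theta, 3))                      # sparse weight vector for the value function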
Finite-Sample Analysis of Bellman Residual Minimization
We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of n states drawn i.i.d. from a distribution, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual at a rate of order O(1/sqrt(n)). This result implies that minimizing the empirical residual is indeed a sound approach to minimizing the true Bellman residual, which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples n and a measure of how well the function space is able to approximate the sequence of value functions.
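For concreteness, a small numpy sketch of empirical Bellman residual minimization with a generative model and a linear approximation space: draw n i.i.d. states, query the model for a reward and next states, and minimize the empirical quadratic residual in closed form. The double-sampling device below (two independent next states per state, which the generative model makes possible) keeps the quadratic estimate unbiased; it is an assumption of this sketch, and the paper's exact estimator may differ. All names and the toy MDP are illustrative.

import numpy as np

rng = np.random.default_rng(3)
nS, k, n, gamma = 20, 5, 500, 0.9

# A tabular MDP standing in for the generative model, with the policy already fixed.
P = rng.dirichlet(np.ones(nS), size=nS)        # P[s, s'] under the evaluated policy
R = rng.uniform(size=nS)                       # expected reward r(s)
Phi = rng.normal(size=(nS, k))                 # linear feature map phi(s)

# n i.i.d. states; the model returns the reward and two independent next states.
S = rng.integers(nS, size=n)
r = R[S]
S1 = np.array([rng.choice(nS, p=P[s]) for s in S])
S2 = np.array([rng.choice(nS, p=P[s]) for s in S])

# Minimize the empirical residual sum_i (v(s_i) - r_i - gamma v(s1_i)) * (v(s_i) - r_i - gamma v(s2_i))
# over linear value functions v = Phi @ theta; the minimizer solves a k x k linear system.
A1 = Phi[S] - gamma * Phi[S1]
A2 = Phi[S] - gamma * Phi[S2]
G = (A1.T @ A2 + A2.T @ A1) / (2 * n)          # symmetrized quadratic term
g = (A1 + A2).T @ r / (2 * n)
theta = np.linalg.solve(G, g)

print(np.round(Phi @ theta, 2))                # estimated value function on all states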
Advances in Reinforcement Learning
Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research in the fields associated with RL, an area that has been growing rapidly and producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. A set of chapters provides a general overview of RL, while the others focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.