Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective
TD-learning is a fundamental algorithm in the field of reinforcement learning
(RL), employed to evaluate a given policy by estimating the corresponding value
function for a Markov decision process. While significant progress has been
made in the theoretical analysis of TD-learning, recent research has
established guarantees on its statistical efficiency by developing finite-time
error bounds. This paper aims to contribute to the existing body of knowledge
by presenting a novel finite-time analysis of tabular temporal difference (TD)
learning that makes direct and effective use of discrete-time stochastic
linear system models and leverages Schur matrix properties. The proposed
analysis covers both on-policy and off-policy settings in a unified manner. By
adopting this approach, we hope to offer new and straightforward templates
that not only shed further light on the analysis of TD-learning and related RL
algorithms but also provide valuable insights for future research in this
domain.
Comment: arXiv admin note: text overlap with arXiv:2112.1441
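The algorithm this abstract analyzes, tabular TD(0) policy evaluation, can be sketched in a few lines. The toy Markov chain, reward vector, step size, and step count below are illustrative assumptions for the sketch, not taken from the paper:

```python
import numpy as np

def td0(P, r, gamma=0.9, alpha=0.05, n_steps=100_000, seed=0):
    """Tabular TD(0): after each transition s -> s', update
    V(s) += alpha * (r(s) + gamma * V(s') - V(s))."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    V = np.zeros(n)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n, p=P[s])          # sample next state from the chain
        td_error = r[s] + gamma * V[s_next] - V[s]  # temporal-difference error
        V[s] += alpha * td_error
        s = s_next
    return V

# Hypothetical 3-state chain induced by a fixed policy; reward only in state 2.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])
V_td = td0(P, r)
# Exact fixed point of the Bellman equation V = r + gamma * P V, for comparison.
V_exact = np.linalg.solve(np.eye(3) - 0.9 * P, r)
print(V_td, V_exact)
```

With a constant step size the iterates fluctuate around the Bellman fixed point rather than converging to it exactly, which is precisely the regime the finite-time error bounds quantify.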
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Actor-critic (AC) methods have exhibited great empirical success compared
with other reinforcement learning algorithms, where the actor uses the policy
gradient to improve the learning policy and the critic uses temporal difference
learning to estimate the policy gradient. Under the two time-scale learning
rate schedule, the asymptotic convergence of AC has been well studied in the
literature. However, the non-asymptotic convergence and finite sample
complexity of actor-critic methods are largely open. In this work, we provide a
non-asymptotic analysis for two time-scale actor-critic methods under the
non-i.i.d. setting. We prove that the actor-critic method is guaranteed to
find an $\epsilon$-approximate first-order stationary point (i.e.,
$\|\nabla J(\theta)\|_2^2 \le \epsilon$) of the non-concave performance
function $J(\theta)$, with $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ sample
complexity. To the best of our knowledge, this is the first work providing a
finite-time analysis and sample complexity bound for two time-scale
actor-critic methods.
Comment: 45 pages
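The two time-scale structure described above can be made concrete with a minimal sketch on a two-armed bandit. The bandit, reward noise, and step-size exponents below are illustrative assumptions; they only need to satisfy the two time-scale condition that the critic's step size decays more slowly (runs on the faster time scale) than the actor's:

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.8])   # expected reward of each arm (assumed values)
theta = np.zeros(2)            # actor: softmax policy parameters
v = 0.0                        # critic: scalar value estimate (baseline)

for t in range(1, 20_001):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = means[a] + 0.1 * rng.standard_normal()
    beta = 1.0 / t**0.6        # critic step size: faster time scale
    alpha = 1.0 / t**0.9       # actor step size: slower time scale
    delta = r - v              # TD error, used as the policy-gradient signal
    v += beta * delta          # critic update
    grad_log = -probs          # grad of log pi(a | theta) for softmax ...
    grad_log[a] += 1.0         # ... is e_a - probs
    theta += alpha * delta * grad_log  # actor (policy gradient) update

probs = np.exp(theta - theta.max())
probs /= probs.sum()
print(probs, v)
```

Because the critic moves on the faster time scale, it approximately tracks the value of the current policy while the slowly drifting actor ascends the performance function; the paper's analysis quantifies how fast this coupled system reaches a stationary point.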
Search, navigation and foraging: an optimal decision-making perspective
Behavior in its general form can be defined as a mapping between sensory inputs and a pattern of motor actions used to achieve a goal. In recent years, reinforcement learning has emerged as a general framework for analyzing behavior in this general sense. In this thesis, exploiting the techniques of reinforcement learning, we study several phenomena that can be classified as search, navigation, and foraging behaviors.
Regarding the search aspect, we analyze random walks forced to reach a target in a confined region of space. In this case we can solve the problem analytically, which yields a very efficient way to generate such walks. The navigation problem is inspired by olfactory navigation in homing pigeons; here we propose an algorithm for navigating a noisy environment relying only on local signals. Foraging, instead, is analyzed starting from the observation that fossil traces show the evolution of foraging strategies towards highly compact and self-avoiding trajectories. We show how this optimal behavior can emerge in the reinforcement learning framework.