Approximation Benefits of Policy Gradient Methods with Aggregated States
Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state aggregation, where the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose regret per-period is bounded by $\epsilon$, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is a discount factor. Theoretical results synthesize recent analysis of policy gradient methods with insights of Van Roy (2006) into the critical role of state-relevance weights in approximate dynamic programming.
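One natural reading of this bound, written out (a reconstruction from the abstract's own wording; the aggregation map $\phi$ and the optimal state-action value function $Q^*$ are assumed notation, not symbols taken from the paper): with $\phi$ assigning each state-action pair to a partition,

\[
\epsilon \;=\; \max_{\phi(s,a)\,=\,\phi(s',a')} \bigl|\,Q^*(s,a) - Q^*(s',a')\,\bigr|,
\]

so policy gradient attains per-period regret $O(\epsilon)$, while approximate policy iteration and approximate value iteration can suffer regret of order $\epsilon/(1-\gamma)$, i.e., a factor $1/(1-\gamma)$ worse under the same representation.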
Distributed Reinforcement Learning in Multi-Agent Networked Systems
We study distributed reinforcement learning (RL) for a network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are local, e.g., between neighbors. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies are non-local and provide a finite-time error bound that shows how the convergence rate depends on the depth of the dependencies in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation that apply beyond the setting of RL in networked systems.
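To make the last point concrete, here is a minimal sketch of temporal difference learning with state aggregation, the setting in which the abstract's byproduct convergence results apply (this is an illustrative TD(0) update, not the paper's Scalable Actor Critic; `env_step`, `phi`, and all hyperparameters below are hypothetical):

```python
import numpy as np

# TD(0) with state aggregation: instead of one value estimate per state,
# we keep one estimate per partition. phi[s] gives the partition index of
# state s, so all states in a partition share a single value.

def td0_aggregated(env_step, phi, n_groups, s0,
                   gamma=0.95, alpha=0.1, n_steps=10_000):
    """env_step(s) -> (reward, next_state), sampled under a fixed policy."""
    w = np.zeros(n_groups)          # one value estimate per partition
    s = s0
    for _ in range(n_steps):
        r, s_next = env_step(s)
        # TD error computed on the aggregated (piecewise-constant) values.
        delta = r + gamma * w[phi[s_next]] - w[phi[s]]
        w[phi[s]] += alpha * delta  # update only the partition containing s
        s = s_next
    return w
```

The value estimate here is piecewise constant over partitions, which is the form of linear stochastic approximation update to which the abstract's finite-time results for TD learning with state aggregation refer.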