Approximation Benefits of Policy Gradient Methods with Aggregated States
Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state aggregation, where the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose regret per-period is bounded by $\epsilon$, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is a discount factor. Theoretical results synthesize recent analysis of policy gradient methods with insights of Van Roy (2006) into the critical role of state-relevance weights in approximate dynamic programming.
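One natural reading of this bound, written out (a reconstruction from the abstract's own wording; the aggregation map $\phi$ and the optimal state-action value function $Q^*$ are assumed notation, not symbols taken from the paper): with $\phi$ assigning each state-action pair to a partition,

\[
\epsilon \;=\; \max_{\phi(s,a)\,=\,\phi(s',a')} \bigl|\,Q^*(s,a) - Q^*(s',a')\,\bigr|,
\]

so policy gradient attains per-period regret $O(\epsilon)$, while approximate policy iteration and approximate value iteration can suffer regret of order $\epsilon/(1-\gamma)$, i.e., a factor $1/(1-\gamma)$ worse under the same representation.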
Distributed Reinforcement Learning in Multi-Agent Networked Systems
We study distributed reinforcement learning (RL) for a network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are local, e.g., between neighbors. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies are non-local and provide a finite-time error bound that shows how the convergence rate depends on the depth of the dependencies in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation that apply beyond the setting of RL in networked systems.
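To make the last point concrete, here is a minimal sketch of temporal difference learning with state aggregation, the setting in which the abstract's byproduct convergence results apply (this is an illustrative TD(0) update, not the paper's Scalable Actor Critic; `env_step`, `phi`, and all hyperparameters below are hypothetical):

```python
import numpy as np

# TD(0) with state aggregation: instead of one value estimate per state,
# we keep one estimate per partition. phi[s] gives the partition index of
# state s, so all states in a partition share a single value.

def td0_aggregated(env_step, phi, n_groups, s0,
                   gamma=0.95, alpha=0.1, n_steps=10_000):
    """env_step(s) -> (reward, next_state), sampled under a fixed policy."""
    w = np.zeros(n_groups)          # one value estimate per partition
    s = s0
    for _ in range(n_steps):
        r, s_next = env_step(s)
        # TD error computed on the aggregated (piecewise-constant) values.
        delta = r + gamma * w[phi[s_next]] - w[phi[s]]
        w[phi[s]] += alpha * delta  # update only the partition containing s
        s = s_next
    return w
```

The value estimate here is piecewise constant over partitions, which is the form of linear stochastic approximation update to which the abstract's finite-time results for TD learning with state aggregation refer.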