Scalable methods for computing state similarity in deterministic Markov Decision Processes

Author: Castro Pablo Samuel
Publication date: 21/11/2019

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behavior. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous-state MDPs, which prior to this work had not been possible. Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
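
To illustrate why determinism simplifies the computation described in the abstract, here is a minimal sketch of the exact tabular baseline, not either of the paper's two new algorithms. It assumes the commonly used recursive form d(s, t) = max_a [ |R(s, a) - R(t, a)| + gamma * d(N(s, a), N(t, a)) ], where N(s, a) is the unique successor state, so the usual distance between next-state distributions reduces to a lookup in d. The function name `bisimulation_metric` and the array layout are hypothetical choices made for this example.

```python
import numpy as np


def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for a bisimulation-style metric in a deterministic MDP.

    rewards[s, a]    -- reward for taking action a in state s (assumed layout)
    next_state[s, a] -- integer index of the unique successor state (assumed layout)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_state = np.asarray(next_state, dtype=int)
    n_states = rewards.shape[0]

    # reward_diff[s, t, a] = |R(s, a) - R(t, a)|
    reward_diff = np.abs(rewards[:, None, :] - rewards[None, :, :])

    d = np.zeros((n_states, n_states))
    for _ in range(max_iters):
        # next_d[s, t, a] = d[N(s, a), N(t, a)]; with deterministic
        # transitions the distance between successors is a table lookup.
        next_d = d[next_state[:, None, :], next_state[None, :, :]]
        new_d = np.max(reward_diff + gamma * next_d, axis=-1)
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d
```

Each sweep touches every pair of states, which is the tabular cost the abstract describes as impractical for large problems.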

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Small batch deep reinforcement learning

Authors: Bellemare Marc G.; Castro Pablo Samuel; Obando-Ceron Johan
Publication date: 05/10/2023

In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests that reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon. Comment: Published at NeurIPS 202
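
To make concrete what the batch size parameter controls, here is a minimal sketch of a replay memory whose `sample` method takes the batch size as an argument. The `ReplayBuffer` class, its method names, and the choice of uniform sampling without replacement are assumptions made for illustration; they are not taken from the paper or from any particular library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity replay memory (hypothetical, for illustration only)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        # A transition is typically (state, action, reward, next_state, done).
        self._storage.append(transition)

    def sample(self, batch_size):
        # batch_size is the number of transitions drawn per gradient update.
        # Assumption: uniform sampling without replacement.
        if batch_size > len(self._storage):
            raise ValueError("batch_size exceeds the number of stored transitions")
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

Under this sketch, changing the batch size means changing only the argument passed to `sample` before each update, for example `buffer.sample(8)` instead of `buffer.sample(32)`; both values are arbitrary examples.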

arXiv.org e-Print Archive