2,036 research outputs found
Scalable methods for computing state similarity in deterministic Markov Decision Processes
We present new algorithms for computing and approximating bisimulation
metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an
elegant formalism that capture behavioral equivalence between states and
provide strong theoretical guarantees on differences in optimal behaviour.
Unfortunately, their computation is expensive and requires a tabular
representation of the states, which has thus far rendered them impractical for
large problems. In this paper we present a new version of the metric that is
tied to a behavior policy in an MDP, along with an analysis of its theoretical
properties. We then present two new algorithms for approximating bisimulation
metrics in large, deterministic MDPs. The first does so via sampling and is
guaranteed to converge to the true metric. The second is a differentiable loss
which allows us to learn an approximation even for continuous state MDPs, which
prior to this work had not been possible.Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20
Small batch deep reinforcement learning
In value-based deep reinforcement learning with replay memories, the batch
size parameter specifies how many transitions to sample for each gradient
update. Although critical to the learning process, this value is typically not
adjusted when proposing new algorithms. In this work we present a broad
empirical study that suggests {\em reducing} the batch size can result in a
number of significant performance gains; this is surprising, as the general
tendency when training neural networks is towards larger batch sizes for
improved performance. We complement our experimental findings with a set of
empirical analyses towards better understanding this phenomenon.Comment: Published at NeurIPS 202
- …