Deep Reinforcement Learning with Weighted Q-Learning

Alippi, Cesare; Cini, Andrea; D'Eramo, Carlo; Peters, Jan

Publication date: 30 March 2020

Abstract

Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among the several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where the weights correspond to the probability of each action-value being the maximum; however, computing these probabilities is practical only in tabular settings. In this work, we provide the methodological advances needed to bring the properties of WQL to Deep Reinforcement Learning (DRL), using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection but also for action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias with respect to relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.

Comment: Corrected typo
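The weighted estimator described in the abstract replaces max_a Q(s', a) with sum_a w_a * Q(s', a), where w_a approximates the probability that action a has the maximum value. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that w_a can be estimated as the fraction of stochastic dropout forward passes in which action a attains the maximum. It also uses a fixed dropout probability rather than the learned one of Concrete Dropout. The helper names (dropout_q_samples, weighted_estimate, weighted_td_target) and the one-layer toy network are hypothetical.

```python
import numpy as np


def dropout_q_samples(state, weights, drop_prob, n_passes, rng):
    """Hypothetical toy Q-network: one linear layer with input dropout.

    Each forward pass draws a fresh Bernoulli mask (inverted dropout), so the
    n_passes outputs act as samples from an approximate posterior over Q(s, .).
    Assumption: drop_prob is fixed here; Concrete Dropout would learn it.
    Returns an array of shape (n_passes, n_actions).
    """
    keep = 1.0 - drop_prob
    masks = rng.random((n_passes, state.size)) < keep  # True = unit kept
    dropped = masks * state / keep                      # inverted-dropout scaling
    return dropped @ weights                            # (n_passes, n_actions)


def weighted_estimate(q_samples):
    """Weighted estimate of max_a Q(s, a) from stochastic Q-value samples.

    w_a is the fraction of samples in which action a attains the maximum.
    The estimate sum_a w_a * mean_a is a convex combination of the per-action
    means, so it never exceeds the largest mean (unlike a plain max over noisy
    estimates, which tends to overestimate).
    """
    n_passes, n_actions = q_samples.shape
    counts = np.bincount(q_samples.argmax(axis=1), minlength=n_actions)
    w = counts / n_passes
    return float(w @ q_samples.mean(axis=0)), w


def weighted_td_target(reward, gamma, next_q_samples, done=False):
    """TD target r + gamma * weighted estimate; no bootstrap on terminal steps."""
    if done:
        return float(reward)
    est, _ = weighted_estimate(next_q_samples)
    return float(reward + gamma * est)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = rng.normal(size=8)
    W = rng.normal(size=(8, 4))  # hypothetical weights: 8 features, 4 actions
    samples = dropout_q_samples(state, W, drop_prob=0.2, n_passes=200, rng=rng)
    est, w = weighted_estimate(samples)
    print("weights:", w)
    print("weighted estimate:", est, "max of means:", samples.mean(axis=0).max())
    print("TD target:", weighted_td_target(1.0, 0.99, samples))
```

Estimating the weights from argmax frequencies keeps the cost at one batch of stochastic forward passes per state. It avoids integrating over explicit per-action distributions, which is what makes the exact computation practical only in tabular settings.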

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2003.09280

Last updated on 12/10/2020