Benchmarking Uncertainty Estimates with Deep Reinforcement Learning for Dialogue Policy Optimisation

Abstract

In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such as Gaussian Process SARSA (GP-SARSA) estimate uncertainties and sample actions accordingly, leading to a better user experience, but at the expense of greater computational complexity. This paper examines approaches to extracting uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform a thorough analysis of Bayes-By-Backprop DQN (BBQN). In addition, we examine dropout, its concrete variant (Concrete Dropout), bootstrapped ensembles, and α-divergences as other means of extracting uncertainty estimates from a DQN. We find that BBQN achieves faster convergence to an optimal policy than any other method and reaches performance comparable to the state of the art, without the high computational complexity of GP-SARSA.
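To illustrate the contrast the abstract draws between ε-greedy exploration and uncertainty-driven action selection, the sketch below shows one of the surveyed techniques, Monte Carlo dropout, applied to a small Q-network: keeping dropout active at action-selection time makes each forward pass a sample from an approximate posterior over Q-values, so acting greedily on a single pass amounts to Thompson sampling. This is a minimal PyTorch sketch under assumed settings (layer sizes, dropout rate, and function names are all illustrative), not the authors' implementation.

```python
# Hypothetical sketch: Thompson-style action selection via Monte Carlo dropout,
# contrasted with an epsilon-greedy baseline. Not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutQNetwork(nn.Module):
    """Small Q-network whose dropout masks stay stochastic at selection time."""

    def __init__(self, belief_dim: int, num_actions: int, p_drop: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(belief_dim, 128)
        self.fc2 = nn.Linear(128, 128)
        self.out = nn.Linear(128, num_actions)
        self.p_drop = p_drop

    def forward(self, belief: torch.Tensor) -> torch.Tensor:
        # training=True keeps dropout active even outside training, so each
        # forward pass draws a fresh mask: one sample of plausible Q-values.
        h = F.dropout(F.relu(self.fc1(belief)), p=self.p_drop, training=True)
        h = F.dropout(F.relu(self.fc2(h)), p=self.p_drop, training=True)
        return self.out(h)


def thompson_action(q_net: DropoutQNetwork, belief: torch.Tensor) -> int:
    """Act greedily on a single stochastic forward pass (Thompson sampling)."""
    with torch.no_grad():
        q_sample = q_net(belief)
    return int(q_sample.argmax(dim=-1).item())


def epsilon_greedy_action(q_net: DropoutQNetwork, belief: torch.Tensor,
                          num_actions: int, eps: float = 0.1) -> int:
    """Baseline: pick a uniformly random action with probability eps."""
    if torch.rand(()) < eps:
        return int(torch.randint(num_actions, ()).item())
    with torch.no_grad():
        return int(q_net(belief).argmax(dim=-1).item())
```

The practical difference for the user is visible in the two selection functions: epsilon_greedy_action exposes the user to an entirely random system action a fixed fraction of the time, whereas thompson_action only deviates from the greedy choice where the network's own uncertainty (here induced by the dropout masks) leaves the ranking of actions in doubt.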