2,192 research outputs found
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
We present a new algorithm that significantly improves the efficiency of
exploration for deep Q-learning agents in dialogue systems. Our agents explore
via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop
neural network. Our algorithm learns much faster than common exploration
strategies such as -greedy, Boltzmann, bootstrapping, and
intrinsic-reward-based ones. Additionally, we show that spiking the replay
buffer with experiences from just a few successful episodes can make Q-learning
feasible when it might otherwise fail.Comment: 13 pages, 9 figure
- …