8,417 research outputs found

    Deep reinforcement learning of dialogue policies with less weight updates

    Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. Their application is challenging, however, due to slow learning. We propose a two-stage method for accelerating the induction of single- or multi-domain dialogue policies. The first stage reduces the number of weight updates over time, and the second stage uses very small minibatches (of at most two learning experiences) sampled from experience replay memories. The former updates the weights of the neural nets frequently at early stages of training and decreases the number of updates as training progresses, by performing updates during exploration and skipping them during exploitation. The learning process is thus accelerated through fewer weight updates in both stages. An empirical evaluation in three domains (restaurants, hotels, and TV guide) confirms that the proposed method trains policies five times faster than a baseline without it. Our findings are useful for training larger-scale neural-based spoken dialogue systems.
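
    To make the two-stage scheme concrete, below is a minimal PyTorch-style sketch: gradient updates are performed only on exploration steps (so they taper off as exploration decays), and each update draws a tiny minibatch, as small as two transitions, from replay memory. The env, replay, and q_net interfaces are hypothetical stand-ins for illustration, not the authors' code.

    import random
    import torch
    import torch.nn.functional as F

    def train_step(q_net, optimizer, replay, batch_size=2, gamma=0.99):
        # One gradient update from a tiny minibatch of stored transitions.
        # replay.sample is assumed to return batched tensors.
        states, actions, rewards, next_states, dones = replay.sample(batch_size)
        q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rewards + gamma * (1 - dones) * q_net(next_states).max(1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    def run_episode(env, q_net, optimizer, replay, epsilon):
        state, done = env.reset(), False
        while not done:
            exploring = random.random() < epsilon
            if exploring:
                action = env.sample_action()             # random exploratory action
            else:
                with torch.no_grad():
                    action = int(q_net(state).argmax())  # greedy exploitation
            next_state, reward, done = env.step(action)
            replay.add(state, action, reward, next_state, done)
            # Stage 1: update weights only while exploring, so updates are
            # frequent early in training and become rare as epsilon decays.
            # Stage 2: each update uses a minibatch of just two experiences.
            if exploring and len(replay) >= 2:
                train_step(q_net, optimizer, replay, batch_size=2)
            state = next_state

    Skipping updates during exploitation avoids redundant gradient steps once the policy has begun to converge, which is where the reported speed-up would come from.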

    BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

    We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
    Comment: 13 pages, 9 figures
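
    As a rough illustration of the exploration mechanism, the sketch below implements Thompson sampling with a Bayes-by-Backprop-style Q-network: each layer maintains a factorized Gaussian posterior over its weights, and acting greedily under a single Monte Carlo weight draw per decision yields randomized, uncertainty-driven exploration. The layer sizes and interfaces here are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BayesianLinear(nn.Module):
        # Linear layer with a factorized Gaussian posterior over weights,
        # sampled via the reparameterization trick: w = mu + softplus(rho) * eps.
        def __init__(self, in_features, out_features):
            super().__init__()
            self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
            self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
            self.b_mu = nn.Parameter(torch.zeros(out_features))
            self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

        def forward(self, x):
            w_sigma = F.softplus(self.w_rho)
            b_sigma = F.softplus(self.b_rho)
            w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
            b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
            return F.linear(x, w, b)

    class BayesianQNet(nn.Module):
        def __init__(self, state_dim, n_actions, hidden=64):
            super().__init__()
            self.fc1 = BayesianLinear(state_dim, hidden)
            self.fc2 = BayesianLinear(hidden, n_actions)

        def forward(self, state):
            return self.fc2(torch.relu(self.fc1(state)))

    def select_action(q_net, state):
        # Thompson sampling: each forward pass draws fresh weights from the
        # posterior, so acting greedily on the sampled Q-values explores.
        with torch.no_grad():
            q_values = q_net(state)
        return int(q_values.argmax())

    The paper's second trick, spiking the replay buffer, amounts to pre-loading the buffer with transitions from a handful of successful episodes before Q-learning begins.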