Training Dialogue Systems With Human Advice

Abstract

One major drawback of Reinforcement Learning (RL) Spoken Dialogue Systems is that they inherit the general exploration requirements of RL, which makes them hard to deploy from an industry perspective. Industrial systems, on the other hand, rely on human expertise and hand-written rules to prevent irrelevant behavior and maintain an acceptable experience from the user's point of view. In this paper, we attempt to bridge the gap between these two worlds by providing an easy way to incorporate all kinds of human expertise into the training phase of a Reinforcement Learning Dialogue System. Our approach, based on the TAMER framework, enables safe and efficient policy learning by combining the traditional Reinforcement Learning reward signal with an additional reward encoding expert advice. Experimental results show that our method leads to substantial improvements over more traditional Reinforcement Learning methods.
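The core idea of blending an environment reward with an expert-advice reward can be sketched as follows. This is a minimal illustration, assuming a simple additive combination with a weighting factor `beta`; the paper's exact TAMER-based formulation may differ, and all function and variable names here are hypothetical.

```python
def combined_reward(r_env, r_advice, beta=0.5):
    """Blend the environment reward with an expert-advice reward.

    Assumed additive shaping: r = r_env + beta * r_advice, where beta
    controls how strongly human advice influences learning.
    """
    return r_env + beta * r_advice


def q_update(Q, s, a, r_env, r_advice, s_next,
             alpha=0.1, gamma=0.99, beta=0.5):
    """Tabular Q-learning update driven by the combined reward signal.

    Q is a dict of dicts: Q[state][action] -> value.
    """
    r = combined_reward(r_env, r_advice, beta)
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]


# Usage: one update step on a two-state toy problem.
Q = {"s0": {"a": 0.0}, "s1": {}}
q_update(Q, "s0", "a", r_env=1.0, r_advice=2.0, s_next="s1")
```

With `beta = 0` this reduces to standard Q-learning; larger values of `beta` bias exploration toward behavior the expert approves of, which is the mechanism the abstract credits for safer policy learning.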