Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation
Neural dialog models often lack robustness to anomalous user input and
produce inappropriate responses, which leads to a frustrating user experience.
Although a number of prior approaches to out-of-domain (OOD) utterance
detection exist, they share a few restrictions: they rely on OOD data or
multiple sub-domains, and their OOD detection is context-independent, which
leads to suboptimal performance in a dialog. The goal of this paper is to
propose a novel OOD detection method that does not require OOD data, by
utilizing counterfeit OOD turns in the context of a dialog. To foster further
research, we also release new dialog datasets: three publicly available
dialog corpora augmented with OOD turns in a controllable way. Our method
outperforms state-of-the-art dialog models equipped with a conventional OOD
detection mechanism by a large margin in the presence of OOD utterances.
Comment: ICASSP 201
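The core augmentation idea can be sketched in a few lines: borrow utterances from *other* dialogs in the same corpus and inject them as counterfeit OOD turns, so no genuine OOD corpus is needed. This is a minimal illustrative sketch, not the paper's actual pipeline; the function name, dictionary layout, and `ood_rate` parameter are assumptions chosen here for illustration.

```python
import random

def add_counterfeit_ood_turns(dialogs, ood_rate=0.2, seed=0):
    """Augment in-domain dialogs with counterfeit OOD turns.

    Counterfeit OOD turns are utterances borrowed from other dialogs,
    labeled ood=True for detector training; `ood_rate` controls how
    often one is injected, making the augmentation controllable.
    (Illustrative sketch only; names are assumptions.)
    """
    rng = random.Random(seed)
    augmented = []
    for i, dialog in enumerate(dialogs):
        # Candidate counterfeit turns: every utterance from the other dialogs.
        pool = [t for j, d in enumerate(dialogs) if j != i for t in d]
        new_dialog = []
        for turn in dialog:
            if pool and rng.random() < ood_rate:
                # Inject a counterfeit OOD turn before the in-domain turn.
                new_dialog.append({"text": rng.choice(pool)["text"], "ood": True})
            new_dialog.append({"text": turn["text"], "ood": False})
        augmented.append(new_dialog)
    return augmented

corpus = [
    [{"text": "book a table for two"}, {"text": "at 7 pm"}],
    [{"text": "play some jazz"}, {"text": "skip this song"}],
]
aug = add_counterfeit_ood_turns(corpus, ood_rate=0.5)
```

Because the injected turns are real utterances from mismatched contexts, a context-aware detector must use the surrounding dialog, not just the utterance itself, to flag them.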
Recurrent Polynomial Network for Dialogue State Tracking
Dialogue state tracking (DST) is a process to estimate the distribution of the dialogue states as a dialogue progresses. Recent studies on the constrained Markov Bayesian polynomial (CMBP) framework take the first step towards bridging the gap between rule-based and statistical approaches for DST. In this paper, the gap is further bridged by a novel framework -- recurrent polynomial network (RPN). RPN's unique structure enables the framework to have all the advantages of CMBP, including efficiency, portability and interpretability. Additionally, RPN achieves more properties of statistical approaches than CMBP. RPN was evaluated on the data corpora of the second and the third Dialog State Tracking Challenge (DSTC-2/3). Experiments showed that RPN can significantly outperform both traditional rule-based approaches and statistical approaches with a similar feature set. Compared with the state-of-the-art statistical DST approaches with much richer features, RPN is also competitive.
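The flavor of a polynomial tracker can be illustrated with a single belief-update step in the CMBP spirit: the new belief for each slot value is a low-order polynomial of the previous belief and the current SLU confidence score. This is a toy sketch under assumed coefficients, not the RPN or CMBP model from the paper; the function name and default coefficients are invented for illustration.

```python
def polynomial_belief_update(belief, slu_scores, coeffs=(0.0, 0.5, 0.5, 0.0)):
    """One tracking step in a CMBP-style polynomial form (toy sketch).

    With coeffs (c0, c1, c2, c3) the update for each slot value is
        b' = c0 + c1*b + c2*p + c3*b*p,
    where b is the previous belief and p the SLU score, followed by
    clipping and renormalisation so the belief stays a distribution.
    """
    c0, c1, c2, c3 = coeffs
    new = {}
    for value in set(belief) | set(slu_scores):
        b = belief.get(value, 0.0)
        p = slu_scores.get(value, 0.0)
        new[value] = max(0.0, c0 + c1 * b + c2 * p + c3 * b * p)
    total = sum(new.values())
    return {v: s / total for v, s in new.items()} if total > 0 else new

# One turn: the user previously leaned "indian", but the SLU now
# strongly suggests "chinese".
belief = polynomial_belief_update(
    {"indian": 0.6, "chinese": 0.4},
    {"chinese": 0.9, "indian": 0.1},
)
```

Fixing the polynomial's structure is what keeps such trackers interpretable and portable; RPN's contribution, per the abstract, is recovering more of the flexibility of statistical approaches within that form.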
User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning
Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work that is based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we propose to use a reward signal based on user satisfaction. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. We show in simulated experiments that a live user satisfaction estimation model may be applied, resulting in higher estimated satisfaction whilst achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain may be applied in many other domains that cover a similar task. We verify our findings by applying the model to one of the domains for learning a policy from real users, and compare its performance to policies using user satisfaction and task success acquired directly from the users as the reward signal.
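The reward-shaping idea above can be sketched by swapping the usual end-of-dialogue task-success reward for a per-turn satisfaction estimate. The linear estimator, the feature names, and the per-turn penalty below are all assumptions invented for illustration; the paper's actual estimator learns temporal dependencies and is considerably richer.

```python
def satisfaction_reward(turn_features, weights, bias=0.0):
    """Hypothetical linear satisfaction estimator: maps turn-level
    interaction features to a satisfaction score (sketch only)."""
    return bias + sum(weights.get(k, 0.0) * v for k, v in turn_features.items())

def episode_return(turns, weights, turn_penalty=-1.0):
    """Return for one dialogue when the per-turn satisfaction estimate
    replaces the usual task-success reward (plus a small turn penalty)."""
    return sum(turn_penalty + satisfaction_reward(t, weights) for t in turns)

# Assumed feature weights: confident recognition helps, reprompts hurt.
weights = {"asr_confidence": 2.0, "reprompt": -3.0}
turns = [
    {"asr_confidence": 0.9, "reprompt": 0},
    {"asr_confidence": 0.4, "reprompt": 1},
]
ret = episode_return(turns, weights)
```

Because such an estimator depends only on interaction features, not on domain-specific task success, it can plausibly transfer across domains that cover a similar task, which is the cross-domain claim the abstract makes.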