Reinforcement learning methods are increasingly used to optimise dialogue
policies from experience. Most current techniques are model-free: they directly
estimate the utility of various actions, without explicit model of the
interaction dynamics. In this paper, we investigate an alternative strategy
grounded in model-based Bayesian reinforcement learning. Bayesian inference is
used to maintain a posterior distribution over the model parameters, reflecting
the model uncertainty. This parameter distribution is gradually refined as more
data is collected and simultaneously used to plan the agent's actions. Within
this learning framework, we carried out experiments with two alternative
formalisations of the transition model, one encoded with standard multinomial
distributions, and one structured with probabilistic rules. We demonstrate the
potential of our approach with empirical results on a user simulator
constructed from Wizard-of-Oz data in a human-robot interaction scenario. The
results illustrate in particular the benefits of capturing prior domain
knowledge with high-level rules