
    Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

    Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts, and it is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated over the last 15 years; in particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users in order to assess them as improvements or degradations. This is called on-policy learning. It can, however, result in system behaviors that are unacceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, learn off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
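
    The core idea behind Kalman Temporal Differences (KTD) is to treat the value-function weights as the hidden state of a Kalman filter, which is what makes the method sample-efficient and able to track change. As a rough illustration only, here is a minimal sketch of the linear special case for policy evaluation; the class name LinearKTD and all hyperparameter values are illustrative, and the paper's full algorithm additionally handles nonlinear parameterizations (via the unscented transform) and Q-function learning, which this sketch omits.

```python
import numpy as np

class LinearKTD:
    """Minimal linear Kalman Temporal Differences for policy evaluation.

    The value-function weights theta are the hidden state of a Kalman
    filter; each transition (s, r, s') yields one noisy scalar
    observation r ~ (phi(s) - gamma * phi(s'))^T theta.
    """

    def __init__(self, n_features, gamma=0.95, process_noise=1e-3, obs_noise=1.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)            # weight estimate (filter mean)
        self.P = np.eye(n_features)                  # weight covariance
        self.Q = process_noise * np.eye(n_features)  # random-walk process noise
        self.R = obs_noise                           # observation noise variance

    def update(self, phi_s, reward, phi_next, done=False):
        # Prediction step: random-walk evolution model on the weights.
        P_pred = self.P + self.Q
        # Observation model r = H theta + noise, with H built from the TD pair.
        H = phi_s - (0.0 if done else self.gamma) * phi_next
        innovation = reward - H @ self.theta
        S = H @ P_pred @ H + self.R                  # innovation variance (scalar)
        K = P_pred @ H / S                           # Kalman gain
        self.theta = self.theta + K * innovation     # correction step
        self.P = P_pred - np.outer(K, H @ P_pred)
        return innovation

    def value(self, phi_s):
        return phi_s @ self.theta
```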

    Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system

    The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of the state of the art in spoken dialogue systems (SDS). Yet the commonly used training algorithms for SDS still require a large number of dialogues, so most systems rely on artificial data generated by a user simulator, and optimization is performed off-line before releasing the system to real users. Gaussian Processes (GP) for RL have recently been applied to dialogue systems. One advantage of GPs is that they compute an explicit measure of uncertainty in the value-function estimates computed during learning. In this paper, a class of novel learning strategies is described which uses this uncertainty to control exploration on-line. Comparisons between several exploration schemes show that significant improvements in learning speed can be obtained and that rapid and safe online optimisation is possible, even on a complex task.
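
    To illustrate the exploration idea in isolation, the sketch below maintains a GP posterior over Q-values and picks the action maximizing the posterior mean plus a multiple of the posterior standard deviation (a UCB-style rule). This is a simplification: the paper works with GP-based temporal-difference learning, whereas this sketch assumes the GP is fit by plain regression on observed returns; rbf_kernel, GPQFunction, beta and the noise/lengthscale values are all illustrative.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

class GPQFunction:
    """GP posterior over Q(s, a); the variance is available for exploration."""

    def __init__(self, noise=0.1, lengthscale=1.0):
        self.noise, self.lengthscale = noise, lengthscale

    def fit(self, X, y):
        # X: (n, d) state-action features, y: (n,) observed returns.
        self.X, self.y = X, y
        K = rbf_kernel(X, X, self.lengthscale) + self.noise ** 2 * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)

    def predict(self, Xq):
        Ks = rbf_kernel(Xq, self.X, self.lengthscale)
        mean = Ks @ self.K_inv @ self.y
        # Posterior variance: prior k(x, x) = 1 minus the explained part.
        var = 1.0 - np.einsum('ij,jk,ik->i', Ks, self.K_inv, Ks)
        return mean, np.sqrt(np.maximum(var, 1e-12))

def select_action(gp, candidate_features, beta=1.0):
    """UCB-style exploration: prefer actions whose Q-estimate is uncertain."""
    mean, std = gp.predict(candidate_features)   # one row per candidate action
    return int(np.argmax(mean + beta * std))
```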

    Off-policy Learning Applied to a POMDP-based Dialogue System

    Reinforcement learning (RL) is now part of the state of the art in the optimization of spoken dialogue systems. Most RL methods applied to dialogue systems, for example those using Gaussian processes, require testing more or less random changes to the policy. This way of proceeding is called on-policy learning. However, it can induce system behaviors that appear incoherent to the user. Algorithms should ideally find the optimal policy from observations of interactions generated by a sub-optimal policy that nonetheless behaves coherently towards the user: this is off-policy learning. In this contribution, a sample-efficient algorithm for online, off-policy learning of the optimal policy is proposed. Combined with a compact, non-linear representation of the value function (a multi-layer perceptron), this algorithm makes it possible to handle large-scale systems.
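
    A hedged sketch of the value-function side of this idea: semi-gradient Q-learning with a small multi-layer perceptron, which is off-policy because the bootstrap target takes a max over actions regardless of which (possibly handcrafted) behavior policy generated the transitions. This is not the paper's exact algorithm, only a generic stand-in; MLPQLearner and all hyperparameters are illustrative.

```python
import numpy as np

class MLPQLearner:
    """Off-policy, online Q-learning with a small multi-layer perceptron.

    Transitions may come from any behavior policy; the max operator in
    the bootstrap target makes the update off-policy.
    """

    def __init__(self, n_features, n_actions, n_hidden=32,
                 lr=0.01, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr, self.gamma = lr, gamma

    def q_values(self, s):
        h = np.tanh(s @ self.W1 + self.b1)
        return h, h @ self.W2 + self.b2

    def update(self, s, a, r, s_next, done):
        h, q = self.q_values(s)
        _, q_next = self.q_values(s_next)
        target = r + (0.0 if done else self.gamma * q_next.max())
        td_error = target - q[a]
        # Semi-gradient step: backpropagate the TD error through the
        # chosen action's output only.
        grad_q = np.zeros_like(q)
        grad_q[a] = td_error
        grad_h = grad_q @ self.W2.T * (1.0 - h ** 2)   # tanh derivative
        self.W2 += self.lr * np.outer(h, grad_q)
        self.b2 += self.lr * grad_q
        self.W1 += self.lr * np.outer(s, grad_h)
        self.b1 += self.lr * grad_h
        return td_error
```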

    Machine Learning for Interactive Systems: Challenges and Future Trends

    Machine learning was introduced into interactive systems more than 40 years ago, through speech recognition and computer vision. Since then, machine learning has gained interest in the scientific community involved in human-machine interaction and has moved up the abstraction scale: from fundamental signal processing to language understanding and generation, emotion and mood recognition, and even dialogue management or robot control. So far, existing machine learning techniques have often been considered as a solution to problems raised by interactive systems. Yet interaction is also a source of new challenges for machine learning and offers interesting new practical as well as theoretical problems to solve. In this paper, we address these challenges and describe why research in machine learning and interactive systems should converge in the future.

    Model-based Bayesian Reinforcement Learning for Dialogue Management

    Reinforcement learning methods are increasingly used to optimise dialogue policies from experience. Most current techniques are model-free: they directly estimate the utility of various actions, without an explicit model of the interaction dynamics. In this paper, we investigate an alternative strategy grounded in model-based Bayesian reinforcement learning. Bayesian inference is used to maintain a posterior distribution over the model parameters, reflecting the model uncertainty. This parameter distribution is gradually refined as more data is collected and is simultaneously used to plan the agent's actions. Within this learning framework, we carried out experiments with two alternative formalisations of the transition model, one encoded with standard multinomial distributions and one structured with probabilistic rules. We demonstrate the potential of our approach with empirical results on a user simulator constructed from Wizard-of-Oz data in a human-robot interaction scenario. The results illustrate in particular the benefits of capturing prior domain knowledge with high-level rules.
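
    The multinomial formalisation can be sketched concretely: with a Dirichlet prior on each (state, action) transition row, observed transitions simply increment counts, and planning can use a model drawn from the posterior (Thompson-sampling style). The sketch below assumes tabular states and a known reward table for simplicity; BayesianDirichletRL and the prior value are illustrative, and the paper's rule-structured variant is not shown.

```python
import numpy as np

class BayesianDirichletRL:
    """Model-based Bayesian RL with a Dirichlet posterior per (s, a).

    Transition counts define a Dirichlet posterior over each multinomial
    row T(s, a, .); planning runs value iteration on a model sampled
    from that posterior, so model uncertainty shrinks as data accumulates.
    """

    def __init__(self, n_states, n_actions, reward, gamma=0.95, prior=1.0):
        # Symmetric Dirichlet prior as pseudo-counts on every transition.
        self.counts = np.full((n_states, n_actions, n_states), prior)
        self.reward = reward          # assumed-known R[s, a] table
        self.gamma = gamma

    def observe(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0

    def sample_model(self, rng):
        # One Dirichlet draw per (s, a) row gives a full transition model.
        flat = self.counts.reshape(-1, self.counts.shape[-1])
        T = np.array([rng.dirichlet(row) for row in flat])
        return T.reshape(self.counts.shape)

    def plan(self, rng, n_iter=200):
        T = self.sample_model(rng)
        V = np.zeros(self.counts.shape[0])
        for _ in range(n_iter):                  # value iteration
            Q = self.reward + self.gamma * T @ V
            V = Q.max(axis=1)
        return Q.argmax(axis=1)                  # greedy policy for the sample
```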