309 research outputs found

    Relational Approach to Knowledge Engineering for POMDP-based Assistance Systems as a Translation of a Psychological Model

    Get PDF
    Assistive systems for persons with cognitive disabilities (e.g. dementia) are difficult to build due to the wide range of different approaches people can take to accomplishing the same task, and the significant uncertainties that arise from both the unpredictability of client's behaviours and from noise in sensor readings. Partially observable Markov decision process (POMDP) models have been used successfully as the reasoning engine behind such assistive systems for small multi-step tasks such as hand washing. POMDP models are a powerful, yet flexible framework for modelling assistance that can deal with uncertainty and utility. Unfortunately, POMDPs usually require a very labour intensive, manual procedure for their definition and construction. Our previous work has described a knowledge driven method for automatically generating POMDP activity recognition and context sensitive prompting systems for complex tasks. We call the resulting POMDP a SNAP (SyNdetic Assistance Process). The spreadsheet-like result of the analysis does not correspond to the POMDP model directly and the translation to a formal POMDP representation is required. To date, this translation had to be performed manually by a trained POMDP expert. In this paper, we formalise and automate this translation process using a probabilistic relational model (PRM) encoded in a relational database. We demonstrate the method by eliciting three assistance tasks from non-experts. We validate the resulting POMDP models using case-based simulations to show that they are reasonable for the domains. We also show a complete case study of a designer specifying one database, including an evaluation in a real-life experiment with a human actor

    Learning dialogue POMDP model components from expert dialogues

    Get PDF
    Un système de dialogue conversationnel doit aider les utilisateurs humains à atteindre leurs objectifs à travers des dialogues naturels et efficients. C'est une tache toutefois difficile car les langages naturels sont ambiguës et incertains, de plus le système de reconnaissance vocale (ASR) est bruité. À cela s'ajoute le fait que l'utilisateur humain peut changer son intention lors de l'interaction avec la machine. Dans ce contexte, l'application des processus décisionnels de Markov partiellement observables (POMDPs) au système de dialogue conversationnel nous a permis d'avoir un cadre formel pour représenter explicitement les incertitudes, et automatiser la politique d'optimisation. L'estimation des composantes du modelé d'un POMDP-dialogue constitue donc un défi important, car une telle estimation a un impact direct sur la politique d'optimisation du POMDP-dialogue. Cette thèse propose des méthodes d'apprentissage des composantes d'un POMDPdialogue basées sur des dialogues bruités et sans annotation. Pour cela, nous présentons des méthodes pour apprendre les intentions possibles des utilisateurs à partir des dialogues, en vue de les utiliser comme états du POMDP-dialogue, et l'apprendre un modèle du maximum de vraisemblance à partir des données, pour transition du POMDP. Car c'est crucial de réduire la taille d'état d'observation, nous proposons également deux modèles d'observation: le modelé mot-clé et le modelé intention. Dans les deux modèles, le nombre d'observations est réduit significativement tandis que le rendement reste élevé, particulièrement dans le modele d'observation intention. En plus de ces composantes du modèle, les POMDPs exigent également une fonction de récompense. Donc, nous proposons de nouveaux algorithmes pour l'apprentissage du modele de récompenses, un apprentissage qui est basé sur le renforcement inverse (IRL). En particulier, nous proposons POMDP-IRL-BT qui fonctionne sur les états de croyance disponibles dans les dialogues du corpus. L'algorithme apprend le modele de récompense par l'estimation du modele de transition de croyance, semblable aux modèles de transition des états dans un MDP (processus décisionnel de Markov). Finalement, nous appliquons les méthodes proposées à un domaine de la santé en vue d'apprendre un POMDP-dialogue et ce essentiellement à partir de dialogues réels, bruités, et sans annotations.Spoken dialogue systems should realize the user intentions and maintain a natural and efficient dialogue with users. This is however a difficult task as spoken language is naturally ambiguous and uncertain, and further the automatic speech recognition (ASR) output is noisy. In addition, the human user may change his intention during the interaction with the machine. To tackle this difficult task, the partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while supporting automated policy solving. In this context, estimating the dialogue POMDP model components is a signifficant challenge as they have a direct impact on the optimized dialogue POMDP policy. This thesis proposes methods for learning dialogue POMDP model components using noisy and unannotated dialogues. Speciffically, we introduce techniques to learn the set of possible user intentions from dialogues, use them as the dialogue POMDP states, and learn a maximum likelihood POMDP transition model from data. Since it is crucial to reduce the observation state size, we then propose two observation models: the keyword model and the intention model. Using these two models, the number of observations is reduced signifficantly while the POMDP performance remains high particularly in the intention POMDP. In addition to these model components, POMDPs also require a reward function. So, we propose new algorithms for learning the POMDP reward model from dialogues based on inverse reinforcement learning (IRL). In particular, we propose the POMDP-IRL-BT algorithm (BT for belief transition) that works on the belief states available in the dialogues. This algorithm learns the reward model by estimating a belief transition model, similar to MDP (Markov decision process) transition models. Ultimately, we apply the proposed methods on a healthcare domain and learn a dialogue POMDP essentially from real unannotated and noisy dialogues

    Model-based Bayesian Reinforcement Learning for Dialogue Management

    Get PDF
    Reinforcement learning methods are increasingly used to optimise dialogue policies from experience. Most current techniques are model-free: they directly estimate the utility of various actions, without explicit model of the interaction dynamics. In this paper, we investigate an alternative strategy grounded in model-based Bayesian reinforcement learning. Bayesian inference is used to maintain a posterior distribution over the model parameters, reflecting the model uncertainty. This parameter distribution is gradually refined as more data is collected and simultaneously used to plan the agent's actions. Within this learning framework, we carried out experiments with two alternative formalisations of the transition model, one encoded with standard multinomial distributions, and one structured with probabilistic rules. We demonstrate the potential of our approach with empirical results on a user simulator constructed from Wizard-of-Oz data in a human-robot interaction scenario. The results illustrate in particular the benefits of capturing prior domain knowledge with high-level rules
    • …
    corecore