17 research outputs found

    Discovering Dialog Rules by means of an Evolutionary Approach

    Designing the rules for the dialog management process is one of the most resource-consuming tasks when developing a dialog system. Although statistical approaches to dialog management are becoming mainstream in research and industrial contexts, many systems are still being developed following the rule-based or hybrid paradigms. This happens, for example, when developers require deterministic system responses to keep total control over the decisions made by the system, or when the infrastructure employed is designed for rule-based systems using technologies currently used in commercial platforms. In this paper, we propose the use of evolutionary algorithms to automatically obtain the dialog rules that are implicit in a dialog corpus. Our proposal makes it possible to exploit the benefits of statistical approaches to build rule-based systems. It has been evaluated with a practical spoken dialog system, for which we have automatically obtained a set of fuzzy rules that successfully manage the dialog. The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 823907 (MENHIR project: https://menhir-project.eu).

    Understanding user state and preferences for robust spoken dialog systems and location-aware assistive technology

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, (S.M. in Technology and Policy)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2012. By William Li. Cataloged from PDF version of thesis. Includes bibliographical references (p. 119-125).
    This research focuses on improving the performance of spoken dialog systems (SDS) in the domain of assistive technology for people with disabilities. Automatic speech recognition (ASR) has compelling potential applications as a means of enabling people with physical disabilities to enjoy greater levels of independence and participation. This thesis describes the development and evaluation of a spoken dialog system modeled as a partially observable Markov decision process (SDS-POMDP). The SDS-POMDP can understand commands related to making phone calls and providing information about weather, activities, and menus in a specialized-care residence setting. Labeled utterance data was used to train observation and utterance confidence models. With a user simulator, the SDS-POMDP reward function parameters were optimized, and the SDS-POMDP is shown to outperform simpler threshold-based dialog strategies. These simulations were validated in experiments with human participants, with the SDS-POMDP resulting in more successful dialogs and faster dialog completion times, particularly for speakers with high word-error rates. This thesis also explores the social and ethical implications of deploying location-based assistive technology in specialized-care settings. These technologies could have substantial benefits for residents and caregivers in such environments, but they may also raise issues related to user safety, independence, autonomy, or privacy. As one example, location-aware mobile devices are potentially useful for increasing the safety of individuals in a specialized-care setting who may be at risk of unknowingly wandering, but they raise important questions about privacy and informed consent. This thesis provides a survey of U.S. legislation related to the participation in research studies of individuals who have questionable capacity to provide informed consent. Overall, it seeks to precisely describe and define the key issues that arise as a result of new, unforeseen technologies that may have both benefits and costs for the elderly and people with disabilities.

    Framework for Human Computer Interaction for Learning Dialogue Strategies using Controlled Natural Language in Information Systems

    Spoken language systems are going to have a tremendous impact on real-world applications, be it healthcare enquiry, public transportation or airline booking, while preserving the linguistic diversity of users across the globe. These systems are capable of interacting with the user in the different languages that the system supports. Normally, when a person interacts with another person, many non-verbal cues guide the dialogue, and all the utterances have a contextual relationship that manages the dialogue as it is jointly conducted by the two speakers. Human-computer interaction has a wide impact on the design of applications and has become an emerging area of interest for researchers. All of us are witnesses to an explosive electronic revolution in which we are surrounded by gadgets that are advanced not only in power, design and applications but also in ease of access: user-friendly interfaces are designed so that we can easily use and control all the functionality of the devices. Speech is one of the most intuitive forms of interaction that humans use, and it provides potential benefits such as hands-free access to machines, ergonomics and greater efficiency of interaction. Yet the design of speech-based interfaces has long been an expert job. Much research has been done on building real spoken dialogue systems that can interact with humans by voice and help perform various tasks as humans do. The last two decades have seen advanced research in automatic speech recognition, dialogue management, text-to-speech synthesis and natural language processing for various applications, with positive results. This dissertation proposes to apply machine learning (ML) techniques to the problem of optimizing dialogue management strategy selection in the design of a spoken dialogue system prototype. Although automatic speech recognition and system-initiated dialogues, in which the system expects an answer in the form of 'yes' or 'no', have already been applied to Spoken Dialogue Systems (SDS), no real attempt to use those techniques in order to design a new system from scratch has been made. In this dissertation, we propose some novel ideas to ease the design of Spoken Dialogue Systems and to give novices access to voice technologies. A framework for simulating and evaluating dialogues and learning optimal dialogue strategies in a controlled natural language is proposed. The simulation process is based on a probabilistic description of a dialogue and on the stochastic modelling of both the artificial NLP modules composing an SDS and the user. This probabilistic model is based on a set of parameters that can be tuned from prior knowledge of the discourse or learned from data. The evaluation is part of the simulation process and is based on objective measures provided by each module. Finally, the simulation environment is connected to a learning agent that uses the supplied evaluation metrics as an objective function in order to generate an optimal behaviour for the SDS.

    Learning user modelling strategies for adaptive referring expression generation in spoken dialogue systems

    We address the problem of dynamic user modelling for referring expression generation in spoken dialogue systems, i.e. how a spoken dialogue system should choose referring expressions to refer to domain entities for users with different levels of domain expertise, whose domain knowledge is initially unknown to the system. We approach this problem using a statistical planning framework: Reinforcement Learning techniques in Markov Decision Processes (MDP). We present a new reinforcement learning framework to learn user modelling strategies for adaptive referring expression generation (REG) in resource-scarce domains (i.e. where no large corpus exists for learning). As part of the framework, we present novel user simulation models that are sensitive to the referring expressions used by the system and are able to simulate users with different levels of domain knowledge. Such models are shown to simulate real user behaviour more closely than baseline user simulation models. In contrast to previous approaches to user-adaptive systems, we do not assume that the user’s domain knowledge is available to the system before the conversation starts. We show that, using a small corpus of non-adaptive dialogues, it is possible to learn an adaptive user modelling policy in resource-scarce domains using our framework. We also show that the learned user modelling strategies performed better in terms of adaptation than hand-coded baseline policies on both simulated and real users. With real users, the learned policy produced around a 20% increase in adaptation in comparison to the best performing hand-coded adaptive baseline. We also show that adaptation to the user’s domain knowledge improves task success (99.47% for the learned policy vs 84.7% for the hand-coded baseline) and reduces the dialogue time of the conversation (11% relative difference). This is because users found it easier to identify domain objects when the system used adaptive referring expressions during the conversations.

    Using Dialogue Acts in dialogue strategy learning: optimising repair strategies

    Institute for Communicating and Collaborative Systems. A Spoken Dialogue System's (SDS's) dialogue strategy specifies which action it will take depending on its representation of the current dialogue context. Designing it by hand involves anticipating how users will interact with the system, and/or repeated testing and refining, and so can be a difficult, time-consuming task. Since SDSs inevitably make understanding errors, a particularly important issue is how to design "repair strategies", the parts of the dialogue strategy which attempt to get the dialogue "back on track" following these errors. To try to produce better dialogue strategies with less time and effort, previous researchers have modelled a dialogue strategy as a sequential decision problem called a Markov Decision Process (MDP), and then applied Reinforcement Learning (RL) algorithms to example training dialogues to generate dialogue strategies automatically. More recent research has used training dialogues conducted with simulated rather than real users and learned which action to take in all dialogue contexts (a "full" as opposed to a "partial" dialogue strategy). Simulated users allow more training dialogues to be generated, and allow the exploration of new dialogue contexts not present in an original dataset. As yet, however, limited insight has been provided as to which dialogue contextual features are important to include in the MDP and why. Indeed, a full dialogue strategy has not been learned from training dialogues with a realistic probabilistic user simulation derived from real user data, and then shown to work well with real users. This thesis investigates the value of adding new linguistically-motivated contextual features to the MDP when using RL to learn full dialogue strategies for SDSs. These new features are recent Dialogue Acts (DAs). DAs indicate the role or intention of an utterance in a dialogue, e.g. "provide-information", an utterance being a complete unit of a speaker's speech, often bounded by silence. An accurate probabilistic user simulation learned from real user data is used for generating training dialogues, and the recent DAs are shown to improve performance in testing in simulation and with real users. With real users, performance is also better than that of other competing learned and hand-crafted strategies. Analysis of the strategies and further simulation experiments show how the DAs improve performance through better repair strategies. The main findings are expected to apply to SDSs in general; indeed, our strategies are learned and tested on real users in different domains (flight-booking versus tourist information). Comparisons are also made to recent research which focuses on handling understanding errors in SDSs, but which does not use RL or user simulations.

    Modelling Incremental Self-Repair Processing in Dialogue.

    PhD. Self-repairs, where speakers repeat themselves, reformulate or restart what they are saying, are pervasive in human dialogue. These phenomena provide a window into real-time human language processing. For explanatory adequacy, a model of dialogue must include mechanisms that account for them. Artificial dialogue agents also need this capability for more natural interaction with human users. This thesis investigates the structure of self-repair and its function in the incremental construction of meaning in interaction. A corpus study shows that the range of self-repairs seen in dialogue cannot be accounted for by looking at surface form alone. More particularly, it analyses a string-alignment approach and shows that it is insufficient, and it provides requirements for a suitable model of incremental context and an ontology of self-repair function. An information-theoretic model is developed which addresses these issues, along with a system that automatically detects self-repairs and edit terms on transcripts incrementally with minimal latency, achieving state-of-the-art results. Additionally, it is shown to have practical use in the psychiatric domain. The thesis goes on to present a dialogue model to interpret and generate repaired utterances incrementally. When processing repaired rather than fluent utterances, it achieves the same degree of incremental interpretation and incremental representation. Practical implementation methods are presented for an existing dialogue system. Finally, a more pragmatically oriented approach is presented to model self-repairs in a psycholinguistically plausible way. This is achieved by extending the dialogue model to include a probabilistic semantic framework to perform incremental inference in a reference resolution domain. The thesis concludes that a model of context at least as fine-grained as word-by-word is required for realistic models of self-repair, and that context must include linguistic action sequences and information update effects. The way dialogue participants process self-repairs to make inferences in real time, rather than filter out their disfluency effects, has been modelled formally and in practical systems. Funding: Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Account (DTA) scholarship from the School of Electronic Engineering and Computer Science at Queen Mary University of London.

    Recurrent neural models and related problems in natural language processing

    The recurrent neural network (RNN) is one of the most powerful machine learning models specialized in capturing temporal variations and dependencies of sequential data. Thanks to the resurgence of deep learning during the past decade, we have witnessed plenty of novel RNN structures being invented and applied to various practical problems, especially in the field of natural language processing (NLP). This thesis follows a similar direction, in which we offer new insights about RNNs’ structural properties and how the recently proposed RNN models may stimulate the formation of new open problems in NLP. The scope of this thesis is divided into two parts: model analysis and new open problems. In the first part, we explore two important aspects of RNNs: their connecting architectures and the basic operations in their transition functions. Specifically, in the first article, we define several rigorous measurements for evaluating the architectural complexity of any given recurrent architecture with arbitrary network topology. Thorough experiments on these measurements demonstrate their theoretical validity and their utility in guiding RNN architecture design. In the second article, we propose a novel module to combine different information flows multiplicatively in RNNs’ basic transition functions. RNNs equipped with the new module are empirically shown to have better gradient properties and stronger generalization capacities without extra computational and memory cost. The second part focuses on two open problems in NLP: how to perform advanced multi-hop reasoning in machine reading comprehension and how to encode personalities into chitchat dialogue systems. We collect two different large-scale datasets, aiming to motivate methodological progress on these two problems. In particular, in the third article we introduce the HotpotQA dataset, containing over 113k Wikipedia-based question-answer pairs. Most of the questions in HotpotQA are answerable only through accurate multi-hop reasoning over multiple documents. Supporting facts required for reasoning are also provided to help the model make explainable predictions. The fourth article tackles the problem of the lack of personality in chatbots. The proposed persona-chat dataset encourages more engaging and consistent conversations by conditioning dialog partners on given personas. We show that baseline models trained on persona-chat are able to express consistent personalities and to respond in more captivating ways by concentrating on the personas of both themselves and other interlocutors.