
    Reinforcement adaptation of an attention-based neural natural language generator for spoken dialogue systems

    Following recent proposals to handle natural language generation in spoken dialogue systems with long short-term memory recurrent neural network models (Wen et al., 2016), we first investigate a variant of this architecture aimed at better integrating the attention subnetwork. We then propose and evaluate a framework for adapting the NLG module online through direct interactions with users. The basic approach is to ask the user to utter an alternative sentence expressing a particular dialogue act; the system must then decide between using an automatic transcription and asking for a manual one. For this decision we adopt a reinforcement learning approach based on an adversarial bandit scheme. We show that by defining the rewards appropriately, as a linear combination of the expected payoffs and the costs of acquiring the new data provided by the user, the system can balance improving its match with the user's preferences against the burden this places on the user. Finally, the benefits of the system are assessed in a human evaluation, showing that the addition of more diverse utterances yields sentences that users find more satisfying.
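    The decision described above (keep the automatic transcription, or pay to ask for a manual one) can be sketched with the standard Exp3 adversarial-bandit algorithm. This is a minimal illustration, not the paper's actual system: the two actions, the `gamma` parameter, and the `alpha` weighting in the reward are all assumptions for the sketch.

    ```python
    import math
    import random

    class Exp3:
        """Adversarial bandit (Exp3) over data-acquisition actions.

        Hypothetical setup: action 0 = keep the ASR transcript (cheap, noisy),
        action 1 = ask the user for a manual transcription (costly, reliable).
        Rewards are assumed to lie in [0, 1].
        """

        def __init__(self, n_actions=2, gamma=0.1):
            self.gamma = gamma
            self.weights = [1.0] * n_actions

        def probabilities(self):
            # Mix the weight distribution with uniform exploration.
            total = sum(self.weights)
            k = len(self.weights)
            return [(1 - self.gamma) * w / total + self.gamma / k
                    for w in self.weights]

        def select(self):
            p = self.probabilities()
            action = random.choices(range(len(p)), weights=p)[0]
            return action, p

        def update(self, action, reward, probs):
            # Importance-weighted reward estimate, as in standard Exp3:
            # only the chosen action's weight is updated.
            est = reward / probs[action]
            self.weights[action] *= math.exp(self.gamma * est / len(self.weights))

    def reward(payoff, cost, alpha=0.7):
        # Linear combination of expected payoff and acquisition cost,
        # mirroring the trade-off in the abstract (alpha is an assumption).
        return max(0.0, min(1.0, alpha * payoff - (1 - alpha) * cost))
    ```

    Repeatedly calling `select`, computing `reward`, and calling `update` shifts probability mass toward whichever acquisition strategy has paid off best so far, without assuming anything about how the user's responses are distributed.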

    Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook

    In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications, including healthcare, finance, recommendation systems, robotics, and, last but not least, speech and natural language processing. While most speech and language applications of reinforcement learning algorithms center on improving the training of deep neural networks with its flexible optimization properties, there is still much ground to explore in exploiting the other benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures, and generalizability. In this survey, we present an overview of recent advances in reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive, and scalable.
    Comment: to appear in Expert Systems with Applications; accompanies an INTERSPEECH 2022 tutorial on the same topic; includes the latest advances in large language models (LLMs).

    On-line reinforcement for joint learning of the semantic parser and the dialogue manager of a spoken interaction system

    Co-located with the Plate-Forme Intelligence Artificielle (PFIA 2019). National audience.
    The design of dialogue systems has witnessed many advances lately, yet acquiring a huge dataset remains a hindrance to their fast development for a new task or language. On-line learning is pursued in this paper as a convenient way to alleviate these difficulties. Once the system modules are initialised, a single process handles data collection, annotation, and use in the training algorithms. A new challenge is to control the cost of the on-line learning borne by the user. Our work focuses on learning the semantic parsing and dialogue management modules. In this context, we propose several variants of simultaneous learning, which are tested in user trials to confirm that only a few hundred training dialogues are needed to achieve good performance and to surpass a rule-based handcrafted system. The analysis of these experiments offers some insights, discussed in the paper, into the difficulty the system's trainers face in establishing a coherent and constant behavioural strategy that enables a fast, good-quality training phase.
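    The single-process loop described above, in which each dialogue is collected, optionally annotated, and immediately used to update both modules, can be sketched as follows. The `parser`, `manager`, and `annotate` interfaces are hypothetical stand-ins for the paper's actual components.

    ```python
    def online_training_loop(dialogues, parser, manager, annotate):
        """One process handles collection, annotation, and training.

        dialogues: iterable of (utterance, outcome) pairs, where outcome
                   is a task-success signal for the dialogue manager.
        annotate:  asks the user to confirm or correct the parse;
                   returns None when no annotation is obtained.
        """
        for utterance, outcome in dialogues:
            frame = parser.predict(utterance)       # semantic parsing
            action = manager.predict(frame)         # dialogue management
            label = annotate(utterance, frame)      # optional user correction
            if label is not None:
                parser.update(utterance, label)     # supervised update
            manager.update(frame, action, outcome)  # RL update from success
    ```

    The key design point is that both modules are trained simultaneously from the same interaction stream, so the cost borne by the user (the `annotate` calls) can be controlled by the trainer.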

    Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

    If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface through which feedback can be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback. We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. a single judgment per source sentence) instead of reference translations for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can improve substantially under weak feedback, with variance reduction techniques being very effective. In production environments, offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is collected either explicitly via star ratings from users, or implicitly from user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we find that error markings provide a cheap and practical alternative to error corrections. In a generalized interactive learning framework, we propose a self-regulation approach in which the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.
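    Learning from bandit feedback with a policy gradient, as in the first part of the thesis, can be illustrated in miniature. This sketch uses REINFORCE with a running-average baseline (one common variance reduction technique); the softmax-over-scores "policy" is an assumption standing in for a full statistical or neural MT model.

    ```python
    import math
    import random

    def softmax(scores):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    class BanditLearner:
        """Policy gradient from bandit feedback over a fixed candidate set.

        Hypothetical sketch: one score per candidate translation of a
        single source sentence; real systems parameterise these scores
        with an MT model.
        """

        def __init__(self, n_candidates, lr=0.5):
            self.scores = [0.0] * n_candidates
            self.lr = lr
            self.baseline = 0.0   # running mean reward, reduces variance
            self.n = 0

        def sample(self):
            p = softmax(self.scores)
            action = random.choices(range(len(p)), weights=p)[0]
            return action, p

        def update(self, action, reward, probs):
            self.n += 1
            self.baseline += (reward - self.baseline) / self.n
            adv = reward - self.baseline
            # Gradient of log softmax: (1 - p_a) for the chosen candidate,
            # (-p_j) for every other candidate j.
            for j, pj in enumerate(probs):
                grad = (1.0 if j == action else 0.0) - pj
                self.scores[j] += self.lr * adv * grad
    ```

    Each round samples one output, receives a single scalar judgment for it, and updates; no reference translation is ever seen, which is exactly the bandit constraint.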

    Lilia, A Showcase for Fast Bootstrap of Conversation-Like Dialogues Based on a Goal-Oriented System

    International audience. Recently, many works have proposed to cast human-machine interaction as a sentence generation task. Neural network models can learn to generate a probable sentence based on the user's statement along with a partial view of the dialogue history. While appealing to some extent, these approaches require huge training sets of general-purpose data and lack a principled way to intertwine language generation with information retrieval from back-end resources to fuel the dialogue with up-to-date and precise knowledge. As a practical alternative, in this paper we present Lilia, a showcase for the fast bootstrap of conversation-like dialogues based on a goal-oriented system. First, a comparison of goal-oriented and conversational system features is conducted; then a conversion process is described for the fast bootstrap of a new system, finalised with on-line training of the system's main components. Lilia is dedicated to a chitchat task in which speakers exchange viewpoints on a displayed image while trying collaboratively to derive its author's intention. Evaluations with user trials showed its efficiency in a realistic setup.

    Learning with Minimal Supervision: New Meta-Learning and Reinforcement Learning Algorithms

    Standard machine learning approaches thrive on huge amounts of labeled training data, but what if we don't have access to large labeled datasets? Humans have a remarkable ability to learn from only a few examples. To do so, they either build upon their prior learning experiences or adapt to new circumstances by observing sparse learning signals. In this dissertation, we promote algorithms that learn with minimal amounts of supervision, inspired by these two ideas. We discuss two families of minimally supervised learning algorithms, based on meta-learning (or learning to learn) and on reinforcement learning.
    In the first part of the dissertation, we discuss meta-learning approaches for learning with minimal supervision. We present three meta-learning algorithms: for few-shot adaptation of neural machine translation systems, for promoting fairness in learned models by learning to actively learn under fairness parity constraints, and for learning better exploration policies in the interactive contextual bandit setting. All of these algorithms simulate settings in which the agent has access to only a few labeled samples; based on these simulations, the agent learns how to solve future learning tasks with minimal supervision. In the second part of the dissertation, we present learning algorithms based on reinforcement and imitation learning. In many settings the learning agent does not have access to fully supervised training data, but it may be able to leverage a sparse reward signal, or an expert that can be queried to collect labeled data; it is then important to use these learning signals efficiently. Towards this goal, we present three learning algorithms: for learning from very sparse reward signals, for leveraging access to noisy guidance, and for solving structured prediction tasks under bandit feedback. In all cases, the result is a minimally supervised learning algorithm that can learn effectively from sparse reward signals.
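    The idea of spending a limited expert budget only where it helps most, as when an expert can be queried to collect labeled data, can be sketched with a simple margin-based heuristic. This is an illustrative assumption, not the dissertation's actual algorithm: `should_query` and the `margin` threshold are hypothetical.

    ```python
    def should_query(probs, margin=0.2):
        """Query the expert only when the policy is uncertain, i.e. when
        the gap between the top two action probabilities is small."""
        top = sorted(probs, reverse=True)
        return top[0] - top[1] < margin

    def interact(stream, expert, budget=10):
        """Spend a limited expert budget on the most uncertain inputs.

        stream: iterable of (input, policy_probs) pairs.
        expert: callable returning a label for an input.
        Returns (input, label) pairs; label is None where no query was made,
        leaving the learner to rely on the sparse reward signal instead.
        """
        labels = []
        for x, probs in stream:
            if budget > 0 and should_query(probs):
                labels.append((x, expert(x)))   # supervised signal
                budget -= 1
            else:
                labels.append((x, None))        # sparse-reward-only update
        return labels
    ```

    The design choice is the trade-off the dissertation studies in general form: each expert query buys a strong learning signal at a cost, so queries are rationed toward inputs where the current policy is least certain.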

    MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

    While automatic dialogue tutors hold great potential for making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large, high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framework for generating such dialogues by pairing human teachers with a Large Language Model (LLM) prompted to reproduce common student errors. We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues grounded in multi-step math reasoning problems. While models like GPT-3 are good problem solvers, they fail at tutoring because they generate factually incorrect feedback or are prone to revealing solutions to students too early. To overcome this, we let teachers provide learning opportunities to students by guiding them with various scaffolding questions according to a taxonomy of teacher moves. We demonstrate that MathDial and its extensive annotations can be used to fine-tune models to be more effective tutors (and not just solvers). We confirm this with automatic and human evaluation, notably in an interactive setting that measures the trade-off between student solving success and telling solutions. The dataset is released publicly.
    Comment: Jakub Macina, Nico Daheim, and Sankalan Pal Chowdhury contributed equally to this work. Accepted at EMNLP 2023 Findings. Code and dataset available: https://github.com/eth-nlped/mathdia