Response-Based and Counterfactual Learning for Sequence-to-Sequence Tasks in NLP

Abstract

Many applications today, such as the rising number of virtual personal assistants, rely on statistical machine-learnt models. Training such models typically requires large amounts of labelled data, which are expensive and difficult to obtain. In this thesis, we investigate two approaches that alleviate the need for labelled data by instead leveraging feedback given to model outputs. Both scenarios are applied to two sequence-to-sequence tasks in Natural Language Processing (NLP): machine translation and semantic parsing for question answering. Additionally, we define a new question-answering task based on the geographical database OpenStreetMap (OSM) and collect a corpus, NLmaps v2, with 28,609 question-parse pairs. With this corpus, we build semantic parsers for subsequent experiments. Furthermore, we are the first to design a natural language interface to OSM, for which we specifically tailor a parser.

The first approach to learning from feedback given to model outputs considers a scenario where weak supervision is available by grounding the model in a downstream task for which labelled data has been collected. Feedback obtained from the downstream task is used to improve the model in a response-based on-policy learning setup. We apply this approach, employing ramp loss objectives, to improve a machine translation system that is grounded in a multilingual semantic parsing task. Next, we lift ramp loss objectives to non-linear neural networks in order to improve a neural semantic parser for which only gold answers, but not gold parses, are available.

In the second approach to learning from feedback, instead of collecting expensive labelled data, a model is deployed and user-model interactions are recorded in a log. This log is then used to improve the model in a counterfactual off-policy learning setup. We first exemplify this approach on a domain adaptation task for machine translation. Here, we show that counterfactual learning can be applied to tasks with large output spaces and that, in contrast to prevalent theory, deterministic logs can successfully be used for sequence-to-sequence tasks in NLP. Next, we demonstrate on a semantic parsing task that counterfactual learning can also be applied when the underlying model is a neural network and feedback is collected from human users. Applying both approaches to the same semantic parsing task allows us to draw a direct comparison between them. Response-based on-policy learning outperforms counterfactual off-policy learning, but it requires expensive labelled data for the downstream task, whereas interaction logs for counterfactual learning can be easier to obtain in various scenarios.