Many applications nowadays, including a rising number of virtual personal assistants, rely on statistical machine-learnt models. Training such models typically requires large amounts of labelled data, which are expensive and difficult to obtain. In this thesis, we investigate two approaches that alleviate the need for labelled data by instead leveraging feedback given to model outputs. We apply both approaches to two sequence-to-sequence tasks in Natural Language Processing (NLP): machine translation and semantic parsing for question answering. Additionally, we define a new question-answering task based on the geographical database OpenStreetMap (OSM) and collect a corpus, NLmaps v2, with 28,609 question-parse pairs. Using this corpus, we build the semantic parsers employed in subsequent experiments. Furthermore, we design the first natural language interface to OSM, for which we specifically tailor a parser.
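For illustration, consider a hypothetical question-parse pair in the style of the NLmaps markup language (not an actual corpus entry; the exact operator syntax may differ), mapping a natural language question to a query executable against the OSM database:

    Question: How many restaurants are there in Heidelberg?
    Parse:    query(area(keyval('name','Heidelberg')), nwr(keyval('amenity','restaurant')), qtype(count))

Executing the parse returns the answer, here a count of matching OSM objects.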
The first approach to learning from feedback given to model outputs considers a scenario in which weak supervision is available by grounding the model in a downstream task for which labelled data has been collected. Feedback obtained from the downstream task is used to improve the model in a response-based on-policy learning setup. We apply this approach to improve a machine translation system that is grounded in a multilingual semantic parsing task, employing ramp loss objectives. Next, we improve a neural semantic parser for which only gold answers, but not gold parses, are available, by lifting ramp loss objectives to non-linear neural networks.
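As a schematic illustration (the exact hope and fear selection criteria used in the thesis may differ), a ramp loss contrasts a hope output $y^{+}$, which the model scores highly and which receives positive downstream feedback, with a fear output $y^{-}$, which the model also scores highly but which receives negative feedback:

\[
\mathcal{L}_{\mathrm{ramp}}(\theta) \;=\; -\log p_{\theta}(y^{+} \mid x) \;+\; \log p_{\theta}(y^{-} \mid x).
\]

Minimising this loss shifts probability mass from $y^{-}$ towards $y^{+}$; lifting the objective from linear models to non-linear neural networks amounts to using the model's log-probabilities in place of linear scores.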
In the second approach to learning from feedback, instead of collecting expensive labelled data, a model is deployed and user-model interactions are recorded in a log. This log is then used to improve the model in a counterfactual off-policy learning setup. We first exemplify this approach on a domain adaptation task for machine translation. Here, we show that counterfactual learning can be applied to tasks with large output spaces and that, in contrast to prevalent theory, deterministic logs can successfully be used for sequence-to-sequence NLP tasks. Next, we demonstrate on a semantic parsing task that counterfactual learning can also be applied when the underlying model is a neural network and feedback is collected from human users.
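To sketch the setup (a simplified form; the estimators studied in the thesis may add further correction terms), the log $\mathcal{D} = \{(x_{t}, y_{t}, \delta_{t})\}_{t=1}^{n}$ stores inputs, logged model outputs, and user feedback. With a stochastic logging policy $\pi_{0}$, the classical inverse propensity scoring estimator of the expected reward under the new model $\pi_{\theta}$ is

\[
\hat{R}_{\mathrm{IPS}}(\theta) \;=\; \frac{1}{n} \sum_{t=1}^{n} \delta_{t}\, \frac{\pi_{\theta}(y_{t} \mid x_{t})}{\pi_{0}(y_{t} \mid x_{t})},
\]

whereas for deterministic logs, where such propensities are unavailable, one can instead match the logged outputs directly, for example

\[
\hat{R}_{\mathrm{DPM}}(\theta) \;=\; \frac{1}{n} \sum_{t=1}^{n} \delta_{t}\, \pi_{\theta}(y_{t} \mid x_{t}).
\]

Gradient-based optimisation of such an estimator improves the model offline, without any further user interaction.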
Applying both approaches to the same semantic parsing task allows us to draw a direct comparison between them. Response-based on-policy learning outperforms counterfactual off-policy learning, but requires expensive labelled data for the downstream task, whereas interaction logs for counterfactual learning can be easier to obtain in various scenarios.