119 research outputs found
Domain Adaptation for Neural Networks by Parameter Augmentation
We propose a simple domain adaptation method for neural networks in a
supervised setting. Supervised domain adaptation is a way of improving the
generalization performance on the target domain by using the source domain
dataset, assuming that both of the datasets are labeled. Recently, recurrent
neural networks have been shown to be successful on a variety of NLP tasks such
as caption generation; however, the existing domain adaptation techniques are
limited to (1) tuning the model parameters on the target dataset after
training on the source dataset, or (2) designing the network to have dual
outputs, one for the source domain and the other for the target domain. Reformulating
the idea of the domain adaptation technique proposed by Daume (2007), we
propose a simple domain adaptation method, which can be applied to neural
networks trained with a cross-entropy loss. On captioning datasets, we show
performance improvements over other domain adaptation methods.
Comment: 9 pages. To appear in the first ACL Workshop on Representation
Learning for NL
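The feature-augmentation idea of Daume (2007) that the abstract builds on can be sketched for a plain linear model. This is a minimal illustration of the original feature-space trick, not the neural-network reformulation proposed in the paper; the function names `augment` and `score` and the toy weights are assumptions made for the example.

```python
def augment(features, domain):
    """Copy a feature vector into <general, source-only, target-only> blocks."""
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


def score(weights, features, domain):
    """Linear score computed on the augmented representation."""
    return sum(w * f for w, f in zip(weights, augment(features, domain)))


x = [1.0, 2.0]
src = augment(x, "source")
tgt = augment(x, "target")

# Hypothetical weights laid out as [general | source-only | target-only].
w = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
src_score = score(w, x, "source")
tgt_score = score(w, x, "target")
```

Because each example activates the general block plus the block of its own domain, the effective weight vector is the sum of a shared part and a domain-specific part. Augmenting parameters rather than features follows the same decomposition.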
How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
Sequence labeling is a core task in text understanding for IE/IR systems.
Text generation models have increasingly become the go-to solution for such
tasks (e.g., entity extraction and dialog slot filling). While most research
has focused on the labeling accuracy, a key aspect -- of vital practical
importance -- has slipped through the cracks: understanding model confidence.
More specifically, we lack a principled understanding of how to reliably gauge
the confidence of a model in its predictions for each labeled span. This paper
aims to provide some empirical insights on estimating model confidence for
generative sequence labeling. Most notably, we find that simply using the
decoder's output probabilities is not the best way to realize
well-calibrated confidence estimates. As verified over six public datasets of
different tasks, we show that our proposed approach -- which leverages
statistics from top-k predictions by a beam search -- significantly reduces
calibration errors of the predictions of a generative sequence labeling model.
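One way to picture confidence estimation from beam statistics is to let every hypothesis in the beam vote for the spans it contains. The sketch below is only a plausible aggregation written for illustration; the abstract does not specify which statistics the authors use, and the function `span_confidence`, the `(log_prob, spans)` input format, and the toy beam are all assumptions.

```python
import math


def span_confidence(beam):
    """Estimate per-span confidence from a beam of hypotheses.

    beam: list of (log_prob, spans) pairs, where spans is a set of
    (text, label) tuples predicted by that hypothesis.
    Returns a dict mapping each span to the share of normalized beam
    probability held by the hypotheses that contain it.
    """
    max_lp = max(lp for lp, _ in beam)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lp - max_lp) for lp, _ in beam]
    total = sum(weights)
    conf = {}
    for w, (_, spans) in zip(weights, beam):
        for span in spans:
            conf[span] = conf.get(span, 0.0) + w / total
    return conf


# Toy beam with three hypotheses (made-up probabilities).
beam = [
    (math.log(0.5), {("Paris", "LOC"), ("Alice", "PER")}),
    (math.log(0.3), {("Paris", "LOC")}),
    (math.log(0.2), {("Paris", "ORG"), ("Alice", "PER")}),
]
conf = span_confidence(beam)
```

In this toy beam, a span that appears in most high-probability hypotheses receives a high score, while a span that appears only in the lowest-ranked hypothesis receives a low one, even if the decoder assigned its tokens a high local probability.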
Causal-aware Safe Policy Improvement for Task-oriented dialogue
The recent success of reinforcement learning (RL) in solving complex tasks
is most often attributed to its capacity to explore and exploit an environment
where it has been trained. Sample efficiency is usually not an issue since
cheap simulators are available to sample data on-policy. On the other hand,
task-oriented dialogues are usually learnt from offline data collected using
human demonstrations. Collecting diverse demonstrations and annotating them is
expensive. Unfortunately, RL methods trained on off-policy data are
prone to issues of bias and generalization, which are further exacerbated by
stochasticity in human responses and the non-Markovian belief state of a dialogue
management system. To this end, we propose a batch RL framework for
task-oriented dialogue policy learning: causal aware safe policy improvement
(CASPI). This method gives guarantees on the dialogue policy's performance and also
learns to shape rewards according to the intentions behind human responses, rather
than just mimicking demonstration data; this, coupled with batch RL, helps overall
with the sample efficiency of the framework. We demonstrate the effectiveness of
this framework on the dialogue-context-to-text generation and end-to-end dialogue
tasks of the MultiWOZ 2.0 dataset. The proposed method outperforms the current
state of the art on these metrics, in both cases. In the end-to-end case, our
method trained on only 10% of the data was able to outperform the current state
of the art in three out of four evaluation metrics.