Selecting artificially-generated sentences for fine-tuning neural machine translation
Neural Machine Translation (NMT) models tend to achieve their best performance when larger sets of parallel sentences are provided for training. For this reason, augmenting the training set with artificially-generated sentence pairs can boost performance.
Nonetheless, performance can also be improved with a small number of sentences if they are in the same domain as the test set. Accordingly, we want to explore the use of artificially-generated sentences along with data-selection algorithms to improve German-to-English NMT models trained solely with authentic data.
In this work, we show how artificially-generated sentences can be more beneficial than authentic pairs, and demonstrate their advantages when used in combination with data-selection algorithms.
Adaptation of machine translation models with back-translated data using transductive data selection methods
Data selection has proven its merit for improving Neural Machine Translation (NMT) when applied to authentic data. But the benefit of using synthetic data in NMT training, produced by the popular back-translation technique, raises the question of whether data selection could also be useful for synthetic data. In this work we use Infrequent n-gram Recovery (INR) and Feature Decay Algorithms (FDA), two transductive data selection methods, to obtain subsets of sentences from synthetic data. These methods ensure that selected sentences share n-grams with the test set so the NMT model can be adapted to translate it. Performing data selection on back-translated data creates new challenges, as the source side may contain noise originated by the model used in the back-translation. Hence, finding n-grams present in the test set becomes more difficult. Despite that, in our work we show that adapting a model with a selection of synthetic data is a useful approach.
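The selection idea above can be illustrated with a small sketch. This is a simplified, FDA-style greedy selector (not the authors' implementation): candidate source sentences are scored by n-gram overlap with the test set, and the value of an n-gram decays each time an already-selected sentence covers it, so later picks favour n-grams not yet covered. All names, the decay factor of 0.5, and the order-weighted scoring are illustrative assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_sentences(candidates, test_set, max_n=3, k=2):
    """Greedily pick up to k (source, target) pairs whose source side
    shares n-grams with the test set, with feature decay on coverage."""
    # Collect the n-grams of the test set we want the selection to cover.
    wanted = Counter()
    for sent in test_set:
        toks = sent.split()
        for n in range(1, max_n + 1):
            wanted.update(ngrams(toks, n))

    covered = Counter()   # how often each wanted n-gram is already covered
    selected = []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best, best_score = None, 0.0
        for src, tgt in pool:
            toks = src.split()
            score = 0.0
            for n in range(1, max_n + 1):
                for g in set(ngrams(toks, n)):
                    if g in wanted:
                        # Feature decay: halve an n-gram's value for each
                        # previously selected sentence that covers it.
                        score += n * 0.5 ** covered[g]
            if score > best_score:
                best, best_score = (src, tgt), score
        if best is None:
            break
        selected.append(best)
        pool.remove(best)
        for n in range(1, max_n + 1):
            for g in set(ngrams(best[0].split(), n)):
                covered[g] += 1
    return selected
```

With back-translated data, the source side is machine-generated, so noisy n-grams depress these overlap scores; that is the difficulty the abstract points to.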
TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection
We propose TANDA, an effective technique for fine-tuning pre-trained
Transformer models for natural language tasks. Specifically, we first transfer
a pre-trained model into a model for a general task by fine-tuning it with a
large and high-quality dataset. We then perform a second fine-tuning step to
adapt the transferred model to the target domain. We demonstrate the benefits
of our approach for answer sentence selection, which is a well-known inference
task in Question Answering. We built a large scale dataset to enable the
transfer step, exploiting the Natural Questions dataset. Our approach
establishes the state of the art on two well-known benchmarks, WikiQA and
TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely
outperform the previous highest scores of 83.4% and 87.5%, obtained in very
recent work. We empirically show that TANDA generates more stable and robust
models reducing the effort required for selecting optimal hyper-parameters.
Additionally, we show that the transfer step of TANDA makes the adaptation step
more robust to noise. This enables a more effective use of noisy datasets for
fine-tuning. Finally, we also confirm the positive impact of TANDA in an
industrial setting, using domain specific datasets subject to different types
of noise.
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), Oral Presentation.
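The transfer-then-adapt recipe can be sketched with a toy stand-in: a logistic-regression classifier plays the role of the pre-trained Transformer, first trained on a large "general" dataset (the transfer step), then fine-tuned from those weights on a small in-domain set with a shifted decision boundary (the adapt step). The data, learning rates, and epoch counts are all illustrative assumptions, not TANDA's actual setup.

```python
import numpy as np

def train(X, y, w=None, lr=0.1, epochs=200):
    """Full-batch logistic-regression trainer; pass `w` to continue
    training from earlier weights instead of starting from scratch."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on log loss
    return w

rng = np.random.default_rng(0)

# Step 1 ("transfer"): fit on a large, general dataset.
X_gen = rng.normal(size=(500, 4))
y_gen = (X_gen[:, 0] + X_gen[:, 1] > 0).astype(float)
w = train(X_gen, y_gen)

# Step 2 ("adapt"): continue from the transferred weights on a small
# in-domain set whose decision boundary is slightly different.
X_dom = rng.normal(size=(40, 4))
y_dom = (X_dom[:, 0] + 0.5 * X_dom[:, 2] > 0).astype(float)
w = train(X_dom, y_dom, w=w, lr=0.05, epochs=300)
```

The point of the two-step schedule is that step 2 starts from weights already shaped by a large, high-quality dataset, which the abstract argues makes adaptation more stable and more robust to noise in the small in-domain set.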
- …