418 research outputs found
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
We propose a process for investigating the extent to which sentence
representations arising from neural machine translation (NMT) systems encode
distinct semantic phenomena. We use these representations as features to train
a natural language inference (NLI) classifier based on datasets recast from
existing semantic annotations. In applying this process to a representative NMT
system, we find its encoder appears most suited to supporting inferences at the
syntax-semantics interface, as compared to anaphora resolution requiring
world-knowledge. We conclude with a discussion on the merits and potential
deficiencies of the existing process, and how it may be improved and extended
as a broader framework for evaluating semantic coverage.Comment: To be presented at NAACL 2018 - 11 page
What do Neural Machine Translation Models Learn about Morphology?
Neural machine translation (MT) models obtain state-of-the-art performance
while maintaining a simple, end-to-end architecture. However, little is known
about what these models learn about source and target languages during the
training process. In this work, we analyze the representations learned by
neural MT models at various levels of granularity and empirically evaluate the
quality of the representations for learning morphology through extrinsic
part-of-speech and morphological tagging tasks. We conduct a thorough
investigation along several parameters: word-based vs. character-based
representations, depth of the encoding layer, the identity of the target
language, and encoder vs. decoder representations. Our data-driven,
quantitative evaluation sheds light on important aspects in the neural MT
system and its ability to capture word structure.Comment: Updated decoder experiment
Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.Comment: ACL 2020 camera-ready (presented July 2020
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
We describe PARANMT-50M, a dataset of more than 50 million English-English
sentential paraphrase pairs. We generated the pairs automatically by using
neural machine translation to translate the non-English side of a large
parallel corpus, following Wieting et al. (2017). Our hope is that ParaNMT-50M
can be a valuable resource for paraphrase generation and can provide a rich
source of semantic knowledge to improve downstream natural language
understanding tasks. To show its utility, we use ParaNMT-50M to train
paraphrastic sentence embeddings that outperform all supervised systems on
every SemEval semantic textual similarity competition, in addition to showing
how it can be used for paraphrase generation
- …