123 research outputs found
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
We describe PARANMT-50M, a dataset of more than 50 million English-English
sentential paraphrase pairs. We generated the pairs automatically by using
neural machine translation to translate the non-English side of a large
parallel corpus, following Wieting et al. (2017). Our hope is that ParaNMT-50M
can be a valuable resource for paraphrase generation and can provide a rich
source of semantic knowledge to improve downstream natural language
understanding tasks. To show its utility, we use ParaNMT-50M to train
paraphrastic sentence embeddings that outperform all supervised systems on
every SemEval semantic textual similarity competition, in addition to showing
how it can be used for paraphrase generation.
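A minimal sketch of the back-translation idea described in this abstract: translating the non-English side of a parallel corpus with NMT yields a second English rendering of each sentence, which can be paired with the original English side as a paraphrase. The `translate_to_english` function and the exact filtering step are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of the paraphrase-pair generation idea (not the authors' code).
# Assumes a hypothetical translate_to_english() NMT function and a parallel
# corpus of (english_reference, foreign_sentence) pairs.

def generate_paraphrase_pairs(parallel_corpus, translate_to_english):
    """Yield (reference_english, machine_translated_english) paraphrase pairs."""
    for english_reference, foreign_sentence in parallel_corpus:
        # Translating the non-English side produces a second English rendering
        # of the same underlying meaning, i.e. a sentential paraphrase.
        back_translation = translate_to_english(foreign_sentence)
        # Simple illustrative filter: drop pairs that are trivially identical.
        if back_translation.strip().lower() != english_reference.strip().lower():
            yield english_reference, back_translation
```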
Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
We consider the problem of learning general-purpose, paraphrastic sentence
embeddings, revisiting the setting of Wieting et al. (2016b). While they found
LSTM recurrent networks to underperform word averaging, we present several
developments that together produce the opposite conclusion. These include
training on sentence pairs rather than phrase pairs, averaging states to
represent sequences, and regularizing aggressively. These improve LSTMs in both
transfer learning and supervised settings. We also introduce a new recurrent
architecture, the Gated Recurrent Averaging Network, that is inspired by
averaging and LSTMs while outperforming them both. We analyze our learned
models, finding evidence of preferences for particular parts of speech and
dependency relations.
Comment: Published as a long paper at ACL 2017
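An illustrative sketch of one ingredient mentioned in this abstract: representing a sequence by averaging LSTM hidden states rather than keeping only the final state. The PyTorch framing, dimensions, and masking details are my assumptions, not the paper's implementation.

```python
# Sketch of averaging LSTM hidden states to form a sentence embedding.
import torch
import torch.nn as nn

class AveragedLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, seq_len); lengths: (batch,) true lengths before padding
        states, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, hidden)
        mask = (torch.arange(states.size(1), device=token_ids.device)[None, :]
                < lengths[:, None]).unsqueeze(-1).float() # zero out padding positions
        # Average hidden states over real tokens to get the sentence embedding.
        return (states * mask).sum(dim=1) / lengths.unsqueeze(-1).float()
```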
Gaussian Error Linear Units (GELUs)
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural
network activation function. The GELU activation function is $x\Phi(x)$, where
$\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU
nonlinearity weights inputs by their value, rather than gating inputs by their
sign as in ReLUs ($x\mathbf{1}_{x > 0}$). We perform an empirical evaluation of
the GELU nonlinearity against the ReLU and ELU activations and find performance
improvements across all considered computer vision, natural language
processing, and speech tasks.
Comment: Trimmed version of 2016 draft; add exact formula
Learning to Embed Words in Context for Syntactic Tasks
We present models for embedding words in the context of surrounding words.
Such models, which we refer to as token embeddings, represent the
characteristics of a word that are specific to a given context, such as word
sense, syntactic category, and semantic role. We explore simple, efficient
token embedding models based on standard neural network architectures. We learn
token embeddings on a large amount of unannotated text and evaluate them as
features for part-of-speech taggers and dependency parsers trained on much
smaller amounts of annotated data. We find that predictors endowed with token
embeddings consistently outperform baseline predictors across a range of
context window and training set sizes.
Comment: Accepted by ACL 2017 Repl4NLP workshop
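One simple way to realize the token-embedding idea described above is to run a bidirectional recurrent encoder over word embeddings so each token gets a context-sensitive vector. The specific architecture and dimensions below are assumptions for illustration, not the particular models evaluated in the paper.

```python
# Illustrative token (in-context) embedder: a BiLSTM over word embeddings.
import torch
import torch.nn as nn

class TokenEmbedder(nn.Module):
    def __init__(self, vocab_size, word_dim=100, context_dim=100):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.bilstm = nn.LSTM(word_dim, context_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> one context-sensitive vector per token,
        # usable as features for a POS tagger or dependency parser.
        context_states, _ = self.bilstm(self.word_embed(token_ids))
        return context_states  # (batch, seq_len, 2 * context_dim)
```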
Emergent Predication Structure in Hidden State Vectors of Neural Readers
A significant number of neural architectures for reading comprehension have
recently been developed and evaluated on large cloze-style datasets. We present
experiments supporting the emergence of "predication structure" in the hidden
state vectors of these readers. More specifically, we provide evidence that the
hidden state vectors represent atomic formulas $\Phi[c]$ where $\Phi$ is a
semantic property (predicate) and $c$ is a constant symbol entity identifier.
Comment: Accepted for Repl4NLP: 2nd Workshop on Representation Learning for NLP
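A toy illustration of what it can mean for a vector to encode an atomic formula $\Phi[c]$: model the hidden state as roughly the sum of a predicate vector and an entity vector, and recover the entity by inner products with near-orthogonal entity embeddings. This is my own illustrative construction with random placeholder vectors, not an experiment from the paper.

```python
# Toy decomposition: hidden_state ~ e(Phi) + e(c), with c recoverable by inner products.
import numpy as np

rng = np.random.default_rng(0)
dim, num_entities = 256, 10

entity_vecs = rng.normal(size=(num_entities, dim)) / np.sqrt(dim)  # e(c) per entity id
predicate_vec = rng.normal(size=dim) / np.sqrt(dim)                # e(Phi) for one predicate

true_entity = 3
hidden_state = predicate_vec + entity_vecs[true_entity]  # "represents" Phi[c] with c = 3

# In high dimensions random vectors are nearly orthogonal, so inner products
# single out the entity the state is predicating something about.
scores = entity_vecs @ hidden_state
print(int(np.argmax(scores)) == true_entity)  # True with high probability
```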