A Joint Model for Word Embedding and Word Morphology
This paper presents a joint model for performing unsupervised morphological analysis on words and learning a character-level composition function from morphemes to word embeddings. Our model splits individual words into segments and weights each segment according to its ability to predict context words. Our morphological analysis is comparable to dedicated morphological analyzers at the task of morpheme boundary recovery, and also performs better than word-based embedding models at the task of syntactic analogy answering. Finally, we show that incorporating morphology explicitly into character-level models helps them produce embeddings for unseen words that correlate better with human judgments.
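The segment-weighting step admits a compact sketch. The snippet below is a minimal illustration, not the authors' implementation: the segment vocabulary and embeddings are hypothetical toy values, and the weights are computed as a softmax over segment-context dot products, which is one plausible reading of "weighting each segment by its ability to predict context words".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy segments (morpheme candidates) for one word.
segments = ["un", "help", "ful", "ness"]
dim = 8
seg_emb = {s: rng.normal(size=dim) for s in segments}  # segment embeddings
ctx_emb = rng.normal(size=dim)                         # a context-word embedding

def compose(word_segments, context):
    """Weight each segment by how well it predicts the context word
    (softmax over segment-context dot products), then return the
    weighted sum of segment vectors as the word embedding."""
    vecs = np.stack([seg_emb[s] for s in word_segments])  # (n_segments, dim)
    scores = vecs @ context                 # predictive score of each segment
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax normalization
    return weights @ vecs                   # convex combination of segments

word_vec = compose(segments, ctx_emb)
print(word_vec.shape)  # (8,)
```

In this reading, segments that predict the surrounding context well dominate the composition, so informative morphemes contribute more to the final word embedding than noise segments.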
Morphological Priors for Probabilistic Neural Word Embeddings
Word embeddings allow natural language processing systems to share statistical information across related words. These embeddings are typically based on distributional statistics, making it difficult for them to generalize to rare or unseen words. We propose to improve word embeddings by incorporating morphological information, capturing shared sub-word features. Unlike previous work that constructs word embeddings directly from morphemes, we combine morphological and distributional information in a unified probabilistic framework, in which the word embedding is a latent variable. The morphological information provides a prior distribution on the latent word embeddings, which in turn condition a likelihood function over an observed corpus. This approach yields improvements on intrinsic word similarity evaluations, and also in the downstream task of part-of-speech tagging.
Comment: Appeared at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin.
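The prior-plus-likelihood structure can be made concrete. The sketch below is a minimal illustration under assumptions the abstract does not state: a Gaussian prior whose mean is the sum of (toy) morpheme vectors, a skip-gram-style unnormalized likelihood, and a single gradient-ascent step; `morph_emb`, `tau`, and the optimization step are all hypothetical choices, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Hypothetical morpheme embeddings; the prior mean for a word is taken
# to be the sum of its morpheme vectors (one simple way to build a prior).
morph_emb = {"cat": rng.normal(size=dim), "s": rng.normal(size=dim)}

def log_prior(w, morphemes, tau=1.0):
    """Gaussian prior on the latent word embedding w, centered at the
    composed morpheme vector, with precision tau."""
    mu = sum(morph_emb[m] for m in morphemes)
    diff = w - mu
    return -0.5 * tau * diff @ diff

def log_likelihood(w, context_vecs):
    """Skip-gram-style log-likelihood of observed context vectors,
    up to the partition function (which negative sampling would
    approximate in practice)."""
    return sum(w @ c for c in context_vecs)

# One gradient-ascent step on the unnormalized posterior for one word.
w = rng.normal(size=dim)                       # latent word embedding
contexts = [rng.normal(size=dim) for _ in range(3)]
lr, tau = 0.1, 1.0

before = log_prior(w, ["cat", "s"], tau) + log_likelihood(w, contexts)
mu = sum(morph_emb[m] for m in ["cat", "s"])
grad = -tau * (w - mu) + sum(contexts)         # d/dw [log prior + log lik.]
w = w + lr * grad
after = log_prior(w, ["cat", "s"], tau) + log_likelihood(w, contexts)
print(f"unnormalized log-posterior: {before:.2f} -> {after:.2f}")
```

The point of the latent-variable formulation, as the abstract describes it, is that the morphological prior anchors rare or unseen words near their morpheme composition, while the corpus likelihood pulls frequent words toward their distributional evidence.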