908 research outputs found
What do Neural Machine Translation Models Learn about Morphology?
Neural machine translation (MT) models obtain state-of-the-art performance
while maintaining a simple, end-to-end architecture. However, little is known
about what these models learn about source and target languages during the
training process. In this work, we analyze the representations learned by
neural MT models at various levels of granularity and empirically evaluate the
quality of the representations for learning morphology through extrinsic
part-of-speech and morphological tagging tasks. We conduct a thorough
investigation along several parameters: word-based vs. character-based
representations, depth of the encoding layer, the identity of the target
language, and encoder vs. decoder representations. Our data-driven,
quantitative evaluation sheds light on important aspects in the neural MT
system and its ability to capture word structure.Comment: Updated decoder experiment
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
We introduce a model for constructing vector representations of words by
composing characters using bidirectional LSTMs. Relative to traditional word
representation models that have independent vectors for each word type, our
model requires only a single vector per character type and a fixed set of
parameters for the compositional model. Despite the compactness of this model
and, more importantly, the arbitrary nature of the form-function relationship
in language, our "composed" word representations yield state-of-the-art results
in language modeling and part-of-speech tagging. Benefits over traditional
baselines are particularly pronounced in morphologically rich languages (e.g.,
Turkish)
Character-Aware Neural Language Models
We describe a simple neural language model that relies only on
character-level inputs. Predictions are still made at the word-level. Our model
employs a convolutional neural network (CNN) and a highway network over
characters, whose output is given to a long short-term memory (LSTM) recurrent
neural network language model (RNN-LM). On the English Penn Treebank the model
is on par with the existing state-of-the-art despite having 60% fewer
parameters. On languages with rich morphology (Arabic, Czech, French, German,
Spanish, Russian), the model outperforms word-level/morpheme-level LSTM
baselines, again with fewer parameters. The results suggest that on many
languages, character inputs are sufficient for language modeling. Analysis of
word representations obtained from the character composition part of the model
reveals that the model is able to encode, from characters only, both semantic
and orthographic information.Comment: AAAI 201
Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages
We present a transition-based dependency parser that uses a convolutional
neural network to compose word representations from characters. The character
composition model shows great improvement over the word-lookup model,
especially for parsing agglutinative languages. These improvements are even
better than using pre-trained word embeddings from extra data. On the SPMRL
data sets, our system outperforms the previous best greedy parser (Ballesteros
et al., 2015) by a margin of 3% on average.Comment: Accepted in ACL 2017 (Short
Mimicking Word Embeddings using Subword RNNs
Word embeddings improve generalization over lexical features by placing each
word in a lower-dimensional space, using distributional information obtained
from unlabeled data. However, the effectiveness of word embeddings for
downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which
embeddings do not exist. In this paper, we present MIMICK, an approach to
generating OOV word embeddings compositionally, by learning a function from
spellings to distributional embeddings. Unlike prior work, MIMICK does not
require re-training on the original word embedding corpus; instead, learning is
performed at the type level. Intrinsic and extrinsic evaluations demonstrate
the power of this simple approach. On 23 languages, MIMICK improves performance
over a word-based baseline for tagging part-of-speech and morphosyntactic
attributes. It is competitive with (and complementary to) a supervised
character-based model in low-resource settings.Comment: EMNLP 201
External Lexical Information for Multilingual Part-of-Speech Tagging
Morphosyntactic lexicons and word vector representations have both proven
useful for improving the accuracy of statistical part-of-speech taggers. Here
we compare the performances of four systems on datasets covering 16 languages,
two of these systems being feature-based (MEMMs and CRFs) and two of them being
neural-based (bi-LSTMs). We show that, on average, all four approaches perform
similarly and reach state-of-the-art results. Yet better performances are
obtained with our feature-based models on lexically richer datasets (e.g. for
morphologically rich languages), whereas neural-based results are higher on
datasets with less lexical variability (e.g. for English). These conclusions
hold in particular for the MEMM models relying on our system MElt, which
benefited from newly designed features. This shows that, under certain
conditions, feature-based approaches enriched with morphosyntactic lexicons are
competitive with respect to neural methods
- …