BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings
In this paper, we propose a bidimensional attention-based recursive
autoencoder (BattRAE) to integrate clues and source-target interactions at
multiple levels of granularity into bilingual phrase representations. We employ
recursive autoencoders to generate tree structures of phrases with embeddings
at different levels of granularity (e.g., words, sub-phrases and phrases). Over
these embeddings on the source and target side, we introduce a bidimensional
attention network to learn their interactions encoded in a bidimensional
attention matrix, from which we extract two soft attention weight distributions
simultaneously. These weight distributions enable BattRAE to generate
compositive phrase representations via convolution. Based on the learned phrase
representations, we further use a bilinear neural model, trained via a
max-margin method, to measure bilingual semantic similarity. To evaluate the
effectiveness of BattRAE, we incorporate this semantic similarity as an
additional feature into a state-of-the-art SMT system. Extensive experiments on
NIST Chinese-English test sets show that our model achieves a substantial
improvement of up to 1.63 BLEU points on average over the baseline. Comment: 7 pages, accepted by AAAI 2017
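The bidimensional attention and bilinear scoring stages lend themselves to a compact sketch. The snippet below is a minimal, illustrative PyTorch rendering of those two steps for a single phrase pair; the parameter names (attn_proj, bilinear_M, margin) are assumptions rather than the paper's notation, and a simple attention-weighted sum stands in for the paper's convolution-based composition.

```python
# Minimal sketch of bidimensional attention plus bilinear max-margin scoring,
# assuming pre-computed multi-granularity embeddings (words, sub-phrases,
# phrase) for one source/target phrase pair. Names are illustrative only.
import torch
import torch.nn.functional as F

def bidimensional_attention(src, tgt, attn_proj):
    """src: (m, d) source-side embeddings; tgt: (n, d) target-side embeddings;
    attn_proj: (d, d) attention parameter matrix (an assumed parameterization)."""
    # Bidimensional attention matrix: one score per (source item, target item) pair.
    A = torch.tanh(src @ attn_proj @ tgt.T)          # (m, n)
    # Two soft attention weight distributions, one per side, read off the matrix.
    src_weights = F.softmax(A.sum(dim=1), dim=0)     # (m,)
    tgt_weights = F.softmax(A.sum(dim=0), dim=0)     # (n,)
    # Attention-weighted phrase representations (weighted sum used here in
    # place of the paper's convolution-based composition).
    src_phrase = src_weights @ src                   # (d,)
    tgt_phrase = tgt_weights @ tgt                   # (d,)
    return src_phrase, tgt_phrase

def bilinear_score(src_phrase, tgt_phrase, bilinear_M):
    """Bilinear semantic similarity: s(f, e) = f^T M e."""
    return src_phrase @ bilinear_M @ tgt_phrase

def max_margin_loss(pos_score, neg_score, margin=1.0):
    """Rank the correct translation above a sampled negative by a fixed margin."""
    return torch.clamp(margin - pos_score + neg_score, min=0.0)
```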
Using the Output Embedding to Improve Language Models
We study the topmost weight matrix of neural network language models. We show
that this matrix constitutes a valid word embedding. When training language
models, we recommend tying the input embedding and this output embedding. We
analyze the resulting update rules and show that the tied embedding evolves in
a more similar way to the output embedding than to the input embedding in the
untied model. We also offer a new method of regularizing the output embedding.
Our methods lead to a significant reduction in perplexity, as we are able to
show on a variety of neural network language models. Finally, we show that
weight tying can reduce the size of neural translation models to less than half
of their original size without harming their performance. Comment: To appear in EACL 2017
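Weight tying itself amounts to sharing one parameter matrix between the input embedding and the topmost output layer. The sketch below shows one common way to express this in PyTorch, assuming an embedding-LSTM-softmax language model; the extra projection layer is only an assumption to handle hidden sizes that differ from the embedding size and is not prescribed by the paper.

```python
# Minimal sketch of weight tying in a neural language model (illustrative names).
import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Project back to the embedding size so the output matrix has the same
        # shape as the input embedding (only needed when hidden_dim != embed_dim).
        self.proj = nn.Linear(hidden_dim, embed_dim)
        self.decoder = nn.Linear(embed_dim, vocab_size, bias=False)
        # Weight tying: the topmost weight matrix shares parameters with the
        # input embedding, roughly halving the embedding-related parameters.
        self.decoder.weight = self.embed.weight

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.decoder(self.proj(hidden))  # logits over the vocabulary
```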