Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word
representations to multiple languages, such that words can be compared across
languages. In this paper, we shift the focus from words to documents and
introduce a method for embedding documents written in any language into a
single, language-independent vector space. For training, our approach leverages
a multilingual corpus where the same concept is covered in multiple languages
(but not necessarily via exact translations), such as Wikipedia. Our method,
Cr5 (Crosslingual reduced-rank ridge regression), starts by training a
ridge-regression-based classifier that uses language-specific bag-of-word
features in order to predict the concept that a given document is about. We
show that, when constraining the learned weight matrix to be of low rank, it
can be factored to obtain the desired mappings from language-specific
bags-of-words to language-independent embeddings. As opposed to most prior
methods, which use pretrained monolingual word vectors, postprocess them to
make them crosslingual, and finally average word vectors to obtain document
vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as
document-level. Moreover, since our algorithm uses the singular value
decomposition as its core operation, it is highly scalable. Experiments show
that our method achieves state-of-the-art performance on a crosslingual
document retrieval task. Finally, although not trained for embedding sentences
and words, it also achieves competitive performance on crosslingual sentence
and word retrieval tasks.
Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19)
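To make the construction concrete, here is a minimal NumPy sketch of reduced-rank ridge regression in the spirit of Cr5. The variable names, the regularization strength lam, the target rank k, and the dense linear solve are illustrative assumptions rather than the authors' implementation; a vocabulary-sized problem would need the scalable SVD-based solver the abstract refers to.

import numpy as np

def reduced_rank_ridge(X, Y, lam=1.0, k=50):
    """Sketch of one language's mapping in a Cr5-style setup.
    X: (n_docs, n_words) bag-of-words matrix for one language.
    Y: (n_docs, n_concepts) indicator matrix of the concept each document covers.
    Returns A (n_words, k), mapping a bag-of-words vector to a k-dim
    language-independent embedding, and B (k, n_concepts), mapping
    embeddings to concept scores, so that X @ A @ B approximates Y."""
    d = X.shape[1]
    # Full ridge-regression solution W = (X^T X + lam I)^{-1} X^T Y.
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    # Impose the low-rank constraint via an SVD of the fitted scores X W,
    # then factor the rank-k weight matrix into an embedding map and a classifier.
    U, s, Vt = np.linalg.svd(X @ W, full_matrices=False)
    P = Vt[:k].T        # (n_concepts, k) projection onto the top-k directions
    A = W @ P           # bag-of-words -> k-dim crosslingual document embedding
    B = P.T             # embedding -> concept scores
    return A, B

# Usage: embed a new document written in this language.
# doc_embedding = bow_vector @ A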
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
Transformers generalize to novel compositions of structures and entities
after being trained on a complex dataset, but easily overfit on datasets of
insufficient complexity. We observe that when the training set is sufficiently
complex, the model encodes sentences that have a common syntactic structure
using a systematic attention pattern. Inspired by this observation, we propose
SQ-Transformer (Structurally Quantized) that explicitly encourages
systematicity in the embeddings and attention layers, even with a training set
of low complexity. At the embedding level, we introduce Structure-oriented
Vector Quantization (SoVQ) to cluster word embeddings into several classes of
structurally equivalent entities. At the attention level, we devise the
Systematic Attention Layer (SAL) and an alternative, Systematically Regularized
Layer (SRL) that operate on the quantized word embeddings so that sentences of
the same structure are encoded with invariant or similar attention patterns.
Empirically, we show that SQ-Transformer achieves stronger compositional
generalization than the vanilla Transformer on multiple low-complexity semantic
parsing and machine translation datasets. In our analysis, we show that SoVQ
indeed learns a syntactically clustered embedding space and SAL/SRL induces
generalizable attention patterns, which lead to improved systematicity.
Comment: 22 pages, code: https://github.com/jiangycTarheel/SQ-Transforme
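As a rough illustration of the quantization step, the following PyTorch sketch assigns each word embedding to one of a small number of codebook classes with a straight-through estimator. The module name, codebook size, and commitment loss are generic vector-quantization assumptions; the paper's structure-oriented clustering objective and the SAL/SRL attention layers are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantizedEmbedding(nn.Module):
    """Toy stand-in for clustering word embeddings into structural classes."""
    def __init__(self, vocab_size, dim, num_classes=64, beta=0.25):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)      # continuous word embeddings
        self.codebook = nn.Embedding(num_classes, dim)  # one code per structural class
        self.beta = beta

    def forward(self, token_ids):
        e = self.embed(token_ids)                             # (batch, seq, dim)
        flat = e.reshape(-1, e.size(-1))                      # (batch*seq, dim)
        # Assign each word embedding to its nearest codebook entry.
        dists = torch.cdist(flat, self.codebook.weight)       # (batch*seq, num_classes)
        classes = dists.argmin(dim=-1).view(token_ids.shape)  # (batch, seq) class ids
        q = self.codebook(classes)                            # quantized embeddings
        # Straight-through estimator: use the quantized vectors in the forward
        # pass but route gradients back to the continuous embeddings.
        q_st = e + (q - e).detach()
        # Standard VQ codebook/commitment losses pull codes and embeddings together.
        vq_loss = F.mse_loss(q, e.detach()) + self.beta * F.mse_loss(e, q.detach())
        return q_st, classes, vq_loss

Per the abstract, it is the resulting class assignments that the attention-level components (SAL/SRL) condition on, so that sentences sharing a syntactic structure are encoded with invariant or similar attention patterns.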