Why is unsupervised alignment of English embeddings from different algorithms so hard?
This paper presents a challenge to the community: generative adversarial
networks (GANs) can perfectly align independent English word embeddings induced
with the same algorithm, based on distributional information alone, but fail
to do so for embeddings induced with two different algorithms. Why is that? We believe
that understanding why is key to understanding both modern word embedding algorithms
and the limitations and instability dynamics of GANs. This paper shows that (a)
in all cases where alignment fails, there exists a linear transform
between the two embeddings (so algorithm biases do not lead to non-linear
differences), and (b) similar effects cannot easily be obtained by varying
hyper-parameters. One plausible suggestion, based on our initial experiments, is
that differences in the inductive biases of the embedding algorithms lead
to an optimization landscape riddled with local optima, and hence a
very small basin of convergence; we present this more as a challenge paper
than as a technical contribution.
Comment: Accepted at EMNLP 2018
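A minimal sketch of the kind of check described in point (a): fitting a least-squares linear map between two embedding matrices over a shared vocabulary and measuring how much of the difference it explains. The shapes, data, and variable names here are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Illustrative stand-ins: X and Y hold embeddings for the same vocabulary,
# with row i of each matrix corresponding to the same word.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))   # e.g. vectors from one algorithm
Y = rng.normal(size=(1000, 300))   # e.g. vectors from another algorithm

# Least-squares linear map W such that X @ W approximates Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Relative residual: how well a purely linear transform accounts for the
# difference between the two spaces (smaller means "more linear").
residual = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative residual of linear fit: {residual:.3f}")
```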
Context Vectors are Reflections of Word Vectors in Half the Dimensions
This paper takes a step towards theoretical analysis of the relationship
between word embeddings and context embeddings in models such as word2vec. We
start from basic probabilistic assumptions on the nature of word vectors,
context vectors, and text generation. These assumptions are well supported
either empirically or theoretically by the existing literature. Next, we show
that under these assumptions the widely-used word-word PMI matrix is
approximately a random symmetric Gaussian ensemble. This, in turn, implies that
context vectors are reflections of word vectors in approximately half the
dimensions. As a direct application of our result, we suggest a theoretically
grounded way of tying weights in the SGNS model.
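A minimal sketch of how such a reflection could be used to tie SGNS weights, under the assumption (mine, for illustration; the paper's exact scheme may differ) that the reflection is a fixed diagonal sign matrix flipping roughly half the dimensions:

```python
import numpy as np

dim = 300
rng = np.random.default_rng(0)

# Assumed form of the result: context vectors are word vectors reflected in
# about half the dimensions, i.e. multiplied by a fixed +/-1 diagonal.
signs = np.where(np.arange(dim) < dim // 2, 1.0, -1.0)

def tied_context_vectors(word_vectors: np.ndarray) -> np.ndarray:
    """Derive context vectors from word vectors via a fixed reflection,
    instead of learning a separate context matrix."""
    return word_vectors * signs  # broadcasts the diagonal reflection row-wise

W = rng.normal(size=(10000, dim))   # stand-in for learned word vectors
C = tied_context_vectors(W)         # tied context vectors, no extra parameters

# The SGNS score for a (word, context) pair is then the usual dot product.
score = W[42] @ C[7]
```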
Characterizing the impact of geometric properties of word embeddings on task performance
Analyses of word embedding properties intended to inform their use in downstream
NLP tasks have largely relied on assessing nearest neighbors. However,
geometric properties of the continuous feature space contribute directly to the
use of embedding features in downstream models, and are largely unexplored. We
consider four properties of word embedding geometry, namely: position relative
to the origin, distribution of features in the vector space, global pairwise
distances, and local pairwise distances. We define a sequence of
transformations to generate new embeddings that expose subsets of these
properties to downstream models and evaluate change in task performance to
understand the contribution of each property to NLP models. We transform
publicly available pretrained embeddings from three popular toolkits (word2vec,
GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model
linguistic information in the vector space, and extrinsic tasks, which use
vectors as input to machine learning models. We find that intrinsic evaluations
are highly sensitive to absolute position, while extrinsic tasks rely primarily
on local similarity. Our findings suggest that future embedding models and
post-processing techniques should focus primarily on similarity to nearby
points in vector space.
Comment: Appearing in the Third Workshop on Evaluating Vector Space Representations for NLP (RepEval 2019). 7 pages + references
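A minimal sketch of the kind of property-exposing transformations the abstract describes, under my own simplifying assumptions (the paper's actual transformation sequence is not reproduced here): mean-centering changes position relative to the origin while preserving pairwise distances, and uniform rescaling changes global distances while preserving local neighbor structure.

```python
import numpy as np

def center(emb: np.ndarray) -> np.ndarray:
    """Translate embeddings so the vocabulary mean sits at the origin.
    Changes absolute position but preserves all pairwise distances."""
    return emb - emb.mean(axis=0, keepdims=True)

def scale_global(emb: np.ndarray, factor: float = 2.0) -> np.ndarray:
    """Uniformly rescale vectors: global distances change, while cosine
    similarities and the relative geometry of neighbors are preserved."""
    return emb * factor

rng = np.random.default_rng(0)
E = rng.normal(size=(5000, 300))   # stand-in for pretrained embeddings

variants = {
    "original": E,
    "centered": center(E),
    "scaled": scale_global(E),
}
# Each variant would then be fed to the same intrinsic and extrinsic
# evaluations to isolate the contribution of the altered property.
```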
Factors Influencing the Surprising Instability of Word Embeddings
Despite the recent popularity of word embedding methods, there is only a
small body of work exploring the limitations of these representations. In this
paper, we consider one aspect of embedding spaces, namely their stability. We
show that even relatively high frequency words (100-200 occurrences) are often
unstable. We provide empirical evidence for how various factors contribute to
the stability of word embeddings, and we analyze the effects of stability on
downstream tasks.
Comment: NAACL HLT 2018
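A minimal sketch of one common way to quantify the stability the abstract refers to: the overlap of a word's nearest-neighbor sets across two embedding spaces trained under different conditions (e.g. different random seeds). This is an illustrative measure under that assumption, not necessarily the paper's exact definition.

```python
import numpy as np

def top_k_neighbors(emb: np.ndarray, idx: int, k: int = 10) -> set:
    """Indices of the k nearest neighbors of word idx by cosine similarity."""
    v = emb[idx]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-12)
    sims[idx] = -np.inf                       # exclude the word itself
    return set(np.argpartition(-sims, k)[:k])

def stability(emb_a: np.ndarray, emb_b: np.ndarray, idx: int, k: int = 10) -> float:
    """Fraction of shared nearest neighbors for one word across two spaces;
    1.0 means perfectly stable neighborhoods, 0.0 means no overlap."""
    a = top_k_neighbors(emb_a, idx, k)
    b = top_k_neighbors(emb_b, idx, k)
    return len(a & b) / k
```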
What the Vec? Towards Probabilistically Grounded Embeddings
Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding
algorithms. Their embeddings are widely used and perform well on a variety of
natural language processing tasks. Moreover, W2V has recently been adopted in
the field of graph embedding, where it underpins several leading algorithms.
However, despite their ubiquity and relatively simple model architecture, a
theoretical understanding of what the embedding parameters of W2V and GloVe
learn and why that is useful in downstream tasks has been lacking. We show that
different interactions between PMI vectors reflect semantic word relationships,
such as similarity and paraphrasing, that are encoded in low dimensional word
embeddings under a suitable projection, theoretically explaining why embeddings
of W2V and GloVe work. As a consequence, we also reveal an interesting
mathematical interconnection between the considered semantic relationships
themselves.
Comment: Advances in Neural Information Processing Systems, 2019
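A minimal sketch of the PMI-vector objects the abstract reasons about: building a PMI matrix from word-word co-occurrence counts and projecting its rows to low dimension with a truncated SVD. This is the standard construction, shown only for illustration; it is not the paper's derivation.

```python
import numpy as np

def pmi_matrix(counts: np.ndarray) -> np.ndarray:
    """Pointwise mutual information from a word-word co-occurrence count matrix:
    PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), with zero counts left at 0."""
    total = counts.sum()
    p_ij = counts / total
    p_i = counts.sum(axis=1, keepdims=True) / total
    p_j = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi

def project(pmi: np.ndarray, dim: int = 300) -> np.ndarray:
    """Low-dimensional projection of PMI rows via truncated SVD, a common
    stand-in for what W2V/GloVe-style embeddings approximate."""
    u, s, _ = np.linalg.svd(pmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])
```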
Unsupervised Features Learning for Sampled Vector Fields
In this paper we introduce a new approach to computing hidden features of
sampled vector fields. The basic idea is to convert the vector field data to a
graph structure and use tools designed for automatic, unsupervised analysis of
graphs. Using a few data sets, we show that the collected features of the vector
fields are correlated with the dynamics known for the analytic models that
generate the data. In particular, the method may be useful in the analysis of data
sets where the analytic model is poorly understood or not known.
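The abstract does not specify the graph construction, so the following is only one plausible sketch under my own assumptions: sample points become nodes, and each node gets a directed edge toward the nearby point best aligned with its local vector, after which standard unsupervised graph tools could be applied.

```python
import numpy as np
import networkx as nx

def vector_field_to_graph(points: np.ndarray, vectors: np.ndarray, k: int = 8) -> nx.DiGraph:
    """Illustrative conversion of a sampled vector field to a directed graph:
    each sample point becomes a node, with an edge to the nearby point whose
    direction best matches the local vector (not the paper's construction)."""
    g = nx.DiGraph()
    n = len(points)
    g.add_nodes_from(range(n))
    for i in range(n):
        d = points - points[i]
        dist = np.linalg.norm(d, axis=1)
        dist[i] = np.inf
        neighbors = np.argpartition(dist, k)[:k]   # k nearest sample points
        align = d[neighbors] @ vectors[i]          # alignment with the local vector
        j = neighbors[int(np.argmax(align))]
        g.add_edge(i, int(j), weight=float(np.linalg.norm(vectors[i])))
    return g

# Downstream, unsupervised graph tools (spectral features, graph embeddings, etc.)
# can then be applied to g to obtain features of the original field.
```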