Situating Sentence Embedders with Nearest Neighbor Overlap
As distributed approaches to natural language semantics have developed and
diversified, embedders for linguistic units larger than words have come to play
an increasingly important role. To date, such embedders have been evaluated
using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a
comparative approach, nearest neighbor overlap (N2O), that quantifies
similarity between embedders in a task-agnostic manner. N2O requires only a
collection of examples and is simple to understand: two embedders are more
similar if, for the same set of inputs, there is greater overlap between the
inputs' nearest neighbors. Though applicable to embedders of texts of any size,
we focus on sentence embedders and use N2O to show the effects of different
design choices and architectures.
Comment: 17 pages, 7 figures
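As a concrete illustration of the overlap computation the abstract describes, here is a minimal NumPy sketch. It assumes cosine similarity and a simple normalization (mean fraction of shared k-nearest neighbors over the query set); the paper's exact choices of similarity function, neighbor count k, and query sampling are not stated in the abstract, so those details should be treated as assumptions.

```python
import numpy as np

def n2o(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 10) -> float:
    """Nearest neighbor overlap between two embedders' outputs.

    emb_a, emb_b: (n, d_a) and (n, d_b) arrays embedding the SAME n
    texts under two different embedders. Returns the mean fraction of
    shared k-nearest neighbors, in [0, 1]. Cosine similarity and this
    normalization are assumptions, not the paper's exact setup.
    """
    def knn(emb: np.ndarray) -> np.ndarray:
        # Cosine similarity via L2-normalized dot products.
        x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sim = x @ x.T
        np.fill_diagonal(sim, -np.inf)       # exclude each input itself
        # Indices of the k most similar items per row.
        return np.argsort(-sim, axis=1)[:, :k]

    nn_a, nn_b = knn(emb_a), knn(emb_b)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap)) / k

# Toy usage: two random "embedders" over the same 100 inputs.
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
print(f"N2O = {n2o(e1, e2):.3f}")  # near chance level for random vectors
```

Because N2O only compares neighbor sets, it needs no labeled data and works even when the two embedders produce vectors of different dimensionality, which is what makes the comparison task-agnostic.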
RPD: A Distance Function Between Word Embeddings
It is well-understood that different algorithms, training processes, and
corpora produce different word embeddings. However, less is known about the
relation between different embedding spaces, i.e., how far different sets of
embeddings deviate from each other. In this paper, we propose a novel metric
called Relative pairwise inner Product Distance (RPD) to quantify the distance
between different sets of word embeddings. The metric provides a unified scale
for comparing embedding spaces. Based on the properties of RPD, we
systematically study the relations between word embeddings produced by
different algorithms and investigate the influence of different training
processes and corpora. The results shed light on poorly understood properties
of word embeddings and justify RPD as a measure of the distance between
embedding spaces.
Comment: ACL Student Research Workshop 2020
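The abstract does not spell out the formula, but RPD is built from pairwise inner products of the embedding vectors. The sketch below illustrates the idea: form each embedding set's Gram matrix (all pairwise inner products), normalize by the Frobenius norm to put both on a unified scale, and measure the distance between the normalized matrices. The exact normalization in the paper may differ, so this is an illustration of the construction, not the authors' reference implementation.

```python
import numpy as np

def rpd(emb_1: np.ndarray, emb_2: np.ndarray) -> float:
    """Sketch of a relative pairwise inner product distance.

    emb_1, emb_2: (n, d1) and (n, d2) embeddings of the SAME n words,
    rows aligned by word. Comparing Gram matrices makes the measure
    invariant to rotations of each embedding space. The Frobenius-norm
    normalization is an assumption about the paper's exact scaling.
    """
    g1 = emb_1 @ emb_1.T                    # pairwise inner products
    g2 = emb_2 @ emb_2.T
    g1 = g1 / np.linalg.norm(g1)            # unified scale across spaces
    g2 = g2 / np.linalg.norm(g2)
    return float(np.linalg.norm(g1 - g2))   # Frobenius distance

# Toy usage: compare two random embedding sets over the same 1000 words.
rng = np.random.default_rng(0)
e1 = rng.normal(size=(1000, 300))           # e.g., word2vec-sized vectors
e2 = rng.normal(size=(1000, 100))           # e.g., GloVe-sized vectors
print(f"RPD-style distance = {rpd(e1, e2):.4f}")
```

Working with Gram matrices rather than raw coordinates is what lets a distance of this kind compare embeddings trained by different algorithms, on different corpora, and with different dimensionalities.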