Situating Sentence Embedders with Nearest Neighbor Overlap
As distributed approaches to natural language semantics have developed and
diversified, embedders for linguistic units larger than words have come to play
an increasingly important role. To date, such embedders have been evaluated
using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a
comparative approach, nearest neighbor overlap (N2O), that quantifies
similarity between embedders in a task-agnostic manner. N2O requires only a
collection of examples and is simple to understand: two embedders are more
similar if, for the same set of inputs, there is greater overlap between the
inputs' nearest neighbors. Though applicable to embedders of texts of any size,
we focus on sentence embedders and use N2O to show the effects of different
design choices and architectures.
Comment: 17 pages, 7 figures
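As a concrete illustration of the overlap computation the abstract describes, here is a minimal NumPy sketch. It assumes cosine similarity and a simple normalization (mean fraction of shared k-nearest neighbors over the query set); the paper's exact choices of similarity function, neighbor count k, and query sampling are not stated in the abstract, so those details should be treated as assumptions.

```python
import numpy as np

def n2o(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 10) -> float:
    """Nearest neighbor overlap between two embedders' outputs.

    emb_a, emb_b: (n, d_a) and (n, d_b) arrays embedding the SAME n
    texts under two different embedders. Returns the mean fraction of
    shared k-nearest neighbors, in [0, 1]. Cosine similarity and this
    normalization are assumptions, not the paper's exact setup.
    """
    def knn(emb: np.ndarray) -> np.ndarray:
        # Cosine similarity via L2-normalized dot products.
        x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sim = x @ x.T
        np.fill_diagonal(sim, -np.inf)       # exclude each input itself
        # Indices of the k most similar items per row.
        return np.argsort(-sim, axis=1)[:, :k]

    nn_a, nn_b = knn(emb_a), knn(emb_b)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap)) / k

# Toy usage: two random "embedders" over the same 100 inputs.
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
print(f"N2O = {n2o(e1, e2):.3f}")  # near chance level for random vectors
```

Because N2O only compares neighbor sets, it needs no labeled data and works even when the two embedders produce vectors of different dimensionality, which is what makes the comparison task-agnostic.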
RPD: A Distance Function Between Word Embeddings
It is well-understood that different algorithms, training processes, and
corpora produce different word embeddings. However, less is known about the
relation between different embedding spaces, i.e., how far different sets of
embeddings deviate from each other. In this paper, we propose a novel metric
called Relative pairwise inner Product Distance (RPD) to quantify the distance
between different sets of word embeddings. The metric provides a unified scale
for comparing embedding spaces. Based on the properties of RPD, we
systematically study the relations between word embeddings produced by
different algorithms and investigate the influence of different training
processes and corpora. The results shed light on poorly understood properties
of word embeddings and justify RPD as a measure of the distance between
embedding spaces.
Comment: ACL Student Research Workshop 2020
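The abstract does not spell out the formula, but RPD is built from pairwise inner products of the embedding vectors. The sketch below illustrates the idea: form each embedding set's Gram matrix (all pairwise inner products), normalize by the Frobenius norm to put both on a unified scale, and measure the distance between the normalized matrices. The exact normalization in the paper may differ, so this is an illustration of the construction, not the authors' reference implementation.

```python
import numpy as np

def rpd(emb_1: np.ndarray, emb_2: np.ndarray) -> float:
    """Sketch of a relative pairwise inner product distance.

    emb_1, emb_2: (n, d1) and (n, d2) embeddings of the SAME n words,
    rows aligned by word. Comparing Gram matrices makes the measure
    invariant to rotations of each embedding space. The Frobenius-norm
    normalization is an assumption about the paper's exact scaling.
    """
    g1 = emb_1 @ emb_1.T                    # pairwise inner products
    g2 = emb_2 @ emb_2.T
    g1 = g1 / np.linalg.norm(g1)            # unified scale across spaces
    g2 = g2 / np.linalg.norm(g2)
    return float(np.linalg.norm(g1 - g2))   # Frobenius distance

# Toy usage: compare two random embedding sets over the same 1000 words.
rng = np.random.default_rng(0)
e1 = rng.normal(size=(1000, 300))           # e.g., word2vec-sized vectors
e2 = rng.normal(size=(1000, 100))           # e.g., GloVe-sized vectors
print(f"RPD-style distance = {rpd(e1, e2):.4f}")
```

Working with Gram matrices rather than raw coordinates is what lets a distance of this kind compare embeddings trained by different algorithms, on different corpora, and with different dimensionalities.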