2 research outputs found

Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings

Authors: Tixier Antoine; Vazirgiannis Michalis; Xypolopoulos Christos
Publication venue: Association for Computational Linguistics
Publication date: 12/02/2021

The number of senses of a given word, or polysemy, is a very subjective notion, which varies widely across annotators and resources. We propose a novel method to estimate polysemy based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. Through rigorous experiments, we show that our rankings are well correlated, with strong statistical significance, with 6 different rankings derived from famous human-constructed resources such as WordNet, OntoNotes, Oxford, Wikipedia, etc., for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings and make interesting observations. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our approach makes it applicable to any language. Code and data are publicly available (https://github.com/ksipos/polysemy-assessment).

arXiv.org e-Print Archive

HAL-Polytechnique

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016

Digitale Bibliothek Thüringen