
    UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection

    We apply contextualised word embeddings to lexical semantic change detection in the SemEval-2020 Shared Task 1. This paper focuses on Subtask 2, ranking words by the degree of their semantic drift over time. We analyse the performance of two contextualising architectures (BERT and ELMo) and three change detection algorithms. We find that the most effective algorithms rely on the cosine similarity between averaged token embeddings and the pairwise distances between token embeddings. They outperform strong baselines by a large margin (in the post-evaluation phase, we have the best Subtask 2 submission for SemEval-2020 Task 1), but interestingly, the choice of a particular algorithm depends on the distribution of gold scores in the test set. Comment: To appear in Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020).
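
    The two change detection measures singled out above, cosine similarity between averaged ("prototype") token embeddings and average pairwise distance between token embeddings, can be sketched in a few lines. The sketch below is an illustrative assumption, not the authors' code: the function names and the random stand-in embeddings are invented for the example.

```python
# Hedged sketch of the two measures described above, assuming per-occurrence
# embeddings for a target word have already been extracted for each time period
# as arrays of shape (n_occurrences, dim).
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def prototype_change(emb_t1, emb_t2):
    """Cosine distance between averaged (prototype) embeddings of the two periods."""
    return 1.0 - cosine(emb_t1.mean(axis=0), emb_t2.mean(axis=0))

def apd_change(emb_t1, emb_t2):
    """Average pairwise cosine distance between occurrences across the two periods."""
    return float(np.mean([1.0 - cosine(u, v) for u in emb_t1 for v in emb_t2]))

# Random stand-ins for BERT/ELMo token embeddings from two time periods.
rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(50, 768)), rng.normal(size=(40, 768))
print(prototype_change(t1, t2), apd_change(t1, t2))
```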

    CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not Outperform SGNS on Semantic Change Detection

    We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit the Average Pairwise Distance of token-based BERT embeddings between time points and rank 5th (of 8) in the official ranking with an accuracy of 0.72. While we tune parameters on the English data set of SemEval-2020 Task 1 and reach high performance, this does not translate to the Italian DIACR-Ita data set. Our results show that we do not manage to find robust ways to exploit BERT embeddings in lexical semantic change detection.
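
    Since the system above relies on token-based BERT embeddings, a short sketch of how such per-occurrence vectors can be extracted may help. The checkpoint name (bert-base-multilingual-cased) and the helper function below are assumptions for illustration; they are not a reconstruction of the CL-IMS pipeline.

```python
# Hedged sketch of extracting a per-occurrence ("token-based") BERT embedding
# for a target word; the model name and the naive subword matching are
# illustrative assumptions only.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-multilingual-cased"  # assumption: any BERT-style checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def occurrence_embedding(sentence, target):
    """Mean of the last-layer vectors of the subword pieces belonging to `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]           # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):           # naive contiguous subword match
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    return None  # target not found as a contiguous subword span

vec = occurrence_embedding("La banca ha chiuso il conto.", "banca")
print(vec.shape if vec is not None else "not found")
```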

    Analysing Lexical Semantic Change with Contextualised Word Representations

    This paper presents the first unsupervised approach to lexical semantic change that makes use of contextualised word representations. We propose a novel method that exploits the BERT neural language model to obtain representations of word usages, clusters these representations into usage types, and measures change along time with three proposed metrics. We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements. Our extensive qualitative analysis demonstrates that our method captures a variety of synchronic and diachronic linguistic phenomena. We expect our work to inspire further research in this direction. Comment: To appear in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020).
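
    The cluster-then-compare idea described above can be illustrated with a small sketch: usage vectors from two periods are clustered jointly into usage types, and change is measured over the resulting usage-type distributions. The clustering method (k-means) and the metric (Jensen-Shannon distance) chosen below are illustrative assumptions, not necessarily the configuration used in the paper.

```python
# Hedged sketch: cluster pooled usage vectors into "usage types", then compare
# the usage-type distributions of the two time periods.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def usage_type_change(usages_t1, usages_t2, k=5, seed=0):
    """Jensen-Shannon distance between usage-type distributions of two periods."""
    pooled = np.vstack([usages_t1, usages_t2])
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(pooled)
    labels_t1, labels_t2 = labels[: len(usages_t1)], labels[len(usages_t1):]
    p = np.bincount(labels_t1, minlength=k) / len(labels_t1)
    q = np.bincount(labels_t2, minlength=k) / len(labels_t2)
    return float(jensenshannon(p, q))

# Random stand-ins for usage vectors; the shift in the second period inflates the score.
rng = np.random.default_rng(1)
print(usage_type_change(rng.normal(size=(60, 32)), rng.normal(size=(40, 32)) + 0.5))
```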

    Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings

    Detecting lexical semantic change in smaller data sets, e.g. in historical linguistics and digital humanities, is challenging due to a lack of statistical power. This issue is exacerbated by non-contextual embedding models that produce one embedding per word and, therefore, mask the variability present in the data. In this article, we propose an approach to estimate semantic shift by combining contextual word embeddings with permutation-based statistical tests. We use the false discovery rate procedure to address the large number of hypothesis tests being conducted simultaneously. We demonstrate the performance of this approach in simulation where it achieves consistently high precision by suppressing false positives. We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus. We show that by taking sample variation into account, we can improve the robustness of individual semantic shift estimates without degrading overall performance.
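
    As a rough illustration of the recipe described above (a per-word permutation test on an embedding-based change statistic, followed by false discovery rate control across words), the sketch below uses a cosine-distance statistic and the Benjamini-Hochberg procedure; both choices are assumptions for illustration rather than the paper's exact setup.

```python
# Hedged sketch: permutation test per word, then Benjamini-Hochberg FDR control.
import numpy as np

def change_stat(emb_t1, emb_t2):
    """Cosine distance between the mean embeddings of the two periods."""
    u, v = emb_t1.mean(axis=0), emb_t2.mean(axis=0)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def permutation_pvalue(emb_t1, emb_t2, n_perm=1000, seed=0):
    """P-value: how often random relabelling of occurrences yields a shift at least as large."""
    rng = np.random.default_rng(seed)
    observed = change_stat(emb_t1, emb_t2)
    pooled, n1 = np.vstack([emb_t1, emb_t2]), len(emb_t1)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if change_stat(pooled[perm[:n1]], pooled[perm[n1:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask under the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, len(p) + 1) / len(p)
    reject = np.zeros(len(p), dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

# Toy example: one stable and one shifted word (random stand-in embeddings).
rng = np.random.default_rng(2)
words = {"stable": (rng.normal(size=(30, 16)), rng.normal(size=(30, 16))),
         "shifted": (rng.normal(size=(30, 16)), rng.normal(size=(30, 16)) + 1.0)}
pvals = [permutation_pvalue(a, b) for a, b in words.values()]
print(dict(zip(words, benjamini_hochberg(pvals))))
```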

    Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change

    Morphological and syntactic changes in word usage, as captured, e.g., by grammatical profiles, have been shown to be good predictors of a word's meaning change. In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes. To this end, we first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages, and then combine the two approaches in ensembles to assess their complementarity. Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages. This indicates that language models do not fully cover the fine-grained morphological and syntactic signals that are explicitly represented in grammatical profiles. An interesting exception is the test sets where the time spans under analysis are much longer than the time gap between them (for example, century-long spans with a one-year gap between them). Morphosyntactic change is slow, so grammatical profiles fail to detect change in such cases. In contrast, language models, thanks to their access to lexical information, are able to detect fast topical changes.
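
    One simple way to ensemble a grammatical-profile score with a language-model score for the same target words is to average rank-normalised predictions, as sketched below; the combination scheme and the toy scores are assumptions for illustration, not the ensembling strategy used in the paper.

```python
# Hedged sketch of ensembling two graded change predictions by rank averaging.
import numpy as np
from scipy.stats import rankdata

def ensemble_scores(scores_a, scores_b):
    """Average of rank-normalised scores from two systems (higher = more change)."""
    ra = rankdata(scores_a) / len(scores_a)
    rb = rankdata(scores_b) / len(scores_b)
    return (ra + rb) / 2.0

profile_scores = np.array([0.1, 0.8, 0.3, 0.5])   # hypothetical grammatical-profile scores
xlmr_scores = np.array([0.2, 0.6, 0.7, 0.4])      # hypothetical XLM-R-based scores
print(ensemble_scores(profile_scores, xlmr_scores))
```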

    RuShiftEval: A Shared Task on Semantic Shift Detection for Russian


    Dynamic Contextualized Word Embeddings

    Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for a range of NLP tasks involving semantic variability. We highlight potential application scenarios by means of qualitative and quantitative analyses on four English datasets.
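
    As a toy illustration of the general idea only (not the model proposed in the paper), the sketch below shifts a contextualised token vector from a PLM by an offset computed from extralinguistic context, here a time index and a social-group index; the module and its dimensions are invented for the example.

```python
# Hedged sketch: adjust a PLM token vector with an offset that depends on
# extralinguistic context (time slice and social group).
import torch
import torch.nn as nn

class DynamicOffset(nn.Module):
    def __init__(self, dim, n_times, n_groups):
        super().__init__()
        self.time_emb = nn.Embedding(n_times, dim)     # one vector per time slice
        self.group_emb = nn.Embedding(n_groups, dim)   # one vector per social group
        self.proj = nn.Linear(2 * dim, dim)            # map context to an offset

    def forward(self, token_vec, time_idx, group_idx):
        ctx = torch.cat([self.time_emb(time_idx), self.group_emb(group_idx)], dim=-1)
        return token_vec + self.proj(ctx)              # dynamic contextualised vector

dim = 768
module = DynamicOffset(dim, n_times=10, n_groups=5)
token_vec = torch.randn(1, dim)                        # stand-in for a PLM output vector
out = module(token_vec, torch.tensor([3]), torch.tensor([1]))
print(out.shape)
```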