GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
This paper describes the system proposed for the SemEval-2020 Task 1:
Unsupervised Lexical Semantic Change Detection. We focused our approach on the
detection problem. Given the semantics of words captured by temporal word
embeddings in different time periods, we investigate the use of unsupervised
methods to detect when the target word has gained or lost senses. To this
end, we defined a new algorithm based on Gaussian Mixture Models to cluster the
target similarities computed over the two periods. We compared the proposed
approach with a number of similarity-based thresholds. We found that, although
the performance of the detection methods varies across the word embedding
algorithms, the combination of Gaussian Mixture with Temporal Referencing
resulted in our best system.
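
As a rough illustration of the clustering idea described above, the sketch below fits a two-component one-dimensional Gaussian mixture to cross-period similarities and flags the words that fall in the lower-similarity component. It is not the authors' GM-CTSC algorithm: the function names, the plain EM loop, the variance floor, and the toy similarity values are all assumptions made for the example.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs, n_iter=50, var_floor=1e-4):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns the component means and the per-point responsibilities.
    var_floor is a hypothetical safeguard against collapsing variances.
    """
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)
    # Initialise the means at the extremes, with shared variance and equal weights.
    mu, var, weight = [min(xs), max(xs)], [var_all, var_all], [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weight[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weight[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            sq = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs))
            var[j] = max(sq / nj, var_floor)
    return mu, resp


def detect_changed(similarities):
    """Map each word to True if it belongs to the low-similarity component."""
    words = list(similarities)
    mu, resp = fit_two_gaussians([similarities[w] for w in words])
    low = 0 if mu[0] <= mu[1] else 1
    return {w: r[low] > 0.5 for w, r in zip(words, resp)}


# Invented toy values: cosine similarity of each word's two period vectors.
toy = {
    "word_a": 0.85, "word_b": 0.90, "word_c": 0.88, "word_d": 0.92,
    "word_e": 0.87, "word_f": 0.30, "word_g": 0.35, "word_h": 0.25,
}
labels = detect_changed(toy)
```

With these toy values, the three low-similarity words are assigned to the "changed" component and the rest to the "stable" one. A real system would compute the similarities from temporal word embeddings rather than hard-coding them.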
A Diachronic Italian Corpus based on “L’Unità”
In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics that take the temporal dimension into account and describe some example uses of the time series.
DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task
This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA2020 campaign. The task challenges participants to develop systems that can automatically detect if a given word has changed its meaning over time, given contextual information from corpora. The task, at its first edition, attracted 9 participant teams and collected a total of 36 submission runs.
Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models
In recent years, there has been a significant increase in interest in lexical semantic change
detection. Many approaches, datasets, and evaluation strategies exist for detecting
semantic shifts. The classification of changed words against stable words requires thresholds to
label the degree of semantic change. In this work, we compare state-of-the-art computational
historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian
Distribution of semantic shifts. We present the results of an in-depth analysis conducted on
both SemEval-2020 Task 1 Subtask 1 and DIACR-Ita tasks. Specifically, we compare Temporal
Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word
Embeddings and Temporal Word Embedding with a Compass. While results obtained with
Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish and
Italian, they remain far from results obtained using the optimal threshold.
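
To make the notion of a threshold based on the Gaussian distribution of semantic shifts more concrete, here is a minimal sketch. It assumes the threshold is the mean of the shift scores plus k standard deviations, which is one common choice and not necessarily the exact rule evaluated in the paper. The function name, the parameter k, and the toy scores are hypothetical.

```python
from statistics import mean, stdev


def gaussian_threshold_labels(shift_scores, k=1.0):
    """Label words as changed (1) or stable (0) with a mean + k*std threshold.

    shift_scores: dict mapping word -> shift score, where a higher score
    means more change (e.g. cosine distance between two period vectors).
    k: hypothetical multiplier on the sample standard deviation.
    """
    values = list(shift_scores.values())
    threshold = mean(values) + k * stdev(values)
    labels = {w: int(s > threshold) for w, s in shift_scores.items()}
    return threshold, labels


# Invented toy scores, not results from any of the evaluated models.
scores = {"word_a": 0.82, "word_b": 0.10, "word_c": 0.12,
          "word_d": 0.35, "word_e": 0.11}
threshold, labels = gaussian_threshold_labels(scores)
```

On this toy input only the word with the clearly outlying score is labelled as changed. Because the threshold depends on the whole score distribution, it can differ noticeably from the threshold that would maximise accuracy on a given gold standard.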