21 research outputs found

    GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering

    This paper describes the system proposed for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focused our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when a target word has gained or lost senses. To this end, we defined a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compared the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.
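As a rough illustration of clustering similarities with a Gaussian mixture, the sketch below fits a two-component one-dimensional mixture by expectation-maximisation and labels the words that fall in the lower-similarity component as changed. It is not the GM-CTSC algorithm itself: the function names, the use of a single similarity score per word, the rule "lower-mean component means changed", and the toy scores are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, iterations=100, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Hypothetical helper for this sketch.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # start the components at the extremes
    spread = max((hi - lo) ** 2 / 4, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor keeps the density finite
    return weights, means, variances


def label_changed(similarities):
    """Map each word to True if it is assigned to the lower-similarity component."""
    words = list(similarities)
    values = [similarities[w] for w in words]
    weights, means, variances = fit_two_gaussians(values)
    low = 0 if means[0] <= means[1] else 1
    labels = {}
    for w, x in zip(words, values):
        p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
        labels[w] = p[low] > p[1 - low]
    return labels


# Invented cross-period cosine similarities, not results from the paper.
toy_similarities = {
    "word_a": 0.21, "word_b": 0.25, "word_c": 0.30,
    "word_d": 0.85, "word_e": 0.88, "word_f": 0.90, "word_g": 0.92,
}
toy_labels = label_changed(toy_similarities)
```

The appeal of a mixture over a fixed cut-off is that the boundary between "changed" and "stable" is estimated from the distribution of similarities rather than chosen by hand.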

    A Diachronic Italian Corpus based on “L’Unità”

    In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series.
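A frequency-based time series of the kind mentioned above can be sketched as follows. The input format (pairs of a year and a token list), the function name, and the choice of relative frequency per year are assumptions for illustration; the paper's actual pipeline and normalisation are not described in this abstract.

```python
from collections import Counter, defaultdict


def frequency_time_series(documents):
    """Build {token: {year: relative frequency}} from (year, tokens) pairs.

    Hypothetical helper: relative frequency is the token count in a year
    divided by the total number of tokens observed in that year.
    """
    counts = defaultdict(Counter)   # year -> token counts
    totals = Counter()              # year -> total number of tokens
    for year, tokens in documents:
        counts[year].update(tokens)
        totals[year] += len(tokens)

    series = defaultdict(dict)
    for year in sorted(counts):
        for token, n in counts[year].items():
            series[token][year] = n / totals[year]
    return dict(series)


# Invented toy documents, not drawn from the corpus.
toy_documents = [
    (1950, ["pace", "lavoro", "pace"]),
    (1950, ["lavoro"]),
    (1990, ["pace", "europa"]),
]
toy_series = frequency_time_series(toy_documents)
```

The same function would work for lemmas or named entities if the token lists were replaced with the corresponding annotations.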

    DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task

    This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA2020 campaign. The task challenges participants to develop systems that can automatically detect if a given word has changed its meaning over time, given contextual information from corpora. The task, at its first edition, attracted 9 participant teams and collected a total of 36 submission runs.

    Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models

    In recent years, there has been a significant increase in interest in lexical semantic change detection. There are many existing approaches, datasets, and evaluation strategies for detecting semantic shifts. Classifying changed words against stable words requires thresholds to label the degree of semantic change. In this work, we compare state-of-the-art computational historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian distribution of semantic shifts. We present the results of an in-depth analysis conducted on both SemEval-2020 Task 1 Subtask 1 and DIACR-Ita tasks. Specifically, we compare Temporal Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word Embeddings and Temporal Word Embedding with a Compass. While results obtained with Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish and Italian, they remain far from results obtained using the optimal threshold.
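One common way to derive a threshold from a Gaussian assumption is to take the mean of the shift scores plus a multiple of their standard deviation, then label words above it as changed. The sketch below shows that idea only; the abstract does not give the exact formula used in the paper, so the `mean + k * std` rule, the parameter `k`, the function names, and the toy scores are assumptions.

```python
from statistics import mean, pstdev


def gaussian_threshold(scores, k=1.0):
    """Return mean + k * standard deviation of the shift scores.

    Assumes the scores (e.g. cosine distances between periods) are roughly
    Gaussian; k is a hypothetical tuning parameter for this sketch.
    """
    values = list(scores.values())
    return mean(values) + k * pstdev(values)


def classify_changed(scores, k=1.0):
    """Map each word to True if its shift score exceeds the Gaussian threshold."""
    threshold = gaussian_threshold(scores, k)
    return {word: score > threshold for word, score in scores.items()}


# Invented shift scores, not results from the paper.
toy_scores = {"word_a": 0.10, "word_b": 0.12, "word_c": 0.11,
              "word_d": 0.13, "word_e": 0.60}
toy_threshold = gaussian_threshold(toy_scores)
toy_changed = classify_changed(toy_scores)
```

Unlike an optimal threshold, which is tuned against gold labels, a threshold of this kind is computed from the scores alone, which is what makes it usable in an unsupervised setting.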