21 research outputs found

GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering

Author: Basile Pierpaolo
Caputo Annalina
Cassotti Pierluigi
Polignano Marco
Publication date: 20/05/2020

This paper describes the system proposed for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focused our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we defined a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compared the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.
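As a rough illustration of the clustering idea in this abstract, the sketch below fits a two-component one-dimensional Gaussian mixture to per-word similarity scores and flags the words that fall in the lower-similarity component. It is a minimal sketch under stated assumptions, not the authors' GM-CTSC algorithm: the function names `fit_two_gaussians` and `detect_changed`, the choice of exactly two components, the plain EM loop, and the sample scores are all hypothetical.

```python
import math


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # deterministic start: the two extremes
    spread = (hi - lo) / 2 or 1.0
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
    return means, resp


def detect_changed(similarities):
    """Return the words assigned to the mixture component with the lower mean similarity."""
    words = list(similarities)
    means, resp = fit_two_gaussians([similarities[w] for w in words])
    low = 0 if means[0] <= means[1] else 1
    return {w for w, r in zip(words, resp) if r[low] > r[1 - low]}


# Hypothetical cross-period similarities (1.0 = same usage in both periods)
scores = {"plane": 0.20, "tip": 0.25, "graft": 0.30,
          "tree": 0.85, "face": 0.88, "chairman": 0.90, "risk": 0.92}
changed = detect_changed(scores)
```

Assigning the "changed" label to the lower-similarity component is itself an assumption of this sketch; the abstract does not say how the clusters are interpreted.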

arXiv.org e-Print Archive

Irish Universities

DCU Online Research Access Service

A Diachronic Italian Corpus based on “L’Unità”

Author: Basile Pierpaolo
Caputo Annalina
Caselli Tommaso
Cassotti Pierluigi
Varvara Rossella
Publication venue: CEUR Workshop Proceedings (CEUR-WS.org)
Publication date: 01/01/2020

In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series.
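To make the notion of a frequency-based time series concrete, here is a minimal sketch that turns year-stamped token lists into per-token relative frequencies by year. The function name `relative_frequency_series`, the input shape, and the sample documents are assumptions made for illustration; the abstract does not describe the corpus pipeline at this level of detail.

```python
from collections import Counter, defaultdict


def relative_frequency_series(docs):
    """Build {token: {year: relative frequency}} from (year, tokens) pairs."""
    counts = defaultdict(Counter)
    for year, tokens in docs:
        counts[year].update(tokens)

    series = defaultdict(dict)
    for year, counter in counts.items():
        total = sum(counter.values())      # all tokens observed in that year
        for token, n in counter.items():
            series[token][year] = n / total
    # Sort each series chronologically
    return {token: dict(sorted(s.items())) for token, s in series.items()}


# Hypothetical toy input: two years of already-tokenized text
docs = [
    (1950, ["pace", "lavoro", "pace"]),
    (1980, ["lavoro", "lavoro", "rete", "pace"]),
]
series = relative_frequency_series(docs)
```

The same function would apply to lemmas or named entities by passing those instead of raw tokens.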

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

OpenEdition

Dissertations of the University of Groningen

DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task

Author: Basile Pierpaolo
Caputo Annalina
Caselli Tommaso
Cassotti Pierluigi
Varvara Rossella
Publication venue: CEUR Workshop Proceedings (CEUR-WS.org)
Publication date: 01/01/2020

This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA2020 campaign. The task challenges participants to develop systems that can automatically detect if a given word has changed its meaning over time, given contextual information from corpora. The task, at its first edition, attracted 9 participant teams and collected a total of 36 submission runs.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Irish Universities

DCU Online Research Access Service

OpenEdition

Dissertations of the University of Groningen

Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models

Author: Basile Pierpaolo
Cassotti Pierluigi
de Gemmis Marco
Semeraro Giovanni
Publication venue: OpenEdition
Publication date: 01/01/2020

In recent years, there has been a significant increase in interest in lexical semantic change detection. There are many existing approaches, data sets, and evaluation strategies for detecting semantic shifts. Classifying changed words against stable words requires thresholds to label the degree of semantic change. In this work, we compare state-of-the-art computational historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian distribution of semantic shifts. We present the results of an in-depth analysis conducted on both SemEval-2020 Task 1 Subtask 1 and the DIACR-Ita task. Specifically, we compare Temporal Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word Embeddings and Temporal Word Embedding with a Compass. While results obtained with Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish and Italian, they remain far from results obtained using the optimal threshold.
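One common way to derive a threshold from the Gaussian distribution of shift scores is to flag words whose score lies more than some multiple of the standard deviation above the mean. The sketch below assumes that form (mean + k · standard deviation); the exact thresholds evaluated in the paper are not given in the abstract, and the function name `gaussian_threshold_labels`, the parameter `k`, and the sample distances are hypothetical.

```python
from statistics import mean, stdev


def gaussian_threshold_labels(distances, k=1.0):
    """Label a word as changed when its distance exceeds mean + k * std.

    `distances` maps each target word to a semantic-shift score, for example
    the cosine distance between its embeddings in two time periods.
    """
    values = list(distances.values())
    threshold = mean(values) + k * stdev(values)   # sample standard deviation
    return {word: d > threshold for word, d in distances.items()}


# Hypothetical shift scores for five target words
distances = {"a": 0.10, "b": 0.12, "c": 0.11, "d": 0.09, "e": 0.80}
labels = gaussian_threshold_labels(distances)
```

Raising `k` makes the classifier more conservative, which is one reason a fixed Gaussian threshold can fall short of a threshold tuned on gold labels.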

Archivio istituzionale della ricerca - Università di Bari