712 research outputs found

QMUL-SDS @ DIACR-Ita: Evaluating unsupervised diachronic lexical semantics classification in Italian

Author: Alkhalifa R
Liakata M
Tsakalidis A
Zubiaga A
Publication venue: Accademia University Press
Publication date: 05/11/2020

In this paper, we present the results and main findings of our system for the DIACR-Ita 2020 Task. Our system focuses on using variations of training sets and different semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that, measured by accuracy, using Temporal Word Embeddings with a Compass C-BOW model is more effective than alternative approaches, including Logistic Regression and a Feed Forward Neural Network. Our model ranked 3rd with an accuracy of 83.3%.

arXiv.org e-Print Archive

Queen Mary Research Online

OpenEdition

On the Impact of Temporal Representations on Metaphor Detection

Author: Alam Mehwish
Ottolina Giorgio
Palmonari Matteo
Vimercati Manuel
Publication date: 01/01/2022

State-of-the-art approaches for metaphor detection compare the literal - or core - meaning of words with their contextual meaning using metaphor classifiers based on neural networks. However, metaphorical expressions evolve over time for various reasons, such as cultural and societal impact. Metaphorical expressions are known to co-evolve with language and literal word meanings, and even drive, to some extent, this evolution. This raises the question of whether different, possibly time-specific, representations of literal meanings may impact the metaphor detection task. To the best of our knowledge, this is the first study that examines the metaphor detection task with a detailed exploratory analysis where different temporal and static word embeddings are used to account for different representations of literal meanings. Our experimental analysis is based on three popular benchmarks used for metaphor detection and on word embeddings extracted from different corpora and temporally aligned using different state-of-the-art approaches. The results suggest that the choice of static word embedding method does impact the metaphor detection task and that some temporal word embeddings slightly outperform static methods. However, the results also suggest that temporal word embeddings may provide representations of the core meaning of the metaphor that are too close to its contextual meaning, thus confusing the classifier. Overall, the interaction between temporal language evolution and metaphor detection appears small in the benchmark datasets used in our experiments. This suggests that future work on the computational analysis of this important linguistic phenomenon should start by creating a new dataset where this interaction is better represented. Comment: 12 pages, 4 figures

arXiv.org e-Print Archive

Repositorium für Naturwissenschaften und Technik

On the Impact of Temporal Representations on Metaphor Detection

Author: Giorgio Ottolina
Manuel Vimercati
Matteo Palmonari
Mehwish Alam
Publication venue: Paris : European Language Resources Association (ELRA)
Publication date: 01/01/2022

Repositorium für Naturwissenschaften und Technik

Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models

Author: Basile Pierpaolo
Cassotti Pierluigi
de Gemmis Marco
Semeraro Giovanni
Publication venue: OpenEdition
Publication date: 01/01/2020

In recent years, there has been a significant increase in interest in lexical semantic change detection. There are many existing approaches, datasets, and evaluation strategies for detecting semantic shifts. Classifying changed words against stable words requires thresholds to label the degree of semantic change. In this work, we compare state-of-the-art computational historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian distribution of semantic shifts. We present the results of an in-depth analysis conducted on both the SemEval-2020 Task 1 Subtask 1 and DIACR-Ita tasks. Specifically, we compare Temporal Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word Embeddings and Temporal Word Embedding with a Compass. While results obtained with Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish and Italian, they remain far from the results obtained using the optimal threshold.
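
The thresholding idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a word's shift is scored as the cosine distance between its vectors from two already aligned time slices, and that the threshold is the mean of all shift scores plus k standard deviations. The function names and the parameter k are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def gaussian_threshold_labels(shifts, k=1.0):
    """Label a word 1 (changed) if its shift exceeds mean + k * std, else 0.

    `shifts` maps each word to a shift score. The mean + k * std rule is an
    assumption of this sketch, not a value taken from the paper.
    """
    values = list(shifts.values())
    threshold = mean(values) + k * pstdev(values)
    return {word: int(score > threshold) for word, score in shifts.items()}


# Hypothetical toy vectors for two time slices (already in a shared space).
old = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0], "w4": [1.0, 0.0]}
new = {"w1": [1.0, 0.1], "w2": [0.1, 1.0], "w3": [1.0, 0.9], "w4": [0.0, 1.0]}

shifts = {w: cosine_distance(old[w], new[w]) for w in old}
labels = gaussian_threshold_labels(shifts)
```

In this toy setup only `w4`, whose vector rotates by 90 degrees, lies far enough in the tail of the shift distribution to be labelled as changed. A larger k makes the rule more conservative.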

Archivio istituzionale della ricerca - Università di Bari

A comparative study of approaches for the diachronic analysis of the Italian language

Author: Basile P.
Cassotti P.
De Gemmis M.
Semeraro G.
Publication venue: CEUR-WS
Publication date: 01/01/2020

In recent years, there has been a significant increase in interest in lexical semantic change detection. There are many existing approaches, datasets, and evaluation strategies for detecting semantic drift. Most of those approaches rely on diachronic word embeddings. Some of them are created by post-processing static word embeddings, while others produce dynamic word embeddings where vectors share the same geometric space across all time slices. The large majority of the methods use English as the target language for the diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted using an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated using the Kronos-It dataset. We train all word embeddings on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.

Archivio istituzionale della ricerca - Università di Bari

DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task

Author: Basile Pierpaolo
Caputo Annalina
Caselli Tommaso
Cassotti Pierluigi
Varvara Rossella
Publication venue: CEUR Workshop Proceedings (CEUR-WS.org)
Publication date: 01/01/2020

University of Groningen

DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 diachronic lexical semantics (DIACR-Ita) task

Author: Basile P.
Caputo A.
Caselli T.
Cassotti P.
Varvara R.
Publication venue: CEUR-WS
Publication date: 01/01/2020

This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA 2020 campaign. The task challenges participants to develop systems that can automatically detect whether a given word has changed its meaning over time, given contextual information from corpora. In its first edition, the task attracted 9 participating teams and collected a total of 36 submission runs.

Archivio istituzionale della ricerca - Università di Bari

When Politicians Talk About Politics: Identifying Political Tweets of Brazilian Congressmen

Author: Amaral Marcelo S.
de Melo Pedro O. S. Vaz
Oliveira Lucas S.
Pinho José Antônio G.
Publication date: 03/05/2018

Since June 2013, when Brazil faced the largest and most significant mass protests in a generation, a political crisis has been under way. In the midst of this crisis, Brazilian politicians use social media to communicate with the electorate in order to retain or grow their political capital. The problem is that many controversial topics are under discussion and deputies may prefer to avoid such themes in their messages. To characterize this behavior, we propose a method to accurately identify political and non-political tweets independently of the deputy who posted them and of the time they were posted. Moreover, we collected tweets from all congressmen who were active on Twitter and served in the Brazilian parliament from October 2013 to October 2017. To evaluate our method, we used word clouds and a topic model to identify the main political and non-political latent topics in parliamentarian tweets. Both results indicate that our proposal is able to accurately distinguish political from non-political tweets. Moreover, our analyses revealed a striking fact: more than half of the messages posted by Brazilian deputies are non-political. Comment: 4 pages, 7 figures, 2 tables

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications