QMUL-SDS @ DIACR-Ita: Evaluating unsupervised diachronic lexical semantics classification in Italian
In this paper, we present the results and main findings of our system for the DIACR-Ita 2020 Task. Our system focuses on varying the training sets and the semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that using Temporal Word Embeddings with a Compass C-BOW model is more effective, as measured by accuracy, than other approaches, including Logistic Regression and a Feed Forward Neural Network. Our model ranked 3rd with an accuracy of 83.3%.
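As a minimal sketch of what "predicting a word's vector change" can look like once the embeddings of the two time periods are aligned, the snippet below scores each shared word by the cosine distance between its two vectors. The function names (`cosine_distance`, `semantic_shift`) and the plain-dictionary embedding format are assumptions made for illustration; the abstract does not say how the system computes the change.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_shift(emb_t1, emb_t2):
    """Score every word present in both periods.

    emb_t1 and emb_t2 are hypothetical dicts mapping a word to its vector
    in the earlier and later corpus; the vectors are assumed to be aligned
    already (that is, to live in the same space).
    """
    shared = emb_t1.keys() & emb_t2.keys()
    return {w: cosine_distance(emb_t1[w], emb_t2[w]) for w in shared}
```

A larger score indicates that a word's vector moved further between the two corpora; words missing from either period are skipped rather than scored.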
On the Impact of Temporal Representations on Metaphor Detection
State-of-the-art approaches for metaphor detection compare the literal - or
core - meaning of an expression with its contextual meaning using metaphor classifiers based on
neural networks. However, metaphorical expressions evolve over time due to
various reasons, such as cultural and societal impact. Metaphorical expressions
are known to co-evolve with language and literal word meanings, and even drive,
to some extent, this evolution. This poses the question of whether different,
possibly time-specific, representations of literal meanings may impact the
metaphor detection task. To the best of our knowledge, this is the first study
that examines the metaphor detection task with a detailed exploratory analysis
where different temporal and static word embeddings are used to account for
different representations of literal meanings. Our experimental analysis is
based on three popular benchmarks used for metaphor detection and word
embeddings extracted from different corpora and temporally aligned using
different state-of-the-art approaches. The results suggest that the usage of
different static word embedding methods does impact the metaphor detection task
and some temporal word embeddings slightly outperform static methods. However,
the results also suggest that temporal word embeddings may provide
representations of the core meaning of the metaphor that are too close to its
contextual meaning, thus confusing the classifier. Overall, the interaction
between temporal language evolution and metaphor detection appears small in the
benchmark datasets used in our experiments. This suggests that future work for
the computational analysis of this important linguistic phenomenon should
start by creating a new dataset where this interaction is better represented.
Comment: 12 pages, 4 figures
Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models
In recent years, there has been a significant increase in interest in lexical semantic change
detection. Many approaches, datasets, and evaluation strategies exist for detecting
semantic shifts. Classifying changed words against stable words requires thresholds to
label the degree of semantic change. In this work, we compare state-of-the-art computational
historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian
Distribution of semantic shifts. We present the results of an in-depth analysis conducted on
both SemEval-2020 Task 1 Subtask 1 and DIACR-Ita tasks. Specifically, we compare Temporal
Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word
Embeddings and Temporal Word Embedding with a Compass. While results obtained with
Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish and
Italian, they remain far from the results obtained using the optimal threshold.
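The idea of a threshold derived from the Gaussian distribution of semantic shifts can be sketched as follows. This is an illustrative assumption, not the paper's procedure: the threshold is taken to be the mean of the shift scores plus `k` sample standard deviations, and both the `k` parameter and the function names are hypothetical.

```python
from statistics import mean, stdev


def gaussian_threshold(shifts, k=1.0):
    """Threshold = mean + k * sample standard deviation of the shift scores.

    shifts maps each target word to a change score (for example a cosine
    distance). The multiplier k is an assumed, tunable parameter.
    """
    values = list(shifts.values())
    return mean(values) + k * stdev(values)


def classify(shifts, k=1.0):
    """Label a word 1 (changed) if its score exceeds the threshold, else 0."""
    threshold = gaussian_threshold(shifts, k)
    return {word: int(score > threshold) for word, score in shifts.items()}
```

Because the threshold is computed from the scores themselves, it needs no labelled data, which is what separates it from an optimal threshold tuned against gold labels.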
A comparative study of approaches for the diachronic analysis of the Italian language
In recent years, there has been a significant increase in interest in lexical semantic change detection. Many approaches, datasets, and evaluation strategies exist for detecting semantic drift. Most of those approaches rely on diachronic word embeddings. Some are created by post-processing static word embeddings, while others produce dynamic word embeddings whose vectors share the same geometric space across all time slices. The large majority of the methods use English as the target language for the diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted using an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated using the Kronos-It dataset. We train all word embeddings on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.
DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 diachronic lexical semantics (DIACR-Ita) task
This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA 2020 campaign. The task challenges participants to develop systems that can automatically detect whether a given word has changed its meaning over time, given contextual information from corpora. In its first edition, the task attracted 9 participating teams and collected a total of 36 submission runs.
When Politicians Talk About Politics: Identifying Political Tweets of Brazilian Congressmen
Since June 2013, when Brazil faced the largest and most significant mass
protests in a generation, a political crisis has been under way. In the midst of this
crisis, Brazilian politicians use social media to communicate with the
electorate in order to retain or grow their political capital. The problem
is that many controversial topics are under debate and deputies may prefer to
avoid such themes in their messages. To characterize this behavior, we propose
a method to accurately identify political and non-political tweets
independently of the deputy who posted them and of the time they were posted.
Moreover, we collected tweets of all congressmen who were active on Twitter and
worked in the Brazilian parliament from October 2013 to October 2017. To
evaluate our method, we used word clouds and a topic model to identify the main
political and non-political latent topics in parliamentarian tweets. Both
results indicate that our proposal is able to accurately distinguish political
from non-political tweets. Moreover, our analyses revealed a striking fact:
more than half of the messages posted by Brazilian deputies are non-political.
Comment: 4 pages, 7 figures, 2 tables