Temporal word embeddings for dynamic user profiling in Twitter
The research described in this paper explores the domain of user profiling, a nascent and contentious technology that has attracted steadily increasing interest from the research community as its potential for providing personalised digital services is realised. An extensive review of the related literature revealed that limited research has been conducted into how temporal aspects of users can be captured using user profiling techniques. This, coupled with the notable lack of research into the use of word embedding techniques to capture temporal variation in language, revealed an opportunity to extend the Random Indexing word embedding technique so that users' interests could be modelled based on their use of language. To this end, this work extends an existing implementation of Temporal Random Indexing to model Twitter users across multiple granularities of time based on their use of language. The result is a novel technique for temporal user profiling, in which a set of vectors describes the evolution of a Twitter user's interests over time through their use of language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the Word2Vec Dynamic Independent Skip-gram model, and Temporal Random Indexing was found to outperform Word2Vec in the generation of temporal user profiles.
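As a rough illustration of per-slice user vectors, here is a minimal Python sketch, not the paper's implementation. It assumes a simple hypothetical scheme in which a user's profile for a time slice is the sum of the random index vectors of the words they tweeted in that slice; the names `index_vector`, `slice_key`, `user_profiles`, the `"month"`/`"year"` granularities, and the `DIM`/`NONZERO` settings are all assumptions. Because each index vector is seeded by its term, it is identical in every slice, so profiles from different slices can be compared directly with cosine similarity.

```python
import random
from collections import defaultdict
from datetime import date

DIM = 512     # hypothetical vector dimension
NONZERO = 10  # hypothetical number of non-zero (+1/-1) entries per index vector


def index_vector(term):
    """Sparse ternary random index vector for a term, as {position: +1 or -1}.

    Seeding the generator with the term makes the vector identical in every
    time slice, so profiles from different slices share one vector space.
    """
    rng = random.Random(term)
    positions = rng.sample(range(DIM), NONZERO)
    return {pos: (1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)}


def slice_key(day, granularity):
    """Map a date to a time-slice label at the requested granularity."""
    if granularity == "year":
        return str(day.year)
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def user_profiles(tweets, granularity):
    """Build one profile vector per time slice from (date, text) pairs.

    Each profile is the sum of the index vectors of the words the user
    tweeted in that slice (a hypothetical profiling scheme).
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for day, text in tweets:
        vec = profiles[slice_key(day, granularity)]
        for token in text.lower().split():
            for pos, val in index_vector(token).items():
                vec[pos] += val
    return {key: dict(vec) for key, vec in profiles.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(pos, 0) for pos, val in u.items())
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy timeline for a single hypothetical user.
tweets = [
    (date(2019, 1, 5), "football match tonight"),
    (date(2019, 1, 20), "great football goal"),
    (date(2019, 6, 2), "new python release"),
]
monthly = user_profiles(tweets, "month")
yearly = user_profiles(tweets, "year")
print(sorted(monthly))  # ['2019-01', '2019-06']
print(cosine(monthly["2019-01"], monthly["2019-06"]))  # drift between the two months
```

Switching `granularity` changes only how tweets are bucketed; since the yearly profile is simply the sum of the monthly ones, coarser and finer views of the same user stay consistent in this sketch.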
GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
This paper describes the system proposed for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focused our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when a target word has gained or lost senses. To this end, we defined a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compared the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.
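The clustering step can be pictured with the sketch below, which reflects one plausible reading of the approach rather than the system's actual code. It assumes each target word already has a cosine similarity between its embeddings in the two periods (for example from aligned spaces or Temporal Referencing), fits a two-component one-dimensional Gaussian mixture to those similarities with EM, and flags words that are more likely under the lower-mean component as having changed. The function names, the initialisation, and the toy similarity values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to xs with EM."""
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    # Initialise the two means at the extremes of the data.
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var0, var0]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixture weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var
            )
    return weights, means, variances


def detect_changed(similarities):
    """Flag words whose cross-period similarity falls in the lower-mean component."""
    words = list(similarities)
    xs = [similarities[w] for w in words]
    weights, means, variances = fit_gmm_1d(xs)
    low = 0 if means[0] < means[1] else 1
    flags = {}
    for word, x in zip(words, xs):
        p = [wk * normal_pdf(x, m, v) for wk, m, v in zip(weights, means, variances)]
        flags[word] = p[low] > p[1 - low]
    return flags


# Toy cosine similarities between each target's vectors in period 1 and period 2
# (made-up values, not results from the paper).
sims = {"w1": 0.21, "w2": 0.18, "w3": 0.25, "w4": 0.91, "w5": 0.88, "w6": 0.93, "w7": 0.90}
print(detect_changed(sims))
```

Compared with a fixed similarity threshold, the mixture in this sketch lets the distribution of similarities decide where the boundary between stable and changed words falls.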
Entity Linking for the Semantic Annotation of Italian Tweets
Linking entity mentions in Italian tweets to concepts in a knowledge base is a challenging task, due to the short and noisy nature of these messages and the lack of specific resources for Italian. This paper proposes an adaptation of a general-purpose Named Entity Linking algorithm, which exploits a similarity measure computed over a Distributional Semantic Model, to the context of Italian tweets. To evaluate the proposed algorithm, we introduce a new dataset of tweets for entity linking, developed specifically for the Italian language.
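A similarity-based disambiguation step of this kind can be sketched as follows. This is a generic illustration, not the adapted algorithm from the paper: it assumes candidate entities for a mention have already been retrieved together with short textual descriptions, represents the tweet and each description as the sum of their word vectors in a distributional space, and picks the candidate with the highest cosine similarity. The tiny three-dimensional `DSM`, its values, the entity identifiers, and all function names are hypothetical.

```python
import math
import re

# Toy 3-dimensional distributional space (hypothetical values for illustration).
DSM = {
    "partita": [0.9, 0.1, 0.0],
    "gol": [0.8, 0.0, 0.1],
    "squadra": [0.9, 0.2, 0.0],
    "calcio": [1.0, 0.1, 0.0],
    "città": [0.1, 0.9, 0.2],
    "golfo": [0.0, 0.8, 0.3],
    "campania": [0.1, 0.9, 0.1],
}
DIM = len(next(iter(DSM.values())))


def text_vector(text):
    """Sum the DSM vectors of the known words in text; unknown words are skipped."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        for i, x in enumerate(DSM.get(token, ())):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(tweet, candidates):
    """Return the candidate entity whose description is most similar to the tweet.

    candidates: {entity_id: textual description}, assumed to come from an
    earlier (hypothetical) candidate-generation step over the knowledge base.
    """
    context = text_vector(tweet)
    return max(candidates, key=lambda e: cosine(context, text_vector(candidates[e])))


candidates = {
    "SSC_Napoli": "squadra di calcio",
    "Napoli": "città della Campania sul golfo",
}
print(link("Napoli, che partita! Gol al 90'", candidates))  # SSC_Napoli
print(link("Tramonto sul golfo, Napoli città stupenda", candidates))  # Napoli
```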
Temporal Random Indexing: A System for Analysing Word Meaning over Time
During the last decade, the surge in available data spanning different epochs has inspired new analyses of cultural, social, and linguistic phenomena from a temporal perspective. This paper describes a method that enables the analysis of how the meaning of a word evolves over time. We propose Temporal Random Indexing (TRI), a method for building WordSpaces that takes temporal information into account. We use this methodology to build geometrical spaces of word meanings that span several time periods. The TRI framework provides all the tools necessary to build WordSpaces over different time periods and to perform such temporal linguistic analysis. We present some examples of our tool in use, analysing word meanings in two corpora: a collection of Italian books and a collection of English scientific papers on computational linguistics. This analysis enables the detection of linguistic events that emerge in specific time intervals and that can be related to social or cultural phenomena.
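The core idea of building comparable WordSpaces for different periods can be sketched as below. The sketch assumes plain Random Indexing in which the random index vectors are shared across all periods, so that a word's vectors from two periods live in the same space and can be compared with cosine similarity without any alignment step. The dimension, number of non-zero entries, window size, function names, and toy sentences are illustrative assumptions, not the TRI framework's API.

```python
import math
import random
from collections import defaultdict
from functools import lru_cache

DIM = 1000    # hypothetical dimension of the WordSpaces
NONZERO = 10  # hypothetical number of +1/-1 entries in each index vector


@lru_cache(maxsize=None)
def index_vector(term):
    """Sparse random index vector as (position, value) pairs.

    Seeded by the term, so every period reuses the same vector for the same
    term; this sharing is what keeps the per-period WordSpaces comparable.
    """
    rng = random.Random(term)
    positions = rng.sample(range(DIM), NONZERO)
    return tuple((pos, 1 if i % 2 == 0 else -1) for i, pos in enumerate(positions))


def build_wordspace(sentences, window=2):
    """Random Indexing over one period: each word vector is the sum of the
    index vectors of the words within `window` positions of it."""
    space = defaultdict(lambda: [0] * DIM)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    for pos, val in index_vector(tokens[j]):
                        space[target][pos] += val
    return dict(space)


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two toy "periods" (hypothetical sentences, not the corpora used in the paper).
periods = {
    "1990s": ["the mouse ate the cheese", "a mouse hid from the cat"],
    "2010s": ["click the mouse button", "the mouse and keyboard are wireless"],
}
spaces = {name: build_wordspace(sents) for name, sents in periods.items()}
print(cosine(spaces["1990s"]["mouse"], spaces["2010s"]["mouse"]))
```

A low cross-period similarity for a word in this sketch would suggest that its contexts, and hence possibly its meaning, shifted between the two periods.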
- …