Domain-based user embedding for competing events on social media
Online social networks offer vast opportunities for computational social
science, but effective user embedding is crucial for downstream tasks.
Traditionally, researchers have used pre-defined network-based user features,
such as degree and centrality measures, and/or content-based features, such as
posts and reposts. However, these measures may not capture the complex
characteristics of social media users. In this study, we propose a user
embedding method based on the URL domain co-occurrence network, which is simple
but effective for representing social media users in competing events. We
assessed the performance of this method in binary classification tasks using
benchmark datasets that included Twitter users related to COVID-19 infodemic
topics (QAnon, Biden, Ivermectin). Our results revealed that user embeddings
generated directly from the retweet network, and those based on language,
performed below expectations. In contrast, our domain-based embeddings
outperformed these methods while reducing computation time. These findings
suggest that the domain-based user embedding can serve as an effective tool to
characterize social media users participating in competing events, such as
political campaigns and public health crises.
Comment: Computational social science applicatio
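The abstract above does not spell out how the domain co-occurrence network becomes a user vector, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes two domains co-occur when the same user shares both, and that a user can be represented as the mean co-occurrence row of the domains they shared. The function names and the example URLs are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def cooccurrence(user_urls):
    """Count how often two domains are shared by the same user (assumed definition)."""
    counts = defaultdict(Counter)
    for urls in user_urls.values():
        domains = sorted({domain_of(u) for u in urls})
        for a, b in combinations(domains, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def embed_users(user_urls):
    """Represent each user as the mean co-occurrence row of their domains."""
    counts = cooccurrence(user_urls)
    vocab = sorted({domain_of(u) for urls in user_urls.values() for u in urls})
    vectors = {}
    for user, urls in user_urls.items():
        domains = {domain_of(u) for u in urls}
        vec = [0.0] * len(vocab)
        for d in domains:
            for j, other in enumerate(vocab):
                vec[j] += counts[d][other]
        n = len(domains) or 1  # avoid division by zero for users with no URLs
        vectors[user] = [x / n for x in vec]
    return vocab, vectors


# Hypothetical toy data: user id -> URLs that the user shared.
user_urls = {
    "u1": ["https://www.example.org/a", "https://news.example.com/b"],
    "u2": ["https://example.org/c", "https://news.example.com/d"],
    "u3": ["https://example.org/e", "https://blog.example.net/f"],
}
vocab, vectors = embed_users(user_urls)
```

In this toy data, users who share the same set of domains receive identical vectors. The resulting vectors could then be passed to any binary classifier; a real pipeline would more likely learn a low-dimensional embedding from the network than use raw co-occurrence rows.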
A Google trends spatial clustering approach for a worldwide Twitter user geolocation
User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively.
The work of P. Cortez was supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We would also like to thank the anonymous reviewers for their helpful suggestions.
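The tolerance-distance accuracies reported in the abstract above can be illustrated with a short sketch. It assumes that the error is measured as great-circle (haversine) distance between predicted and true coordinates, which the abstract does not state. The function names and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def accuracy_at(predicted, truth, tolerances_km=(250, 500, 1000, 2000)):
    """Fraction of users whose predicted location lies within each tolerance."""
    distances = [haversine_km(p, t) for p, t in zip(predicted, truth)]
    return {tol: sum(d <= tol for d in distances) / len(distances)
            for tol in tolerances_km}


# Hypothetical example: two users, both predicted at (0, 0).
predicted = [(0.0, 0.0), (0.0, 0.0)]
truth = [(1.0, 0.0), (10.0, 0.0)]
scores = accuracy_at(predicted, truth)
```

Here the first user is off by about 111 km and the second by about 1,112 km, so only the 2,000 km tolerance covers both.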