Domain-based user embedding for competing events on social media

Authors: Sasahara Kazutoshi, Xu Wentao
Publication date: 30/08/2023

Online social networks offer vast opportunities for computational social science, but effective user embedding is crucial for downstream tasks. Traditionally, researchers have used pre-defined network-based user features, such as degree and centrality measures, and/or content-based features, such as posts and reposts. However, these measures may not capture the complex characteristics of social media users. In this study, we propose a user embedding method based on the URL domain co-occurrence network, which is simple but effective for representing social media users in competing events. We assessed the performance of this method in binary classification tasks using benchmark datasets that included Twitter users related to COVID-19 infodemic topics (QAnon, Biden, Ivermectin). Our results revealed that user embeddings generated directly from the retweet network, and those based on language, performed below expectations. In contrast, our domain-based embeddings outperformed these methods while reducing computation time. These findings suggest that the domain-based user embedding can serve as an effective tool to characterize social media users participating in competing events, such as political campaigns and public health crises. Comment: Computational social science applicatio
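The abstract does not spell out how the URL domain co-occurrence network is built or embedded. As a minimal sketch of one plausible construction, the Python below assumes that two domains co-occur whenever the same user has shared URLs from both. The function names (`domain_of`, `cooccurrence_counts`) and the input layout are hypothetical, and the embedding step itself is not shown because the abstract gives no detail on it.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    Assumes the URL carries a scheme (e.g. 'https://'); otherwise
    urlparse leaves the host empty.
    """
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def cooccurrence_counts(user_urls):
    """Count, for each pair of domains, how many users shared both.

    user_urls maps a user id to the list of URLs that user posted
    (a hypothetical input layout, not taken from the paper).
    The result maps an alphabetically ordered domain pair to its edge weight.
    """
    counts = Counter()
    for urls in user_urls.values():
        domains = sorted({domain_of(u) for u in urls})
        for a, b in combinations(domains, 2):
            counts[(a, b)] += 1
    return counts


example = {
    "u1": ["https://www.a.com/x", "https://b.org/y"],
    "u2": ["https://a.com/z", "https://b.org/q", "https://c.net/"],
}
edges = cooccurrence_counts(example)
```

The weighted pairs in `edges` can be read as the edge list of an undirected domain network. Any node-embedding method could then be applied to it, which is a choice this sketch leaves open.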

arXiv.org e-Print Archive

A Google trends spatial clustering approach for a worldwide Twitter user geolocation

Authors: Cortez Paulo, Ragno Costantino, Zola Paola
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast, and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively. The work of P. Cortez was supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We would also like to thank the anonymous reviewers for their helpful suggestions
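The tolerance-distance figures above can be read as "accuracy within d km". A minimal sketch of that metric follows, assuming great-circle (haversine) distance on a spherical Earth of radius 6,371 km. The abstract does not state which distance formula the authors used, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predicted, actual, tolerance_km):
    """Fraction of users whose predicted location lies within tolerance_km
    of the ground-truth location. Both arguments are equal-length lists of
    (lat, lon) tuples in degrees."""
    hits = sum(
        haversine_km(*p, *a) <= tolerance_km
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Toy data: the first prediction is exact, the second is a quarter of the
# globe away, so half of the users fall within any of the listed tolerances.
predicted = [(0.0, 0.0), (0.0, 0.0)]
actual = [(0.0, 0.0), (0.0, 90.0)]
for tol in (250, 500, 1000, 2000):
    print(tol, accuracy_within(predicted, actual, tol))
```

Evaluating the same predictions at several tolerances, as in the loop, gives the kind of cumulative accuracy profile quoted in the abstract.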

Universidade do Minho: RepositoriUM