
    Unsupervised User Stance Detection on Twitter

    We present a highly effective unsupervised framework for detecting the stance of prolific Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our framework has three major advantages over pre-existing methods, which are based on supervised or semi-supervised classification. First, we do not require any prior labeling of users: instead, we create clusters, which are much easier to label manually afterwards, e.g., in a matter of seconds or minutes instead of hours. Second, there is no need for domain- or topic-level knowledge, either to specify the relevant stances (labels) or to conduct the actual labeling. Third, our framework is robust to data skewness, e.g., when some users or some stances have greater representation in the data than others. We experiment with different combinations of user similarity features, dataset sizes, dimensionality reduction methods, and clustering algorithms to ascertain the most effective and most computationally efficient combinations across three different datasets (in English and Turkish). We further verify our results on additional tweet sets covering six different controversial topics. Our best combination in terms of effectiveness and efficiency uses retweeted accounts as features, UMAP for dimensionality reduction, and Mean Shift for clustering, and yields a small number of high-quality user clusters, typically just 2-3, with more than 98% purity. The resulting user clusters can be used to train downstream classifiers. Moreover, our framework is robust to variations in the hyper-parameter values as well as to random initialization.
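    A minimal sketch of such a pipeline on toy data (this is an illustration, not the paper's implementation: the user-by-retweeted-account matrix below is synthetic, and TruncatedSVD stands in for UMAP so the example needs only scikit-learn):

    ```python
    # Sketch of an unsupervised stance pipeline: represent each user by which
    # accounts they retweet, project to a low-dimensional space, then cluster.
    # TruncatedSVD stands in for UMAP here; the data is a synthetic toy matrix.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import MeanShift

    rng = np.random.default_rng(42)
    n_accounts = 60

    # Two synthetic stances: each mostly retweets its own half of the accounts.
    p_a = np.r_[np.full(30, 0.7), np.full(30, 0.02)]
    p_b = p_a[::-1]
    X = np.vstack([(rng.random((20, n_accounts)) < p_a).astype(float),
                   (rng.random((20, n_accounts)) < p_b).astype(float)])

    low_dim = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
    labels = MeanShift().fit_predict(low_dim)  # cluster count found automatically
    ```

    Mean Shift needs no pre-specified number of clusters, which matches the paper's point that the framework discovers the (small) number of stances on its own.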

    Online Human-Bot Interactions: Detection, Estimation, and Characterization

    Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that includes both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different natures. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self-promoters, and accounts that post content from connected applications.
    Comment: Accepted paper for ICWSM'17, 10 pages, 8 figures, 1 table.
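    As a rough illustration of the feature families the abstract lists (user meta-data, content, and activity time series), the sketch below computes a few such features; the field names and the specific features are illustrative assumptions, not the paper's actual thousand-plus feature set:

    ```python
    # Hypothetical sketch of bot-detection feature extraction; field names and
    # features are illustrative assumptions, not the paper's actual feature set.
    from statistics import mean, pstdev

    def bot_features(user: dict, tweets: list[dict]) -> dict:
        """Compute a handful of illustrative bot-detection features."""
        n = max(len(tweets), 1)
        gaps = [b["ts"] - a["ts"] for a, b in zip(tweets, tweets[1:])]
        return {
            # User meta-data: friend/follower imbalance is a classic bot signal.
            "friend_follower_ratio": user["friends"] / max(user["followers"], 1),
            # Content: fraction of tweets that are retweets.
            "retweet_fraction": sum(t["is_retweet"] for t in tweets) / n,
            # Activity time series: very regular posting gaps suggest automation.
            "gap_stdev": pstdev(gaps) if len(gaps) > 1 else 0.0,
            "mean_gap": mean(gaps) if gaps else 0.0,
        }
    ```

    Vectors like these would then feed a supervised classifier trained on the labeled bot/human collections the abstract describes.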

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal, and multi-language sentiment prediction while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including tweets' visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focuses on features available early, the model is immediately applicable to incoming tweets in 18 languages.
    Comment: AffCon@AAAI-19 Best Paper Award; presented at AAAI-19 W1: Affective Content Analysis.
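    A minimal sketch of ranking tweets with a gradient boosted regression tree on early-available features (the features and target below are synthetic stand-ins for signals of the kind the abstract mentions, such as author reach and sentiment; this is not the paper's model or data):

    ```python
    # Illustrative sketch: score incoming tweets for virality with a gradient
    # boosted regression tree trained only on early-available features.
    # Features and target are synthetic stand-ins, not the paper's feature set.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 500
    followers = rng.lognormal(8, 2, n)     # author reach (available at post time)
    sentiment = rng.uniform(-1, 1, n)      # sentiment score of the tweet text
    hour = rng.integers(0, 24, n)          # posting hour
    X = np.column_stack([np.log1p(followers), sentiment, hour])

    # Synthetic target: log retweet count, driven mainly by reach, plus noise.
    y = 0.8 * np.log1p(followers) + 0.3 * sentiment + rng.normal(0, 0.3, n)

    model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
    scores = model.predict(X)              # rank candidate tweets by this score
    ```

    Because every feature here is known the moment a tweet is posted, the same trained model could score incoming tweets immediately, which is the property the abstract emphasizes.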

    Three Facets of Online Political Networks: Communities, Antagonisms, and Polarization

    Millions of users leave digital traces of their political engagements on social media platforms every day. Users form networks of interactions, produce textual content, and like and share each other's content. This creates an invaluable opportunity to better understand the political engagements of internet users. In this proposal, I present three algorithmic solutions to three facets of online political networks: the detection of communities, the detection of antagonisms, and the impact of certain types of accounts on political polarization. First, I develop a multi-view community detection algorithm to find politically pure communities. I find that, among the content types considered (i.e., hashtags, URLs, and words), word usage complements user interactions best in accurately detecting communities. Second, I focus on detecting negative linkages between politically motivated social media users. Major social media platforms do not provide their users with built-in negative interaction options; however, many political network analysis tasks rely on not only positive but also negative linkages. Here, I present the SocLSFact framework to detect negative linkages among social media users. It utilizes three pieces of information: sentiment cues in textual interactions, positive interactions, and socially balanced triads. I evaluate the contribution of each of the three aspects to negative link detection performance on multiple tasks. Third, I propose an experimental setup that quantifies the polarization impact of automated accounts on Twitter retweet networks. I focus on a dataset covering the tragic Parkland shooting event and its aftermath. I show that when automated accounts are removed from the retweet network, network polarization decreases significantly, whereas when the same number of accounts is removed at random, the difference is not significant.
    I also find that the prominent predictors of engagement with automatically generated content are not very different from what previous studies identify for engaging content on social media in general. Last but not least, I identify accounts that self-disclose their automated nature in their profile by using expressions such as bot, chat-bot, or robot. I find that human engagement with self-disclosing accounts is much smaller than with non-disclosing automated accounts. This observational finding can motivate further efforts in automated account detection research to prevent their unintended impact.
    Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
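    The third experiment above can be sketched on toy data: measure a polarization score for a retweet network, then compare removing the automated accounts against removing the same number of random accounts. Everything below (the data, the simplistic within-side-edge-fraction metric, the account sets) is a hypothetical illustration, not the dissertation's setup:

    ```python
    # Toy sketch of the bot-removal polarization comparison. The polarization
    # metric (fraction of retweet edges staying within one political side) and
    # the synthetic network are illustrative assumptions only.
    import random

    def polarization(edges, side):
        """Fraction of retweet edges that stay within one side (toy metric)."""
        within = sum(side[u] == side[v] for u, v in edges)
        return within / max(len(edges), 1)

    def without(edges, removed):
        return [(u, v) for u, v in edges if u not in removed and v not in removed]

    random.seed(1)
    users = list(range(100))
    side = {u: u % 2 for u in users}          # two political sides
    bots = set(range(90, 100))                # hypothetical automated accounts

    # Humans mostly retweet their own side; bots amplify only their side heavily.
    edges = [(random.choice(users), random.choice(users)) for _ in range(300)]
    edges = [(u, v) for u, v in edges if side[u] == side[v] or random.random() < 0.3]
    edges += [(b, v) for b in bots for v in users if side[b] == side[v]]

    base = polarization(edges, side)
    no_bots = polarization(without(edges, bots), side)
    rand_removed = set(random.sample([u for u in users if u not in bots], len(bots)))
    no_rand = polarization(without(edges, rand_removed), side)
    ```

    On this toy network, dropping the bots removes their one-sided amplification and lowers the polarization score, while a random removal of the same size barely moves it, mirroring the contrast described above.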

    Viewpoint Discovery and Understanding in Social Networks

    The Web has evolved into a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a (usually controversial) topic like Brexit, but at the same time, it has led to phenomena like media bias, echo chambers, and filter bubbles, where users are exposed to only one point of view on the same topic. Therefore, there is a need for methods that are able to detect and explain the different viewpoints. In this paper, we propose a graph partitioning method that exploits social interactions to enable the discovery of different communities (representing different viewpoints) discussing a controversial topic in a social network like Twitter. To explain the discovered viewpoints, we describe a method, called Iterative Rank Difference (IRD), which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint (by detecting other related descriptive terms). The results of an experimental evaluation show that our approach outperforms state-of-the-art methods on viewpoint discovery, while a qualitative analysis of the proposed IRD method on three different controversial topics shows that IRD provides comprehensive and deep representations of the different viewpoints.
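    A simplified rank-difference scoring in the spirit of the IRD idea (this is my own toy formulation, not the paper's exact method): terms that rank much higher by frequency in one viewpoint's tweets than in the other's are taken as descriptive of that viewpoint.

    ```python
    # Toy rank-difference term scoring, in the spirit of (but not identical to)
    # IRD: score each term by how much better it ranks in viewpoint A than in B.
    from collections import Counter

    def term_ranks(texts):
        """Map each term to its frequency rank (0 = most frequent)."""
        counts = Counter(w for t in texts for w in t.split())
        return {w: r for r, (w, _) in enumerate(counts.most_common())}

    def descriptive_terms(view_a, view_b, top=3):
        """Terms most characteristic of view_a relative to view_b."""
        ra, rb = term_ranks(view_a), term_ranks(view_b)
        worst = max(len(ra), len(rb))  # rank assigned to absent terms
        score = {w: rb.get(w, worst) - ra[w] for w in ra}
        return sorted(score, key=score.get, reverse=True)[:top]
    ```

    Terms shared by both viewpoints (like the topic name itself) get similar ranks on both sides and score near zero, so only side-specific vocabulary surfaces.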

    A Probabilistic Model for Malicious User and Rumor Detection on Social Media

    Rumor detection has emerged in recent years as an important research topic, as fake news on social media now has more significant impacts on people's lives, especially during complex and controversial events. Most existing rumor detection techniques, however, only provide shallow analyses of the users who propagate rumors. In this paper, we propose a probabilistic model that describes user maliciousness with a two-sided perception of rumors and true stories. We model not only the behavior of retweeting rumors, but also the intention. We propose learning algorithms for discovering latent attributes and detecting rumors based on such attributes, which we expect to be more effective when the stories involve retweets with mixed intentions. Using real-world rumor datasets, we show that our approach can outperform existing methods in detecting rumors, especially for more confusing stories. We also show that our approach can capture malicious users more effectively.

    Detection of Trending Topic Communities: Bridging Content Creators and Distributors

    The rise of a trending topic on Twitter or Facebook leads to the temporal emergence of a set of users currently interested in that topic. Given the temporary nature of the links between these users, being able to dynamically identify communities of users related to this trending topic would allow for a rapid spread of information. Indeed, individual users inside a community might receive recommendations of content generated by the other users, or the community as a whole could receive group recommendations with new content related to that trending topic. In this paper, we tackle this challenge by identifying coherent topic-dependent user groups, linking those who generate the content (creators) and those who spread this content, e.g., by retweeting/reposting it (distributors). This is a novel problem on group-to-group interactions in the context of recommender systems. Analyses on real-world Twitter data compare our proposal with a baseline approach based on retweeting activity and validate it with standard metrics. Results show the effectiveness of our approach in identifying communities interested in a topic, each of which includes content creators and content distributors, facilitating users' interactions and the spread of new information.
    Comment: 9 pages, 4 figures, 2 tables, Hypertext 2017 conference.

    Identifying Users with Opposing Opinions in Twitter Debates

    In recent times, social media sites such as Twitter have been extensively used for debating politics and public policies. These debates span millions of tweets and numerous topics of public importance. Thus, it is imperative that this vast trove of data is tapped in order to gain insights into public opinion, especially on hotly contested issues such as abortion, gun reform, etc. To this end, in our work, we aim to gauge users' stance on such topics on Twitter. We propose ReLP, a semi-supervised framework using a retweet-based label propagation algorithm coupled with a supervised classifier to identify users with differing opinions. In particular, our framework is designed such that it can be easily adapted to different domains with little human supervision while still producing excellent accuracy.
    Comment: Corrected typos in Section 4, under "Visibly Opinionated Users". The numbers did not add up. Results remain unchanged.
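    A toy sketch of retweet-based label propagation in the spirit of ReLP (the details below are assumptions, not the paper's algorithm): a few seed users carry stance labels, and labels spread over the undirected retweet graph by majority vote among labeled neighbors.

    ```python
    # Sketch of retweet-based label propagation: seed labels spread over the
    # retweet graph by neighbor majority vote. Illustrative only, not ReLP's
    # actual algorithm.
    from collections import Counter, defaultdict

    def propagate(retweets, seeds, rounds=5):
        """retweets: (user, user) pairs; seeds: {user: stance_label}."""
        graph = defaultdict(set)
        for u, v in retweets:
            graph[u].add(v)
            graph[v].add(u)
        labels = dict(seeds)
        for _ in range(rounds):
            for u in graph:
                if u in seeds:
                    continue  # seed labels stay fixed
                votes = Counter(labels[v] for v in graph[u] if v in labels)
                if votes:
                    labels[u] = votes.most_common(1)[0][0]
        return labels
    ```

    In a semi-supervised setup like the one described, the propagated labels could then serve as (noisy) training data for the downstream supervised classifier.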