6 research outputs found

    Agenda detector: labeling tweets with political policy agenda

    In the nearly one decade of Twitter’s existence, it has attracted an ever-growing user base from many realms, one of them being politics. In the political domain, Twitter is used as a vital tool for communication, for running effective e-campaigns, and for mining and affecting public opinion, to name a few uses. We study the problem of automatically detecting whether a tweet posted by the Twitter handle of a US state Senate refers to one or more policy agendas. Such a capability can help detect the policy agendas that a state focuses on and also capture the inception of ideas that lead to the framing of a bill or law. Furthermore, analyzing the spatial and temporal dynamics of tweets carrying policy agendas can facilitate the study of policy diffusion among states and help in understanding how states learn policy-making from each other. Currently, no study has analyzed Twitter data to detect whether or not a tweet refers to a policy agenda. We present our analysis of 122,965 tweets collected from verified Twitter handles of the upper houses (Senates) of US states. We present a high-level analysis of (a) how far Twitter has penetrated into state politics and (b) how states use the medium differently in terms of the messages they broadcast. Our proposed approach aims to automate the classification of a tweet according to whether it refers to a policy agenda (Has Agenda) or not (No Agenda). We accomplish this by leveraging existing text classification methodology and achieve a recall of 89.1% and a precision of 77.2% for the “Has Agenda” class. We investigate several machine learning algorithms to determine the best performing one for our binary classification problem. We conclude that a support vector machine with a linear kernel was the most efficient algorithm for our dataset. Lastly, we propose a set of hand-crafted features that, together with feature selection and stemming, improved our classifier’s performance. Prior to including these features, the classifier was developed using basic preprocessing techniques and term occurrence (for feature extraction). An overall improvement of 5.187% at a significance level of α=0.05 was achieved.
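
    As a rough illustration of how the per-class figures quoted above are defined, the sketch below computes precision, recall, and F1 for the “Has Agenda” class from gold and predicted labels. The function names and the toy labels are hypothetical and are not taken from the paper. The F1 value derived from the reported precision (77.2%) and recall (89.1%) is our own arithmetic, not a number stated in the abstract.

```python
from typing import Sequence, Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "Has Agenda",
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class of a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    gold = ["Has Agenda", "Has Agenda", "Has Agenda", "No Agenda", "No Agenda"]
    pred = ["Has Agenda", "Has Agenda", "No Agenda", "Has Agenda", "No Agenda"]
    print(precision_recall_f1(gold, pred))

    # F1 implied by the precision and recall reported in the abstract.
    print(round(f1_from(0.772, 0.891), 3))
```

    Under these definitions, the reported precision and recall correspond to an F1 of roughly 0.83 for the “Has Agenda” class.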

    Detecting collective attention spam

    We examine the problem of collective attention spam, in which spammers target social media where user attention quickly coalesces and then collectively focuses around a phenomenon. Compared to many existing spam types, collective attention spam relies on the users themselves to seek out the content – like breaking news, viral videos, and popular memes – where the spam will be encountered, potentially increasing its effectiveness and reach. We study the presence of collective attention spam in one popular service, Twitter, and we develop spam classifiers to detect spam messages generated by collective attention spammers. Since many instances of collective attention are bursty and unexpected, it is difficult to build spam detectors to pre-screen them before they arise; hence, we examine the effectiveness of quickly learning a classifier based on the first moments of a bursting phenomenon. Through initial experiments over a small set of trending topics on Twitter, we find encouraging results, suggesting that collective attention spam may be identified early in its life cycle and shielded from the view of unsuspecting social media users.

    Promotional Campaigns in the Era of Social Platforms

    The rise of social media has made it easier for information to diffuse to millions of users. While some users connect with friends and organically share information and opinions on social media, others have exploited these platforms to gain influence and profit through promotional campaigns and advertising. The existence of promotional campaigns contributes to the spread of misleading information, spam, and fake news. Thus, these campaigns affect the trustworthiness and reliability of social media and turn it into a crowd advertising platform. This dissertation studies the existence of promotional campaigns in social media and explores the different ways users and bots (i.e., automated accounts) engage in such campaigns. In this dissertation, we design a suite of detection, ranking, and mining techniques. We study user-generated reviews in online e-commerce sites, such as Google Play, to extract campaigns. We identify cooperating sets of bots and classify their interactions in social networks such as Twitter, and rank the bots based on the degree of their malevolence. Our study shows that modern online social interactions are largely modulated by promotional campaigns such as political campaigns, advertisement campaigns, and incentive-driven campaigns. We measure how these campaigns can potentially impact the information consumption of millions of social media users.

    Combating Threats to the Quality of Information in Social Systems

    Many large-scale social systems such as Web-based social networks, online social media sites and Web-scale crowdsourcing systems have been growing rapidly, enabling millions of human participants to generate, share and consume content on a massive scale. This reliance on users can lead to many positive effects, including large-scale growth in the size and content in the community, bottom-up discovery of “citizen-experts”, serendipitous discovery of new resources beyond the scope of the system designers, and new social-based information search and retrieval algorithms. But the relative openness and reliance on users coupled with the widespread interest and growth of these social systems carries risks and raises growing concerns over the quality of information in these systems. In this dissertation research, we focus on countering threats to the quality of information in self-managing social systems. Concretely, we identify three classes of threats to these systems: (i) content pollution by social spammers, (ii) coordinated campaigns for strategic manipulation, and (iii) threats to collective attention. To combat these threats, we propose three inter-related methods for detecting evidence of these threats, mitigating their impact, and improving the quality of information in social systems. We augment this three-fold defense with an exploration of their origins in “crowdturfing” – a sinister counterpart to the enormous positive opportunities of crowdsourcing. In particular, this dissertation research makes four unique contributions:
    • The first contribution of this dissertation research is a framework for detecting and filtering social spammers and content polluters in social systems. To detect and filter individual social spammers and content polluters, we propose and evaluate a novel social honeypot-based approach.
    • Second, we present a set of methods and algorithms for detecting coordinated campaigns in large-scale social systems. We propose and evaluate a content-driven framework for effectively linking free text posts with common “talking points” and extracting campaigns from large-scale social systems.
    • Third, we present a dual study of the robustness of social systems to collective attention threats through both a data-driven modeling approach and deployment over a real system trace. We evaluate the effectiveness of countermeasures deployed based on the first moments of a bursting phenomenon in a real system.
    • Finally, we study the underlying ecosystem of crowdturfing for engaging in each of the three threat types. We present a framework for “pulling back the curtain” on crowdturfers to reveal their underlying ecosystem on both crowdsourcing sites and social media.

    Content-Driven Detection of Campaigns in Social Media

    We study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns, ranging from coordinated spam messages to promotional and advertising campaigns to political astroturfing, are growing in significance and reach with the commensurate rise of massive-scale social systems. Although such campaigns are often linked by common “talking points”, there has been little research on detecting them. Hence, we propose and evaluate a content-driven framework for effectively linking free text posts with common “talking points” and extracting campaigns from large-scale social media. One of the salient aspects of the framework is an investigation of graph mining techniques for isolating coherent campaigns from large message-based graphs. Through an experimental study over millions of Twitter messages, we identify five major types of campaigns (Spam, Promotion, Template, News, and Celebrity campaigns), and we show how these campaigns may be extracted with high precision and recall. Categories and Subject Descriptors: H.3.5 [Online Information Services]: Web-based services; J.4 [Computer Applications]: Social and behavioral science
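
    The general idea of linking posts by shared “talking points” and then isolating campaigns from a message graph can be sketched as follows. This is a minimal illustration under our own assumptions, not the framework evaluated in the paper: the abstract does not specify the similarity measure or the graph mining technique, so word shingles, Jaccard similarity, the threshold, and connected components are stand-ins chosen for brevity. All function names and example messages are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles of a message (assumed representation)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_campaigns(
    messages: List[str],
    k: int = 3,
    threshold: float = 0.5,
    min_size: int = 2,
) -> List[List[int]]:
    """Group message indices into candidate campaigns.

    Messages are nodes; an edge joins two messages whose shingle sets have
    Jaccard similarity >= threshold. Connected components with at least
    min_size messages are returned as candidate campaigns.
    """
    sets = [shingles(m, k) for m in messages]
    parent = list(range(len(messages)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; fine for a sketch, not for millions of posts.
    for i, j in combinations(range(len(messages)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)


if __name__ == "__main__":
    # Hypothetical example messages.
    posts = [
        "win a free phone today click here now",
        "win a free phone today click here fast",
        "lovely weather in austin this morning",
        "win a free phone today click this link",
    ]
    print(extract_campaigns(posts, threshold=0.4))
```

    At the scale the abstract mentions (millions of messages), the all-pairs comparison would have to be replaced by an approximate or indexed similarity search; the sketch only shows the shape of the graph construction and grouping step.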