20 research outputs found

    Organized Behavior Classification of Tweet Sets using Supervised Learning Methods

    During the 2016 US elections, Twitter experienced unprecedented levels of propaganda and fake news through the collaboration of bots and hired persons, the ramifications of which are still being debated. This work proposes an approach to identify the presence of organized behavior in tweets. The Random Forest, Support Vector Machine, and Logistic Regression algorithms are each used to train a model with a data set of 850 records consisting of 299 features extracted from tweets gathered during the 2016 US presidential election. The features represent user and temporal synchronization characteristics to capture coordinated behavior. These models are trained to classify tweet sets among the categories: organic vs organized, political vs non-political, and pro-Trump vs pro-Hillary vs neither. The random forest algorithm performs best, with greater than 95% average accuracy and f-measure scores for each category. The most valuable features for classification are identified as user-based features, with media use and marking tweets as favorites being the most dominant.
    Comment: 51 pages, 5 figures

    Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams

    Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations

    Characterizing and modeling the dynamics of online popularity

    Online popularity has enormous impact on opinions, culture, policy, and profits. We provide a quantitative, large scale, temporal analysis of the dynamics of online content popularity in two massive model systems, the Wikipedia and an entire country's Web space. We find that the dynamics of popularity are characterized by bursts, displaying characteristic features of critical systems such as fat-tailed distributions of magnitude and inter-event time. We propose a minimal model combining the classic preferential popularity increase mechanism with the occurrence of random popularity shifts due to exogenous factors. The model recovers the critical features observed in the empirical analysis of the systems analyzed here, highlighting the key factors needed in the description of popularity dynamics.
    Comment: 5 pages, 4 figures. Modeling part detailed. Final version published in Physical Review Letters
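As a rough illustration of the two ingredients this abstract names (preferential popularity increase plus random exogenous shifts), the toy simulation below may help. It is only a sketch under stated assumptions, not the model from the paper: the function name `simulate_popularity` and the parameters `shift_prob` and `shift_factor` are hypothetical, and the multiplicative boost is just one simple way to represent an exogenous shift.

```python
import random


def simulate_popularity(n_items=50, n_steps=2000, shift_prob=0.01,
                        shift_factor=5.0, seed=0):
    """Toy popularity dynamics (illustrative assumption, not the paper's model).

    Each step, one item gains a unit of popularity with probability
    proportional to its current popularity (preferential increase).
    With probability shift_prob, a uniformly random item is additionally
    multiplied by shift_factor (a stand-in for an exogenous shift).
    """
    rng = random.Random(seed)
    popularity = [1.0] * n_items  # every item starts with one unit

    for _ in range(n_steps):
        # Preferential step: richer items are more likely to be chosen.
        chosen = rng.choices(range(n_items), weights=popularity, k=1)[0]
        popularity[chosen] += 1.0

        # Exogenous step: occasionally boost an item regardless of its rank.
        if rng.random() < shift_prob:
            lucky = rng.randrange(n_items)
            popularity[lucky] *= shift_factor

    return popularity


if __name__ == "__main__":
    final = simulate_popularity()
    # Show the five most popular items at the end of the run.
    print(sorted(final, reverse=True)[:5])
```

Setting `shift_prob=0` leaves only the preferential mechanism, which makes it easy to compare runs with and without the random shifts.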

    Twitter as health information source : exploring the parameters affecting dementia-related tweets

    Unlike other media, research on the credibility of information present on social media is limited. This limitation is even more pronounced in the case of healthcare, including dementia-related information. The purpose of this study was to identify user groups that show high bot-like behavior and profile features that deviate from typical human behavior. We collected 16,691 tweets about dementia posted over the course of a month by 8400 users. We applied inductive coding to categorize users. The BotOrNot? API was used to compute a bot score. This work provides insight into relations between user features and a bot score. We performed analysis techniques such as Kruskal-Wallis, stepwise multiple variable regression, user tweet frequency analysis and content analysis on the data. These were further evaluated for the most frequently referenced URLs in the tweets and most active users in terms of tweet frequency. Initial results indicated that the majority of users are regular users and not bots. Regression analysis revealed a clear relationship between different features. Independent variables in the user profiles such as geo_data and favourites_count correlated with the final bot score. Similarly, content analysis of the tweets showed that the word features of bot profiles have an overall smaller percentage of words compared to regular profiles. Although this analysis is promising, it needs further enhancements.

    Designing Ethical Phishing Experiments: A study of (ROT13) rOnl query features

    We study how to design experiments to measure the success rates of phishing attacks that are ethical and accurate, which are two requirements of contradictory forces. Namely, an ethical experiment must not expose the participants to any risk; it should be possible to locally verify by the participants or representatives thereof that this was the case. At the same time, an experiment is accurate if it is possible to argue why its success rate is not an upper or lower bound of that of a real attack; this may be difficult if the ethics considerations make the user perception of the experiment different from the user perception of the attack. We introduce several experimental techniques allowing us to achieve a balance between these two requirements, and demonstrate how to apply these, using a context aware phishing experiment on a popular online auction site which we call "rOnl". Our experiments exhibit a measured average yield of 11% per collection of unique users. This study was authorized by the Human Subjects Committee at Indiana University (Study #05-10306).

    Evolutionary Sentence Combination for Chatterbots

    conversation. They make use of various techniques such as pattern matching, indexing, sentence reconstruction, and even natural language processing. In this paper we present an approach to chatterbots that mixes pattern matching with indexing and query matching methods inspired by information retrieval. We propose a model in which new sentences can be produced from existing ones using an evolutionary algorithm adapted to the structure of the natural language

    Detecting and Tracking Political Abuse in Social Media

    We study astroturf political campaigns on microblogging platforms: politically-motivated individuals and organizations that use multiple centrally-controlled accounts to create the appearance of widespread support for a candidate or opinion. We describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of viral spreading of political misinformation. We present promising preliminary results with better than 96% accuracy in the detection of astroturf content in the run-up to the 2010 U.S. midterm elections.