Organized Behavior Classification of Tweet Sets using Supervised Learning Methods
During the 2016 US elections Twitter experienced unprecedented levels of
propaganda and fake news through the collaboration of bots and hired persons,
the ramifications of which are still being debated. This work proposes an
approach to identify the presence of organized behavior in tweets. The Random
Forest, Support Vector Machine, and Logistic Regression algorithms are each
used to train a model with a data set of 850 records consisting of 299 features
extracted from tweets gathered during the 2016 US presidential election. The
features represent user and temporal synchronization characteristics to capture
coordinated behavior. These models are trained to classify tweet sets among the
categories: organic vs organized, political vs non-political, and pro-Trump vs
pro-Hillary vs neither. The random forest algorithm performs best, with
greater than 95% average accuracy and f-measure scores for each category. The
most valuable features for classification are identified as user-based
features, with media use and marking tweets as favorite being the most
dominant.
Comment: 51 pages, 5 figures
Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams
Online social media are complementing and in some cases replacing
person-to-person social interaction and redefining the diffusion of
information. In particular, microblogs have become crucial grounds on which
public relations, marketing, and political battles are fought. We introduce an
extensible framework that will enable the real-time analysis of meme diffusion
in social media by mining, visualizing, mapping, classifying, and modeling
massive streams of public microblogging events. We describe a Web service that
leverages this framework to track political memes in Twitter and help detect
astroturfing, smear campaigns, and other misinformation in the context of U.S.
political elections. We present some cases of abusive behaviors uncovered by
our service. Finally, we discuss promising preliminary results on the detection
of suspicious memes via supervised learning based on features extracted from
the topology of the diffusion networks, sentiment analysis, and crowdsourced
annotations.
Characterizing and modeling the dynamics of online popularity
Online popularity has enormous impact on opinions, culture, policy, and
profits. We provide a quantitative, large scale, temporal analysis of the
dynamics of online content popularity in two massive model systems, the
Wikipedia and an entire country's Web space. We find that the dynamics of
popularity are characterized by bursts, displaying characteristic features of
critical systems such as fat-tailed distributions of magnitude and inter-event
time. We propose a minimal model combining the classic preferential popularity
increase mechanism with the occurrence of random popularity shifts due to
exogenous factors. The model recovers the critical features observed in the
empirical analysis of the systems analyzed here, highlighting the key factors
needed in the description of popularity dynamics.
Comment: 5 pages, 4 figures. Modeling part detailed. Final version published
in Physical Review Letters
Twitter as health information source : exploring the parameters affecting dementia-related tweets
Compared with other media, research on the credibility of information present on social media is limited. This limitation is even more pronounced in the case of healthcare, including dementia-related information. The purpose of this study was to identify user groups that show high bot-like behavior and profile features that deviate from typical human behavior. We collected 16,691 tweets about dementia posted over the course of a month by 8,400 users. We applied inductive coding to categorize users. The BotOrNot? API was used to compute a bot score. This work provides insight into relations between user features and a bot score. We performed analysis techniques such as Kruskal-Wallis, stepwise multiple variable regression, user tweet frequency analysis and content analysis on the data. These were further evaluated for the most frequently referenced URLs in the tweets and most active users in terms of tweet frequency. Initial results indicated that the majority of users are regular users and not bots. Regression analysis revealed a clear relationship between different features. Independent variables in the user profiles, such as geo_data and favourites_count, correlated with the final bot score. Similarly, content analysis of the tweets showed that the word features of bot profiles have an overall smaller percentage of words compared to regular profiles. Although this analysis is promising, it needs further enhancements.
Designing Ethical Phishing Experiments: A study of (ROT13) rOnl query features
We study how to design experiments to measure the success rates of phishing attacks that are both ethical and accurate, two conflicting requirements. Namely, an ethical experiment must not expose the participants to any risk; it should be possible for the participants, or representatives thereof, to verify locally that this was the case. At the same time, an experiment is accurate if it is possible to argue why its success rate is not an upper or lower bound of that of a real attack; this may be difficult if the ethics considerations make the user perception of the experiment different from the user perception of the attack. We introduce several experimental techniques allowing us to achieve a balance between these two requirements, and demonstrate how to apply these, using a context-aware phishing experiment on a popular online auction site which we call "rOnl". Our experiments exhibit a measured average yield of 11% per collection of unique users. This study was authorized by the Human Subjects Committee at Indiana University (Study #05-10306).
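The title marks the site name "rOnl" as ROT13-encoded. As a minimal sketch of what that encoding means (the helper name `rot13` is an illustrative assumption, not something taken from the paper), ROT13 rotates each letter 13 places through the alphabet, so applying it twice returns the original text:

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters are unchanged."""
    # "rot_13" is a text-to-text transform in Python's standard codecs module.
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    # Decoding the placeholder name used in the title and abstract.
    print(rot13("rOnl"))
    # ROT13 is its own inverse, so a second application restores the input.
    print(rot13(rot13("rOnl")))
```

Because the same function both encodes and decodes, ROT13 hides a name from casual reading without offering any real secrecy.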
Evolutionary Sentence Combination for Chatterbots
conversation. They make use of various techniques such as pattern matching, indexing, sentence reconstruction, and even natural language processing. In this paper we present an approach to chatterbots that mixes pattern matching with indexing and query matching methods inspired by information retrieval. We propose a model in which new sentences can be produced from existing ones using an evolutionary algorithm adapted to the structure of the natural language.
Detecting and Tracking Political Abuse in Social Media
We study astroturf political campaigns on microblogging platforms: politically motivated individuals and organizations that use multiple centrally controlled accounts to create the appearance of widespread support for a candidate or opinion. We describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of viral spreading of political misinformation. We present promising preliminary results with better than 96% accuracy in the detection of astroturf content in the run-up to the 2010 U.S. midterm elections.