1,331 research outputs found
Online Human-Bot Interactions: Detection, Estimation, and Characterization
Increasing evidence suggests that a growing amount of social media content is
generated by autonomous entities known as social bots. In this work we present
a framework to detect such entities on Twitter. We leverage more than a
thousand features extracted from public data and meta-data about users:
friends, tweet content and sentiment, network patterns, and activity time
series. We benchmark the classification framework by using a publicly available
dataset of Twitter bots. This training data is enriched by a manually annotated
collection of active Twitter users that include both humans and bots of varying
sophistication. Our models yield high accuracy and agreement with each other
and can detect bots of different nature. Our estimates suggest that between 9%
and 15% of active Twitter accounts are bots. Characterizing ties among
accounts, we observe that simple bots tend to interact with bots that exhibit
more human-like behaviors. Analysis of content flows reveals retweet and
mention strategies adopted by bots to interact with different target groups.
Using clustering analysis, we characterize several subclasses of accounts,
including spammers, self promoters, and accounts that post content from
connected applications.Comment: Accepted paper for ICWSM'17, 10 pages, 8 figures, 1 tabl
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.Comment: to appear in SocInfo 201
Measuring, Characterizing, and Detecting Facebook Like Farms
Social networks offer convenient ways to seamlessly reach out to large
audiences. In particular, Facebook pages are increasingly used by businesses,
brands, and organizations to connect with multitudes of users worldwide. As the
number of likes of a page has become a de-facto measure of its popularity and
profitability, an underground market of services artificially inflating page
likes, aka like farms, has emerged alongside Facebook's official targeted
advertising platform. Nonetheless, there is little work that systematically
analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present
a honeypot-based comparative measurement study of page likes garnered via
Facebook advertising and from popular like farms. First, we analyze likes based
on demographic, temporal, and social characteristics, and find that some farms
seem to be operated by bots and do not really try to hide the nature of their
operations, while others follow a stealthier approach, mimicking regular users'
behavior. Next, we look at fraud detection algorithms currently deployed by
Facebook and show that they do not work well to detect stealthy farms which
spread likes over longer timespans and like popular pages to mimic regular
users. To overcome their limitations, we investigate the feasibility of
timeline-based detection of like farm accounts, focusing on characterizing
content generated by Facebook accounts on their timelines as an indicator of
genuine versus fake social activity. We analyze a range of features, grouped
into two main categories: lexical and non-lexical. We find that like farm
accounts tend to re-share content, use fewer words and poorer vocabulary, and
more often generate duplicate comments and likes compared to normal users.
Using relevant lexical and non-lexical features, we build a classifier to
detect like farms accounts that achieves precision higher than 99% and 93%
recall.Comment: To appear in ACM Transactions on Privacy and Security (TOPS
Bots increase exposure to negative and inflammatory content in online social systems
Societies are complex systems which tend to polarize into sub-groups of
individuals with dramatically opposite perspectives. This phenomenon is
reflected -- and often amplified -- in online social networks where, however,
humans are no more the only players, and co-exist alongside with social bots,
i.e., software-controlled accounts. Analyzing large-scale social data collected
during the Catalan referendum for independence on October 1, 2017, consisting
of nearly 4 millions Twitter posts generated by almost 1 million users, we
identify the two polarized groups of Independentists and Constitutionalists and
quantify the structural and emotional roles played by social bots. We show that
bots act from peripheral areas of the social system to target influential
humans of both groups, bombarding Independentists with violent contents,
increasing their exposure to negative and inflammatory narratives and
exacerbating social conflict online. Our findings stress the importance of
developing countermeasures to unmask these forms of automated social
manipulation.Comment: 8 pages, 5 figure
Fame for sale: efficient detection of fake Twitter followers
are those Twitter accounts specifically created to
inflate the number of followers of a target account. Fake followers are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere - hence impacting on
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter accounts
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. Such baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier, general enough to thwart
overfitting, lightweight thanks to the usage of the less costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation on the novel issue of fake Twitter followers
- …