Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection
The arms race between spambots and spambot-detectors is made of several cycles
(or generations): a new wave of spambots is created (and new spam is spread),
new spambot filters are derived and old spambots mutate (or evolve) to new
species. Recently, with the diffusion of the adversarial learning approach, a
new practice is emerging: deliberately manipulating target samples in order to
build stronger detection models. Here, we manipulate generations of Twitter
social bots, to obtain - and study - their possible future evolutions, with the
aim of eventually deriving more effective detection techniques. In detail, we
propose and experiment with a novel genetic algorithm for the synthesis of
online accounts. The algorithm allows the creation of synthetic evolved versions of
current state-of-the-art social bots. Results demonstrate that synthetic bots
really escape current detection techniques. However, they give all the needed
elements to improve such techniques, making possible a proactive approach for
the design of social bot detection systems.
Comment: This is the pre-final version of a paper accepted @ 11th ACM
Conference on Web Science, June 30-July 3, 2019, Boston, U
A Decade of Social Bot Detection
On the morning of November 9th 2016, the world woke up to the shocking
outcome of the US Presidential elections: Donald Trump was the 45th President
of the United States of America. An unexpected event that still has tremendous
consequences all over the world. Today, we know that a minority of social bots,
automated social media accounts mimicking humans, played a central role in
spreading divisive messages and disinformation, possibly contributing to
Trump's victory. In the aftermath of the 2016 US elections, the world started
to realize the gravity of widespread deception in social media. Following
Trump's exploit, we witnessed the emergence of a strident dissonance between
the multitude of efforts for detecting and removing bots, and the increasing
effects that these malicious actors seem to have on our societies. This paradox
opens a burning question: What strategies should we enforce in order to stop
this social bot pandemic? In these times, during the run-up to the 2020 US
elections, the question appears more crucial than ever. What struck social,
political and economic analysts after 2016, namely deception and automation,
has however been a matter of study for computer scientists since at least 2010. In this
work, we briefly survey the first decade of research in social bot detection.
Via a longitudinal analysis, we discuss the main trends of research in the
fight against bots, the major results that were achieved, and the factors that
make this never-ending battle so challenging. Capitalizing on lessons learned
from our extensive analysis, we suggest possible innovations that could give us
the upper hand against deception and manipulation. Studying a decade of
endeavours at social bot detection can also inform strategies for detecting and
mitigating the effects of other, more recent, forms of online deception, such
as strategic information operations and political trolls.
Comment: Forthcoming in Communications of the AC
Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter
Microblogs are increasingly exploited for predicting prices and traded
volumes of stocks in financial markets. However, it has been demonstrated that
much of the content shared in microblogging platforms is created and publicized
by bots and spammers. Yet, the presence (or lack thereof) and the impact of
fake stock microblogs have never been systematically investigated. Here,
we study 9M tweets related to stocks of the 5 main financial markets in the US.
By comparing tweets with financial data from Google Finance, we highlight
important characteristics of Twitter stock microblogs. More importantly, we
uncover a malicious practice - referred to as cashtag piggybacking -
perpetrated by coordinated groups of bots and likely aimed at promoting
low-value stocks by exploiting the popularity of high-value ones. Among the
findings of our study is that as much as 71% of the authors of suspicious
financial tweets are classified as bots by a state-of-the-art spambot detection
algorithm. Furthermore, 37% of them were suspended by Twitter a few months
after our investigation. Our results call for the adoption of spam and bot
detection techniques in all studies and applications that exploit
user-generated content for predicting the stock market.
The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race
Recent studies in social media spam and automation provide anecdotal
arguments for the rise of a new generation of spambots, so-called social
spambots. Here, for the first time, we extensively study this novel phenomenon
on Twitter and we provide quantitative evidence that a paradigm-shift exists in
spambot design. First, we measure Twitter's current capability to detect
the new social spambots. Next, we assess human performance in
discriminating between genuine accounts, social spambots, and traditional
spambots. Then, we benchmark several state-of-the-art techniques proposed by
the academic literature. Results show that neither Twitter, nor humans, nor
cutting-edge applications are currently capable of accurately detecting the new
social spambots. Our results call for new approaches capable of turning the
tide in the fight against this rising phenomenon. We conclude by reviewing the
latest literature on spambot detection and we highlight an emerging common
research trend based on the analysis of collective behaviors. Insights derived
from both our extensive experimental campaign and survey shed light on the most
promising directions of research and lay the foundations for the arms race
against the novel social spambots. Finally, to foster research on this novel
phenomenon, we make publicly available to the scientific community all the
datasets used in this study.
Comment: To appear in Proc. 26th WWW, 2017, Companion Volume (Web Science
Track, Perth, Australia, 3-7 April, 2017)
DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first ones to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social medium, our proposed methodology is platform and technology
agnostic, hence paving the way for diverse behavioral characterization tasks.
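As a rough illustration of the idea of a digital DNA sequence, the sketch below encodes a user's chronological actions as a string over a small alphabet. The action names and the letter assigned to each (`ACTION_TO_BASE`) are assumptions made for this example, not the encoding defined in the paper.

```python
# Illustrative sketch only: the alphabet and the action names are assumptions,
# not the encoding used by the authors.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_timeline(actions):
    """Encode a chronological list of action names as a DNA-like string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


# A made-up timeline of five actions, oldest first.
timeline = ["tweet", "tweet", "retweet", "reply", "tweet"]
print(encode_timeline(timeline))
```

Once behavior is expressed as a string, standard string and sequence analysis tools can be applied to it, which is the property the abstract relies on.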
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors, exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to efficiently rely on a limited number of
lightweight account characteristics.
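The abstract does not spell out the similarity measure, so the following is only a sketch under an assumption: similarity between two character-encoded behavioral sequences is taken here to be the length of their longest common substring, normalized by the shorter sequence. The function names and the normalization are hypothetical choices for illustration.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length ending at b[j-1]
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str) -> float:
    """Assumed normalization: shared run length over the shorter sequence."""
    if not a or not b:
        return 0.0
    return longest_common_substring_length(a, b) / min(len(a), len(b))


# Two made-up behavioral sequences over an assumed alphabet.
print(similarity("AATCAATC", "GATCAAG"))
```

Under this assumption, accounts driven by the same automation would tend to share long identical runs of actions and so score high, while independent human users would score lower.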
Design, development and evaluation of an architecture for Early Warning applications in a Social Sensing context
Fame for sale: efficient detection of fake Twitter followers
Fake followers are those Twitter accounts specifically created to
inflate the number of followers of a target account. Fake followers are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere - hence impacting the
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter account
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. This baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier, general enough to thwart
overfitting, lightweight thanks to the usage of the less costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation on the novel issue of fake Twitter followers.
Social Media for the Common Good: the case of EARS
Natural disasters such as earthquakes, tornadoes and floods are just some of the devastating events that may have catastrophic consequences on wide geographical areas. A quick and targeted response to emergencies contributes greatly to mitigating the losses. In recent years we have witnessed many situations in which crowds of volunteer citizens have helped emergency responders via the use of widespread social media. Here we argue that technology can help in supporting the population, as well as the decision makers, by introducing tools that enhance the collective awareness level, providing quick yet accurate insights into the unfolding emergency. In this short paper we introduce the EARS system, a social media-based system that supports decision makers during earthquake emergencies in Italy. We discuss the implications and the responsibility related to the usage of such systems by the decision makers. We also discuss how publicly opening systems like EARS to the population might change the approach to the problem, and we introduce relevant opportunities and issues that this solution would imply.
A Linguistically-driven Approach to Cross-Event Damage Assessment of Natural Disasters from Social Media Messages
This work focuses on the analysis of Italian social media messages for disaster management and aims at the detection of messages carrying critical information for the damage assessment task. A main novelty of this study is its focus on out-domain and cross-event damage detection, and on the investigation of the most relevant tweet-derived features for these tasks. We devised different experiments using a wide set of linguistic features that qualify the lexical and grammatical structure of a text, as well as ad-hoc features specifically implemented for this task. We investigated which features are most effective in achieving the best results. A further result of this study is the construction of the first manually annotated Italian corpus of social media messages for damage assessment.