Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection
The arms race between spambots and spambot detectors proceeds in cycles
(or generations): a new wave of spambots is created (and new spam is spread),
new spambot filters are derived, and old spambots mutate (or evolve) into new
species. Recently, with the diffusion of the adversarial learning approach, a
new practice is emerging: purposely manipulating target samples in order to
build stronger detection models. Here, we manipulate generations of Twitter
social bots to obtain - and study - their possible future evolutions, with the
aim of eventually deriving more effective detection techniques. In detail, we
propose and experiment with a novel genetic algorithm for the synthesis of
online accounts. The algorithm makes it possible to create synthetic, evolved
versions of current state-of-the-art social bots. Results demonstrate that the
synthetic bots do evade current detection techniques. However, they provide all
the elements needed to improve such techniques, making a proactive approach to
the design of social bot detection systems possible.
Comment: This is the pre-final version of a paper accepted at the 11th ACM
Conference on Web Science, June 30-July 3, 2019, Boston, U
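The abstract does not detail the algorithm itself; as a rough illustration only, the evolve-and-test loop it describes can be sketched as a minimal genetic algorithm. Every name below, the feature-vector "genome", the operators, and the stand-in fitness function are assumptions for illustration, not the paper's actual design:

```python
import random

# Minimal genetic-algorithm sketch of the evolve-and-test loop: account
# "genomes" are feature vectors in [0, 1]; fitness rewards evading a
# stand-in detector. All details are illustrative assumptions.

def detector_score(genome):
    # Stand-in for a trained bot detector: lower means "looks more human".
    return sum(abs(g - 0.5) for g in genome) / len(genome)

def mutate(genome, rate=0.1):
    # Perturb each gene with probability `rate`, clamping to [0, 1].
    return [min(1.0, max(0.0, g + random.uniform(-0.1, 0.1)))
            if random.random() < rate else g for g in genome]

def crossover(a, b):
    # Single-point crossover between two parent genomes.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=20, genome_len=8, generations=50):
    pop = [[random.random() for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=detector_score)           # most evasive first
        survivors = pop[: pop_size // 2]
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=detector_score)

best = evolve()
```

In the paper's proactive setting, the surviving synthetic accounts would then be fed back as adversarial training data to harden the detector.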
A Decade of Social Bot Detection
On the morning of November 9th 2016, the world woke up to the shocking
outcome of the US Presidential elections: Donald Trump was the 45th President
of the United States of America. An unexpected event that still has tremendous
consequences all over the world. Today, we know that a minority of social bots,
automated social media accounts mimicking humans, played a central role in
spreading divisive messages and disinformation, possibly contributing to
Trump's victory. In the aftermath of the 2016 US elections, the world started
to realize the gravity of widespread deception in social media. Following
Trump's success, we witnessed the emergence of a strident dissonance between
the multitude of efforts for detecting and removing bots, and the increasing
effects that these malicious actors seem to have on our societies. This paradox
opens a burning question: What strategies should we enforce in order to stop
this social bot pandemic? In these times, during the run-up to the 2020 US
elections, the question appears more crucial than ever. What struck social,
political, and economic analysts after 2016 (deception and automation) has,
however, been a matter of study for computer scientists since at least 2010. In this
work, we briefly survey the first decade of research in social bot detection.
Via a longitudinal analysis, we discuss the main trends of research in the
fight against bots, the major results that were achieved, and the factors that
make this never-ending battle so challenging. Capitalizing on lessons learned
from our extensive analysis, we suggest possible innovations that could give us
the upper hand against deception and manipulation. Studying a decade of
endeavours at social bot detection can also inform strategies for detecting and
mitigating the effects of other, more recent, forms of online deception, such
as strategic information operations and political trolls.
Comment: Forthcoming in Communications of the AC
Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter
Microblogs are increasingly exploited for predicting prices and traded
volumes of stocks in financial markets. However, it has been demonstrated that
much of the content shared in microblogging platforms is created and publicized
by bots and spammers. Yet, the presence (or lack thereof) and the impact of
fake stock microblogs have never been systematically investigated before. Here,
we study 9M tweets related to stocks of the 5 main financial markets in the US.
By comparing tweets with financial data from Google Finance, we highlight
important characteristics of Twitter stock microblogs. More importantly, we
uncover a malicious practice - referred to as cashtag piggybacking -
perpetrated by coordinated groups of bots and likely aimed at promoting
low-value stocks by exploiting the popularity of high-value ones. Among the
findings of our study is that as much as 71% of the authors of suspicious
financial tweets are classified as bots by a state-of-the-art spambot detection
algorithm. Furthermore, 37% of them were suspended by Twitter a few months
after our investigation. Our results call for the adoption of spam and bot
detection techniques in all studies and applications that exploit
user-generated content for predicting the stock market.
DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social network, our proposed methodology is platform- and
technology-agnostic, hence paving the way for diverse behavioral characterization tasks.
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
are the possibility of applying off-the-shelf DNA analysis techniques to study
online users' behaviors and the ability to rely efficiently on a limited number
of lightweight account characteristics.
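The similarity measure itself is not spelled out in the abstract; below is a plausible sketch assuming a longest-common-substring comparison between pairs of digital-DNA sequences, with a normalization choice that is ours, not necessarily the paper's:

```python
# Sketch of a similarity measure for digital-DNA sequences, using the
# longest common substring (LCS) between two behavioral strings. The
# pairwise form and the normalization below are simplifying assumptions.

def longest_common_substring(s, t):
    """Length of the longest contiguous substring shared by s and t
    (dynamic programming with a rolling row)."""
    best = 0
    dp = [0] * (len(t) + 1)
    for ch_s in s:
        prev = 0
        for j, ch_t in enumerate(t, start=1):
            cur = dp[j]
            dp[j] = prev + 1 if ch_s == ch_t else 0
            best = max(best, dp[j])
            prev = cur
    return best

def similarity(s, t):
    """Normalize LCS length by the shorter sequence's length."""
    return longest_common_substring(s, t) / min(len(s), len(t))
```

The intuition behind Social Fingerprinting is that coordinated spambots share unusually long behavioral substrings, while genuine users, acting independently, do not; thresholding such group-level similarity separates the two classes.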
A Linguistically-driven Approach to Cross-Event Damage Assessment of Natural Disasters from Social Media Messages
This work focuses on the analysis of Italian social media messages for disaster management and aims at the detection of messages carrying critical information for the damage assessment task. A main novelty of this study is its focus on out-of-domain and cross-event damage detection, and on the investigation of the most relevant tweet-derived features for these tasks. We devised different experiments by resorting to a wide set of linguistic features qualifying the lexical and grammatical structure of a text, as well as ad hoc features specifically implemented for this task. We investigated which features are most effective, i.e., those that allow us to achieve the best results. A further result of this study is the construction of the first manually annotated Italian corpus of social media messages for damage assessment.
Crisis Mapping during Natural Disasters via Text Analysis of Social Media Messages
Recent disasters demonstrated the central role of social media during emergencies, thus motivating the exploitation of such data for crisis mapping. We propose a crisis mapping system that addresses limitations of current state-of-the-art approaches by analyzing the textual content of disaster reports from a twofold perspective. A damage detection component employs an SVM classifier to detect mentions of damage among emergency reports. A novel geoparsing technique is proposed and used to perform message geolocation. We report on a case study to show how the information extracted through damage detection and message geolocation can be combined to produce accurate crisis maps. Our crisis maps clearly detect both highly and lightly damaged areas, thus opening up the possibility to prioritize rescue efforts where they are most needed.
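As a hedged sketch of the damage-detection step only, the pipeline below trains a linear SVM over TF-IDF features with scikit-learn. The tiny English corpus and its labels are illustrative assumptions; the actual system works on Italian tweets with a much richer set of linguistic features:

```python
# Sketch of SVM-based damage detection: linear SVM over TF-IDF bag-of-words
# features. The toy corpus below is an illustrative assumption, not the
# paper's (Italian) data or feature set.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

messages = [
    "bridge collapsed near the station",        # damage
    "houses destroyed by the earthquake",       # damage
    "stay safe everyone, thoughts with you",    # no damage
    "praying for the affected families",        # no damage
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(messages, labels)

prediction = model.predict(["the earthquake destroyed the old bridge"])
```

In the full system, messages flagged as damage-bearing would then pass to the geoparsing component so that damage mentions can be placed on the crisis map.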