DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first ones to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social media, our proposed methodology is platform and technology
agnostic, hence paving the way for diverse behavioral characterization tasks.
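As a rough illustration of the idea described above, the sketch below encodes a timeline of user actions as a string over a small alphabet and compares accounts by their longest common substring, a standard sequence-similarity measure. The three-letter alphabet, the action names, and the example timelines are assumptions made for illustration; the abstract does not specify the encoding or the exact analysis technique used.

```python
# Hypothetical alphabet: one letter per action type (an assumption, not from the abstract).
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def to_digital_dna(actions):
    """Encode a chronological list of action names as a string of 'bases'."""
    return "".join(ACTION_TO_BASE[a] for a in actions)


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Made-up timelines: two accounts with near-identical action patterns, one without.
bot_1 = to_digital_dna(["tweet", "retweet", "retweet", "tweet", "retweet", "retweet"])
bot_2 = to_digital_dna(["retweet", "retweet", "tweet", "retweet", "retweet", "tweet"])
human = to_digital_dna(["reply", "tweet", "reply", "reply", "retweet", "tweet"])

print(len(longest_common_substring(bot_1, bot_2)))  # long shared run
print(len(longest_common_substring(bot_1, human)))  # short shared run
```

In this toy setting, accounts that share unusually long substrings would be candidates for coordinated or automated behavior; any decision threshold would have to be chosen on real data.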
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
up nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.
Comment: to appear in SocInfo 201
Quantum inspired approach for early classification of time series
Is it possible to apply some fundamental principles of quantum computing to time series classification algorithms? This is the initial spark that became the research question I decided to pursue at the very beginning of my PhD studies. The idea came accidentally after reading a note on the ability of entanglement to express the correlation between two particles, even far away from each other. The test problem was also at hand, because I was investigating possible algorithms for real-time bot detection, a challenging problem at present, by means of statistical approaches for sequential classification. The quantum-inspired algorithm presented in this thesis emerged as an evolution of the statistical method mentioned above: it is a novel approach to binary and multinomial classification of an incoming data stream, inspired by the principles of quantum computing, that aims to ensure the shortest decision time with high accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each item in a data stream on the preceding ones. Starting from the a posteriori probability that each item belongs to a particular class, we can assign a qubit state representing a combination of these probabilities for all available observations of the time series. By leveraging superposition and entanglement on subsequences of growing length, it is possible to devise a measure of membership in each class, thus enabling the system to take a reliable decision when a sufficient level of confidence is reached. In order to provide an extensive and thorough analysis of the problem, a well-fitting approach for bot detection was replicated on our dataset and later compared with the statistical algorithm to determine the best option.
The winner was subsequently compared against the new quantum-inspired proposal, which showed superior capability in both binary and multinomial classification of data streams. The validation of the quantum-inspired approach in a synthetically generated use case completes the research framework and opens new perspectives in on-the-fly time series classification that we have just started to explore. To name a few, the algorithm is currently being tested with encouraging results in predictive maintenance and prognostics for automotive applications, in collaboration with the University of Bradford (UK), and in action recognition from video streams.
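The early-decision setting described above can be sketched with a plain sequential baseline. The code below is not the quantum-inspired measure from the thesis, whose details the abstract does not give. It accumulates per-item class posteriors in log space, treating items as independent (a simplifying assumption), and stops as soon as one class reaches a confidence threshold. The function name `early_classify`, the threshold values, and the example stream are hypothetical.

```python
import math


def early_classify(posteriors, threshold=0.95):
    """Consume per-item class-probability lists until one class is confident enough.

    Returns (predicted_class, items_consumed), or (None, items_consumed) if the
    threshold is never reached.
    """
    log_score = None
    n = 0
    for n, probs in enumerate(posteriors, start=1):
        if log_score is None:
            log_score = [0.0] * len(probs)
        # Accumulate evidence; items are treated as independent (an assumption).
        log_score = [s + math.log(max(p, 1e-12)) for s, p in zip(log_score, probs)]
        # Normalise the accumulated scores into a belief over classes.
        top = max(log_score)
        weights = [math.exp(s - top) for s in log_score]
        total = sum(weights)
        belief = [w / total for w in weights]
        best = max(range(len(belief)), key=belief.__getitem__)
        if belief[best] >= threshold:
            return best, n  # decide early, without reading the rest of the stream
    return None, n


# Made-up stream of posteriors for a two-class problem (class 0 vs class 1).
stream = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
print(early_classify(stream))                 # stricter threshold, later decision
print(early_classify(stream, threshold=0.9))  # looser threshold, earlier decision
```

Lowering the threshold trades accuracy for a shorter decision time, which is the tension that early classification methods try to balance.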
Deep Neural Networks for Bot Detection
The problem of detecting bots, automated social media accounts governed by
software but disguised as human users, has strong implications. For example,
bots have been used to sway political elections by distorting online discourse,
to manipulate the stock market, or to push anti-vaccine conspiracy theories
that caused health epidemics. Most techniques proposed to date detect bots at
the account level, by processing large amounts of social media posts, and
leveraging information from network structure, temporal dynamics, sentiment
analysis, etc.
In this paper, we propose a deep neural network based on contextual long
short-term memory (LSTM) architecture that exploits both content and metadata
to detect bots at the tweet level: contextual features are extracted from user
metadata and fed as auxiliary input to LSTM deep nets processing the tweet
text.
Another contribution that we make is proposing a technique based on synthetic
minority oversampling to generate a large labeled dataset, suitable for deep
nets training, from a minimal amount of labeled data (roughly 3,000 examples of
sophisticated Twitter bots). We demonstrate that, from just one single tweet,
our architecture can achieve high classification accuracy (AUC > 96%) in
separating bots from humans.
We apply the same architecture to account-level bot detection, achieving
nearly perfect classification accuracy (AUC > 99%). Our system outperforms
the previous state of the art while leveraging a small and interpretable set of
features and requiring minimal training data.
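The oversampling step mentioned above can be illustrated with the classic SMOTE interpolation rule: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point somewhere on the segment between them. This is a minimal sketch of that general rule, not necessarily the variant used in the paper. The function name `smote`, its parameters, and the toy feature vectors are assumptions for illustration.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating minority samples with neighbours.

    minority: list of equal-length numeric feature vectors (at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        # (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others, key=lambda p: sum((x - y) ** 2 for x, y in zip(p, base))
        )[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy minority class: four made-up two-dimensional feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already covered by the minority class.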