2,231 research outputs found
Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter
Social spam produces a great amount of noise on social media services such as
Twitter, which reduces the signal-to-noise ratio that both end users and data
mining applications observe. Existing techniques for social spam detection have
focused primarily on the identification of spam accounts by using extensive
historical and network-based data. In this paper we focus on the detection of
spam tweets, which optimises the amount of data that needs to be gathered by
relying only on tweet-inherent features. This enables the application of the
spam detection system to a large set of tweets in a timely fashion, potentially
applicable in a real-time or near real-time setting. Using two large
hand-labelled datasets of tweets containing spam, we study the suitability of
five classification algorithms and four different feature sets to the social
spam detection task. Our results show that, by using the limited set of
features readily available in a tweet, we can achieve encouraging results which
are competitive when compared against existing spammer detection systems that
make use of additional, costly user features. Our study is the first that
attempts to generalise conclusions on the optimal classifiers and sets of
features for social spam detection across different datasets.
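As a rough illustration of classifying tweets from tweet-inherent features alone, the sketch below combines text n-grams with a few simple per-tweet statistics and trains an off-the-shelf classifier; the specific features, library, and classifier choice are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: spam classification from tweet-inherent features only.
# Feature choices (TF-IDF n-grams plus simple tweet statistics) and the
# Random Forest classifier are illustrative assumptions, not the paper's setup.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

def tweet_stats(texts):
    """Per-tweet statistics: length, hashtag/mention/URL counts."""
    rows = [[len(t), t.count("#"), t.count("@"), t.count("http")] for t in texts]
    return csr_matrix(np.array(rows, dtype=float))

# Hypothetical labelled data: 1 = spam, 0 = non-spam.
texts = ["WIN a FREE iPhone now http://spam.example",
         "Great talk at the conference today"]
labels = [1, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = hstack([vectorizer.fit_transform(texts), tweet_stats(texts)])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
```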
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring together people who have personal connections or
share common interests, forming communities. In this paper, we first show that
users within a networked community share some topics of interest. Moreover,
content shared on these social networks tends to propagate according to the
interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks.
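A hedged sketch of the propagation-based intuition (not POISED itself): detect communities in a follower graph, summarise each community's dominant topics, and score a message by how consistently the users spreading it belong to communities whose interests match it; the graph, topic assignments, and scoring rule below are illustrative assumptions.

```python
# Hedged sketch of propagation-consistency scoring (illustrative, not POISED):
# a message spread mostly by users whose communities share its topics looks
# benign; spread that cuts across unrelated communities looks spam-like.
from collections import Counter
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_of(user, communities):
    """Index of the community containing `user`, or None."""
    for i, members in enumerate(communities):
        if user in members:
            return i
    return None

def propagation_score(posting_users, user_topics, communities):
    """Fraction of posting users whose community's dominant topics overlap
    with the message's topics; low values suggest spam-like propagation."""
    dominant = {
        i: {t for t, _ in Counter(t for u in members
                                  for t in user_topics.get(u, ())).most_common(3)}
        for i, members in enumerate(communities)
    }
    message_topics = set().union(*(user_topics.get(u, set()) for u in posting_users))
    consistent = sum(
        1 for u in posting_users
        if dominant.get(community_of(u, communities), set()) & message_topics
    )
    return consistent / max(len(posting_users), 1)

# Hypothetical follower graph and per-user interest topics.
G = nx.karate_club_graph()
communities = list(greedy_modularity_communities(G))
user_topics = {u: {"sports"} if u < 17 else {"music"} for u in G.nodes}
print(propagation_score([0, 1, 33], user_topics, communities))
```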
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors, exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to efficiently rely on a limited number of
lightweight account characteristics.
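A minimal sketch of the digital DNA idea: encode each account's timeline as a character sequence and compare accounts via the length of their longest common substring, so that groups of accounts sharing unusually long behavioural substrings stand out as candidate spambots. The three-letter alphabet and the normalisation used here are illustrative assumptions.

```python
# Hedged sketch of digital-DNA behavioural modeling; encoding and similarity
# normalisation are illustrative assumptions, not the paper's exact definitions.
def encode_timeline(actions):
    """Map a list of actions to a DNA-like string: A = tweet, C = reply, T = retweet."""
    alphabet = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(alphabet.get(a, "A") for a in actions)

def longest_common_substring(s1, s2):
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0] * (len(s2) + 1)
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def dna_similarity(s1, s2):
    """Shared behavioural substring length normalised by the shorter sequence."""
    if not s1 or not s2:
        return 0.0
    return longest_common_substring(s1, s2) / min(len(s1), len(s2))

# Hypothetical timelines: two bot-like accounts repeat a pattern, one human varies.
bot1 = encode_timeline(["tweet", "retweet", "retweet"] * 10)
bot2 = encode_timeline(["tweet", "retweet", "retweet"] * 9 + ["reply"])
human = encode_timeline(["tweet", "reply", "retweet", "tweet", "reply", "reply", "tweet"])
print(dna_similarity(bot1, bot2), dna_similarity(bot1, human))
```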
Measuring, Characterizing, and Detecting Facebook Like Farms
Social networks offer convenient ways to seamlessly reach out to large
audiences. In particular, Facebook pages are increasingly used by businesses,
brands, and organizations to connect with multitudes of users worldwide. As the
number of likes of a page has become a de-facto measure of its popularity and
profitability, an underground market of services artificially inflating page
likes, aka like farms, has emerged alongside Facebook's official targeted
advertising platform. Nonetheless, there is little work that systematically
analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present
a honeypot-based comparative measurement study of page likes garnered via
Facebook advertising and from popular like farms. First, we analyze likes based
on demographic, temporal, and social characteristics, and find that some farms
seem to be operated by bots and do not really try to hide the nature of their
operations, while others follow a stealthier approach, mimicking regular users'
behavior. Next, we look at fraud detection algorithms currently deployed by
Facebook and show that they do not work well to detect stealthy farms which
spread likes over longer timespans and like popular pages to mimic regular
users. To overcome their limitations, we investigate the feasibility of
timeline-based detection of like farm accounts, focusing on characterizing
content generated by Facebook accounts on their timelines as an indicator of
genuine versus fake social activity. We analyze a range of features, grouped
into two main categories: lexical and non-lexical. We find that like farm
accounts tend to re-share content, use fewer words and poorer vocabulary, and
more often generate duplicate comments and likes compared to normal users.
Using relevant lexical and non-lexical features, we build a classifier to
detect like farm accounts that achieves over 99% precision and 93% recall.
Comment: To appear in ACM Transactions on Privacy and Security (TOPS)
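A small sketch in the spirit of the timeline-based features described above: compute a few lexical features (average words per post, vocabulary richness) and non-lexical features (re-share ratio, duplicate-post ratio) per account and train a simple classifier. The exact feature set and classifier here are assumptions, not the study's final model.

```python
# Hedged sketch of timeline feature extraction for like-farm detection;
# feature definitions and the classifier are illustrative assumptions.
from sklearn.linear_model import LogisticRegression

def timeline_features(posts):
    """posts: list of dicts like {"text": str, "is_reshare": bool} for one account."""
    texts = [p["text"] for p in posts]
    words = [w for t in texts for w in t.lower().split()]
    n_posts = max(len(posts), 1)
    return [
        len(words) / n_posts,                           # avg words per post (lexical)
        len(set(words)) / max(len(words), 1),           # vocabulary richness (lexical)
        sum(p["is_reshare"] for p in posts) / n_posts,  # re-share ratio (non-lexical)
        1 - len(set(texts)) / n_posts,                  # duplicate-post ratio (non-lexical)
    ]

# Hypothetical training data: 1 = like-farm account, 0 = regular account.
farm = [{"text": "great page like it", "is_reshare": True}] * 5
user = [{"text": t, "is_reshare": False} for t in
        ["had a lovely hike today", "new recipe turned out well",
         "reading an interesting novel", "weekend plans with friends", "back at work"]]
X = [timeline_features(farm), timeline_features(user)]
y = [1, 0]
clf = LogisticRegression().fit(X, y)
```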
Comparative Studies of Detecting Abusive Language on Twitter
The context-dependent nature of online aggression makes annotating large
collections of data extremely difficult. Previously studied datasets in abusive
language detection have been insufficient in size to efficiently train deep
learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much
greater in size and reliability, has been released. However, this dataset has
not yet been studied to its full potential. In this paper, we conduct
the first comparative study of various learning models on Hate and Abusive
Speech on Twitter, and discuss the possibility of using additional features and
context data for improvements. Experimental results show that a bidirectional
GRU network trained on word-level features with Latent Topic Clustering modules
is the most accurate model, scoring 0.805 F1.
Comment: ALW2: 2nd Workshop on Abusive Language Online, to be held at EMNLP
2018 (Brussels, Belgium), October 31st, 2018
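For a concrete starting point, below is a minimal PyTorch sketch of a word-level bidirectional GRU classifier of the kind compared in the study; the Latent Topic Clustering module is omitted and replaced by simple mean pooling, and the vocabulary size and four-class output are assumptions.

```python
# Hedged sketch: word-level bidirectional GRU text classifier. The Latent Topic
# Clustering module is omitted (mean pooling instead); sizes are assumptions.
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices
        x = self.embed(token_ids)
        out, _ = self.gru(x)          # (batch, seq_len, 2 * hidden_dim)
        pooled = out.mean(dim=1)      # mean pooling over time steps
        return self.fc(pooled)        # class logits

# Hypothetical usage: batch of 2 tweets, 10k-word vocabulary, 4 output classes.
model = BiGRUClassifier(vocab_size=10000)
logits = model(torch.randint(1, 10000, (2, 30)))
print(logits.shape)  # torch.Size([2, 4])
```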