965 research outputs found
Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter
Social spam produces a great amount of noise on social media services such as
Twitter, which reduces the signal-to-noise ratio that both end users and data
mining applications observe. Existing techniques on social spam detection have
focused primarily on the identification of spam accounts by using extensive
historical and network-based data. In this paper we focus on the detection of
spam tweets, which optimises the amount of data that needs to be gathered by
relying only on tweet-inherent features. This enables the application of the
spam detection system to a large set of tweets in a timely fashion, potentially
applicable in a real-time or near real-time setting. Using two large
hand-labelled datasets of tweets containing spam, we study the suitability of
five classification algorithms and four different feature sets to the social
spam detection task. Our results show that, by using the limited set of
features readily available in a tweet, we can achieve encouraging results which
are competitive when compared against existing spammer detection systems that
make use of additional, costly user features. Our study is the first that
attempts at generalising conclusions on the optimal classifiers and sets of
features for social spam detection over different datasets
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.Comment: to appear in SocInfo 201
Enabling Semantics-Aware Collaborative Tagging and Social Search in an Open Interoperable Tagosphere
To make the most of a global network effect and to search and filter the Long Tail, a collaborative tagging approach to social search should be based on the global activity of tagging, rating and filtering. We take a further step towards this objective by proposing a shared conceptualization of both the activity of tagging and the organization of the tagosphere in which tagging takes place. We also put forward the necessary data standards to interoperate at both data format and semantic levels. We highlight how this conceptualization makes provision for attaching identity and meaning to tags and tag categorization through a Wikipedia-based collaborative framework. Used together, these concepts are a useful and agile means of unambiguously defining terms used during tagging, and of clarifying any vague search terms. This improves search results in terms of recall and precision, and represents an innovative means of semantics-aware collaborative filtering and content ranking
Spammer Detection on Online Social Networks
Twitter with its rising popularity as a micro-blogging website has inevitably attracted attention of spammers. Spammers use myriad of techniques to lure victims into clicking malicious URLs. In this thesis, we present several novel features capable of distinguishing spam accounts from legitimate accounts in real-time. The features exploit the behavioral and content entropy, bait-techniques, community-orientation, and profile characteristics of spammers. We then use supervised learning algorithms to generate models using the proposed features and show that our tool, spAmbush, can detect spammers in real-time. Our analysis reveals detection of more than 90% of spammers with less than five tweets and more than half with only a single tweet. Our feature computation has low latency and resource requirement. Our results show a 96% detection rate with only 0.01% false positive rate. We further cluster the unknown spammers to identify and understand the prevalent spam campaigns on Twitter
- …