
    Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

    Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which optimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, potentially in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets for the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results which are competitive with existing spammer detection systems that make use of additional, costly user features. Our study is the first to attempt to generalise conclusions on the optimal classifiers and sets of features for social spam detection over different datasets.

    Seminar Users in the Arabic Twitter Sphere

    We introduce the notion of "seminar users", who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and that their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns. Comment: to appear in SocInfo 201
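
    The abstract reports precision and recall separately. As a minimal sketch, the two figures can be combined into a single F1 score (the harmonic mean of precision and recall). The F1 value is not stated in the abstract; it is derived here only from the two reported numbers, and the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_score(0.844, 0.761)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.800"
```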

    Enabling Semantics-Aware Collaborative Tagging and Social Search in an Open Interoperable Tagosphere

    To make the most of a global network effect and to search and filter the Long Tail, a collaborative tagging approach to social search should be based on the global activity of tagging, rating and filtering. We take a further step towards this objective by proposing a shared conceptualization of both the activity of tagging and the organization of the tagosphere in which tagging takes place. We also put forward the necessary data standards to interoperate at both data format and semantic levels. We highlight how this conceptualization makes provision for attaching identity and meaning to tags and tag categorization through a Wikipedia-based collaborative framework. Used together, these concepts are a useful and agile means of unambiguously defining terms used during tagging, and of clarifying any vague search terms. This improves search results in terms of recall and precision, and represents an innovative means of semantics-aware collaborative filtering and content ranking.

    Spammer Detection on Online Social Networks

    Twitter, with its rising popularity as a micro-blogging website, has inevitably attracted the attention of spammers. Spammers use a myriad of techniques to lure victims into clicking malicious URLs. In this thesis, we present several novel features capable of distinguishing spam accounts from legitimate accounts in real-time. The features exploit the behavioral and content entropy, bait-techniques, community-orientation, and profile characteristics of spammers. We then use supervised learning algorithms to generate models using the proposed features and show that our tool, spAmbush, can detect spammers in real-time. Our analysis reveals detection of more than 90% of spammers with fewer than five tweets and more than half with only a single tweet. Our feature computation has low latency and low resource requirements. Our results show a 96% detection rate with only a 0.01% false positive rate. We further cluster the unknown spammers to identify and understand the prevalent spam campaigns on Twitter.
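
    The abstract mentions content entropy as a feature but does not define how it is computed. One common reading, assumed here purely for illustration, is the Shannon entropy of the word distribution across an account's tweets: repetitive content yields low entropy, varied content yields higher entropy. The function name and the whitespace tokenisation below are hypothetical and are not taken from spAmbush.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(tokens: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical example texts; whitespace tokenisation is an assumption.
repetitive = "win win win win".split()
varied = "meeting moved to friday afternoon".split()

print(shannon_entropy(repetitive))  # a single repeated token carries no information
print(shannon_entropy(varied))      # five distinct tokens, so entropy is log2(5)
```

    A feature of this kind would be only one input among several to a supervised classifier; the thresholds and the combination with other features are left open here.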