11,359 research outputs found

    Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

    Get PDF
    Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques on social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which optimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, potentially applicable in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets to the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results which are competitive when compared against existing spammer detection systems that make use of additional, costly user features. Our study is the first that attempts at generalising conclusions on the optimal classifiers and sets of features for social spam detection over different datasets

    Early detection of spam-related activity

    Get PDF
    Spam, the distribution of unsolicited bulk email, is a big security threat on the Internet. Recent studies show approximately 70-90% of the worldwide email traffic—about 70 billion messages a day—is spam. Spam consumes resources on the network and at mail servers, and it is also used to launch other attacks on users, such as distributing malware or phishing. Spammers have increased their virulence and resilience by sending spam from large collections of compromised machines (“botnets”). Spammers also make heavy use of URLs and domains to direct victims to point-of-sale Web sites, and miscreants register large number of domains to evade blacklisting efforts. To mitigate the threat of spam, users and network administrators need proactive techniques to distinguish spammers from legitimate senders and to take down online spam-advertised sites. In this dissertation, we focus on characterizing spam-related activities and developing systems to detect them early. Our work builds on the observation that spammers need to acquire attack agility to be profitable, which presents differences in how spammers and legitimate users interact with Internet services and exposes detectable during early period of attack. We examine several important components across the spam life cycle, including spam dissemination that aims to reach users' inboxes, the hosting process during which spammers set DNS servers and Web servers, and the naming process to acquire domain names via registration services. We first develop a new spam-detection system based on network-level features of spamming bots. These lightweight features allow the system to scale better and to be more robust. Next, we analyze DNS resource records and lookups from top-level domain servers during the initial stage after domain registrations, which provides a global view across the Internet to characterize spam hosting infrastructure. We further examine the domain registration process and present the unique registration behavior of spammers. Finally, we build an early-warning system to identify spammer domains at time-of-registration rather than later at time-of-use. We have demonstrated that our detection systems are effective by using real-world datasets. Our work has also had practical impact. Some of the network-level features that we identified have since been incorporated into spam filtering products at Yahoo! and McAfee, and our work on detecting spammer domains at time-of-registration has directly influenced new projects at Verisign to investigate domain registrations.Ph.D
    • …
    corecore