141 research outputs found

    Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

    Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on identifying spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which reduces the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the spam detection system to be applied to a large set of tweets in a timely fashion, potentially in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets for the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results which are competitive with existing spammer detection systems that make use of additional, costly user features. Our study is the first that attempts to generalise conclusions on the optimal classifiers and sets of features for social spam detection over different datasets.

    Support Efficient, Scalable, and Online Social Spam Detection in System

    The broad success of online social networks (OSNs) has created fertile soil for the emergence and fast spread of social spam. Fake news, malicious URL links, fraudulent advertisements, fake reviews, and biased propaganda have serious consequences for both virtual social networks and human life in the real world. Effectively detecting social spam is a hot topic in both academia and industry. However, traditional social spam detection techniques are limited to centralized processing on top of one specific data source and ignore the social spam correlations across distributed data sources. Moreover, a few research efforts integrate stream systems (e.g., Storm, Spark) with large-scale social spam detection, but they typically ignore the specific details of managing and recovering interim states during social stream data processing. We observed that social spammers who aim to advertise their products or post victim links tend to spread malicious posts within a very short period of time. They are smart enough to adapt to old models that were trained on historical records. This raises a question: how can we uncover and defend against these spam activities in an online and scalable manner? In this dissertation, we present three systems that support scalable and online social spam detection from streaming social data: (1) the first part introduces Oases, a scalable system that can support large-scale online social spam detection; (2) the second part introduces SpamHunter, a novel system that supports efficient, scalable online spam detection in social networks and gives novel insights into guaranteeing the efficiency of modern stream applications by leveraging spam correlations at scale; and (3) the third part addresses state recovery during social spam detection, introducing a customizable state recovery framework that provides fast and scalable state recovery mechanisms for protecting large distributed states in social spam detection applications.

    Virtual Celebrator Machine

    There has been huge growth in social networks in recent years. This trend not only allows us to get connected and share information efficiently, but also reveals potential benefits in dealing with several social issues, such as earthquake detection, social spam detection, flu pandemic tracking, media monitoring, etc. In this paper, we propose a new way of utilizing social networks. By implementing what is called a Virtual Celebrator Machine (VCM), we let everyone who is connected to this machine through social networking share their cultural experiences and points of view about certain social events, locally or globally. In that way, we provide a means to reinforce relationships and connections between people virtually, which, we believe, would help cultural heritage preservation to flourish.

    Sentiment Analysis of Long-term Social Data during the COVID-19 Pandemic

    The COVID-19 pandemic has brought an “infodemic” to the social media world. Various social platforms play a significant role in instantly delivering the latest updates on the pandemic. Social media such as Twitter and Facebook produce vast amounts of posts related to the virus, vaccines, economics, and politics. In order to figure out how public opinion and sentiments are expressed during the pandemic, this work analyzes long-term social posts from social media and conducts sentiment analysis on tweets spanning 12 months. Our findings show the trending topics of long-term social communities during the pandemic and reflect people’s attitudes towards the progress of major actions during the pandemic. We explore the main topics during the prolonged pandemic, including information surrounding economics, vaccines, and politics. In addition, we show the differences in gender-based attitudes and propose future research questions related to the “infodemic”. We believe that our work contributes to attracting public attention to the “infodemic” of the social crisis.

    A Critical Analysis Of The State-Of-The-Art On Automated Detection Of Deceptive Behavior In Social Media

    Recently, a large body of research has been devoted to examining user behavioral patterns and the business implications of social media. However, relatively little research has been conducted regarding users’ deceptive activities in social media; these deceptive activities may hinder the effective application of the data collected from social media to perform e-marketing and initiate business transformation in general. One of the main contributions of this paper is the critical analysis of the possible forms of deceptive behavior in social media and the state-of-the-art technologies for automated deception detection in social media. Based on the proposed taxonomy of major deception types, the assumptions, advantages, and disadvantages of the popular deception detection methods are analyzed. Our critical analysis shows that deceptive behavior may evolve over time, making it difficult for the existing methods to effectively detect social media spam. Accordingly, another main contribution of this paper is the design and development of a generic framework to combat dynamic deceptive activities in social media. The managerial implication of our research is that business managers or marketers will develop better insights about the possible deceptive behavior in social media before they tap into social media to collect and generate market intelligence. Moreover, they can apply the proposed adaptive deception detection framework to more effectively combat the ever increasing and evolving deceptive activities in social media.

    Incremental Information Gain Analysis of Input Attribute Impact on RBF-Kernel SVM Spam Detection

    The massive increase in spam poses a very serious threat to email and SMS, which have become important means of communication. Not only does spam annoy users, but it also poses a security threat. Machine learning techniques have been widely used for spam detection. Email spam can be detected through the sender’s behaviour, the contents of an email, the subject and source address, etc., while SMS spam detection is usually based on the tokens or features of messages because of their short content. However, a comprehensive analysis of email/SMS content may provide cues that help users become aware of email/SMS spam. We cannot completely depend on automatic tools to identify all spam. In this paper, we propose an analysis approach based on information entropy and incremental learning to see how various features affect the performance of an RBF-based SVM spam detector, so as to increase our awareness of spam by sensing its features. The experiments were carried out on the spambase and SMSSpamCollection databases in the UCI machine learning repository. The results show that some features have significant impacts on spam detection, of which users should be aware, and that there exists a feature space that achieves Pareto efficiency in True Positive Rate and True Negative Rate.