718 research outputs found

    Social media bot detection with deep learning methods: a systematic review

    Get PDF
    Social bots are automated social media accounts governed by software and controlled by humans at the backend. Some bots have good purposes, such as automatically posting information about news and even to provide help during emergencies. Nevertheless, bots have also been used for malicious purposes, such as for posting fake news or rumour spreading or manipulating political campaigns. There are existing mechanisms that allow for detection and removal of malicious bots automatically. However, the bot landscape changes as the bot creators use more sophisticated methods to avoid being detected. Therefore, new mechanisms for discerning between legitimate and bot accounts are much needed. Over the past few years, a few review studies contributed to the social media bot detection research by presenting a comprehensive survey on various detection methods including cutting-edge solutions like machine learning (ML)/deep learning (DL) techniques. This paper, to the best of our knowledge, is the first one to only highlight the DL techniques and compare the motivation/effectiveness of these techniques among themselves and over other methods, especially the traditional ML ones. We present here a refined taxonomy of the features used in DL studies and details about the associated pre-processing strategies required to make suitable training data for a DL model. We summarize the gaps addressed by the review papers that mentioned about DL/ML studies to provide future directions in this field. Overall, DL techniques turn out to be computation and time efficient techniques for social bot detection with better or compatible performance as traditional ML techniques

    Macro-micro approach for mining public sociopolitical opinion from social media

    Get PDF
    During the past decade, we have witnessed the emergence of social media, which has prominence as a means for the general public to exchange opinions towards a broad range of topics. Furthermore, its social and temporal dimensions make it a rich resource for policy makers and organisations to understand public opinion. In this thesis, we present our research in understanding public opinion on Twitter along three dimensions: sentiment, topics and summary. In the first line of our work, we study how to classify public sentiment on Twitter. We focus on the task of multi-target-specific sentiment recognition on Twitter, and propose an approach which utilises the syntactic information from parse-tree in conjunction with the left-right context of the target. We show the state-of-the-art performance on two datasets including a multi-target Twitter corpus on UK elections which we make public available for the research community. Additionally we also conduct two preliminary studies including cross-domain emotion classification on discourse around arts and cultural experiences, and social spam detection to improve the signal-to-noise ratio of our sentiment corpus. Our second line of work focuses on automatic topical clustering of tweets. Our aim is to group tweets into a number of clusters, with each cluster representing a meaningful topic, story, event or a reason behind a particular choice of sentiment. We explore various ways of tackling this challenge and propose a two-stage hierarchical topic modelling system that is efficient and effective in achieving our goal. Lastly, for our third line of work, we study the task of summarising tweets on common topics, with the goal to provide informative summaries for real-world events/stories or explanation underlying the sentiment expressed towards an issue/entity. As most existing tweet summarisation approaches rely on extractive methods, we propose to apply state-of-the-art neural abstractive summarisation model for tweets. We also tackle the challenge of cross-medium supervised summarisation with no target-medium training resources. To the best of our knowledge, there is no existing work on studying neural abstractive summarisation on tweets. In addition, we present a system for providing interactive visualisation of topic-entity sentiments and the corresponding summaries in chronological order. Throughout our work presented in this thesis, we conduct experiments to evaluate and verify the effectiveness of our proposed models, comparing to relevant baseline methods. Most of our evaluations are quantitative, however, we do perform qualitative analyses where it is appropriate. This thesis provides insights and findings that can be used for better understanding public opinion in social media

    A systematic literature review on spam content detection and classification

    Get PDF
    The presence of spam content in social media is tremendously increasing, and therefore the detection of spam has become vital. The spam contents increase as people extensively use social media, i.e ., Facebook, Twitter, YouTube, and E-mail. The time spent by people using social media is overgrowing, especially in the time of the pandemic. Users get a lot of text messages through social media, and they cannot recognize the spam content in these messages. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. To improve social media security, the detection and control of spam text are essential. This paper presents a detailed survey on the latest developments in spam text detection and classification in social media. The various techniques involved in spam detection and classification involving Machine Learning, Deep Learning, and text-based approaches are discussed in this paper. We also present the challenges encountered in the identification of spam with its control mechanisms and datasets used in existing works involving spam detection

    Statistical Features-Based Real-Time Detection of Drifted Twitter Spam

    Get PDF
    AcceptedThis is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.Twitter spam has become a critical problem nowadays. Recent works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. In our labeled tweets data set, however, we observe that the statistical properties of spam tweets vary over time, and thus, the performance of existing machine learning-based classifiers decreases. This issue is referred to as “Twitter Spam Drift”. In order to tackle this problem, we first carry out a deep analysis on the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover “changed” spam tweets from unlabeled tweets and incorporate them into classifier’s training process. A number of experiments are performed to evaluate the proposed scheme. The results show that our proposed Lfun scheme can significantly improve the spam detection accuracy in real-world scenarios.This work was supported by the ARC Linkage Project under Grant LP120200266. The work of J. Zhang was supported by the National Natural Science Foundation of China under Grant 61401371

    SpADe: Multi-Stage Spam Account Detection for Online Social Networks

    Get PDF
    In recent years, Online Social Networks (OSNs) have radically changed the way people communicate. The most widely used platforms, such as Facebook, Youtube, and Instagram, claim more than one billion monthly active users each. Beyond these, news-oriented micro-blogging services, e.g., Twitter, are daily accessed by more than 120 million users sharing contents from all over the world. Unfortunately, legitimate users of the OSNs are mixed with malicious ones, which are interested in spreading unwanted, misleading, harmful, or discriminatory content. Spam detection in OSNs is generally approached by considering the characteristics of the account under analysis, its connection with the rest of the network, as well as data and metadata representing the content shared. However, obtaining all this information can be computationally expensive, or even unfeasible, on massive networks. Driven by these motivations, in this paper we propose SpADe, a multi-stage Spam Account Detection algorithm with reject option, whose purpose is to exploit less costly features at the early stages, while progressively extracting more complex information only for those accounts that are difficult to classify. Experimental evaluation shows the effectiveness of the proposed algorithm compared to single-stage approaches, which are much more complex in terms of features processing and classification time

    Enhancing data privacy and security related process through machine learning

    Get PDF
    In this thesis, we exploit the advantages of Machine learning (ML) in the domains of data security and data privacy. ML is one of the most exciting technologies being developed in the world today. The major advantages of ML technology are its prediction capability and its ability to reduce the need for human activities to perform tasks. These benefits motivated us to exploit ML to improve users' data privacy and security. Firstly, we use ML technology to try to predict the best privacy settings for users, since ML has a strong prediction ability and the average user might find it difficult to properly set up privacy settings due to a lack of knowledge and subsequent lack of decision-making abilities regarding the privacy of their data. Besides, since the ML approach has the potential to considerably cut down on manual efforts by humans, our second task in this thesis is to exploit ML technology to redesign security mechanisms of social media environments that rely on human participation for providing such services. In particular, we use ML to train spam filters for identifying and removing violent, insulting, aggressive, and harassing content creators (a.k.a. spammers) from a social media platform. It helps to solve violent and aggressive issues that have been growing on social media environments. The experimental results show that our proposals are efficient and effective
    • …
    corecore