    Prediction and Analysis of Rumour's Impact on Social Media

    A Systematic Review on the Detection of Fake News Articles

    Currently submitted to ACM Transactions on Intelligent Systems and Technology. Awaiting peer review. It has been argued that fake news and the spread of false information pose a threat to societies throughout the world, from influencing the results of elections to hindering the efforts to manage the COVID-19 pandemic. To combat this threat, a number of Natural Language Processing (NLP) approaches have been developed. These leverage a number of datasets, feature extraction/selection techniques and machine learning (ML) algorithms to detect fake news before it spreads. While these methods are well-documented, there is less evidence regarding their efficacy in this domain. By systematically reviewing the literature, this paper aims to delineate the approaches for fake news detection that are most performant, identify limitations with existing approaches, and suggest ways these can be mitigated. The analysis of the results indicates that Ensemble Methods using a combination of news content and socially based features are currently the most effective. Finally, it is proposed that future research should focus on developing approaches that address generalisability issues (which, in part, arise from limitations with current datasets), explainability and bias.

    Context-Aware Message-Level Rumour Detection with Weak Supervision

    Beyond being a communication medium, social media has become the main source of all sorts of information. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly. It is challenging to track down their origins and stop their propagation. An ideal solution is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to augment limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of the hand-labelling of large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which has received less attention from the ERD research community. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
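
The idea of burst detection on temporal signals mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual algorithm, so the rule below (flag a time bin whose message count exceeds the mean of the preceding bins by `k` standard deviations) is an assumption, and the names `detect_bursts`, `history` and `k` are hypothetical.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, k=3.0):
    """Return indices of time bins whose count spikes above recent history.

    counts  -- message counts per fixed-width time bin (e.g. per minute)
    history -- number of preceding bins used as the baseline (assumed value)
    k       -- how many standard deviations above the baseline mean
               counts as a burst (assumed value)
    """
    bursts = []
    for i in range(history, len(counts)):
        window = counts[i - history:i]
        baseline = mean(window)
        # Floor the spread at 1.0 so a perfectly flat baseline does not
        # turn every tiny increase into a "burst".
        spread = max(pstdev(window), 1.0)
        if counts[i] > baseline + k * spread:
            bursts.append(i)
    return bursts


# Illustrative counts: a quiet stream with one sudden spike at index 6.
counts = [4, 5, 3, 4, 5, 4, 30, 6, 5]
print(detect_bursts(counts))
```

Messages falling inside flagged bins could then be passed to a summarisation or rumour detection step, while the remaining bins are filtered out as background traffic.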

    Fake News Detection in Social Media Using Machine Learning and Deep Learning

    Fake news detection in social media is a process of detecting false information that is intentionally created to mislead readers. The spread of fake news may cause social, economic, and political turmoil if its proliferation is not prevented. However, fake news detection using machine learning faces many challenges. Datasets of fake news are usually unstructured and noisy. Fake news often mimics true news. In this study, a data preprocessing method is proposed for mitigating missing values in the datasets to enhance fake news detection accuracy. The experimental results show that a Multi-Layer Perceptron (MLP) classifier combined with the proposed data preprocessing method outperforms the state-of-the-art methods. Furthermore, to improve the early detection of rumors in social media, a time-series model is proposed for fake news detection in social media using Twitter data. With the proposed model, computational complexity has been reduced significantly in terms of the training and testing times of the machine learning models, while achieving results similar to the state of the art in the literature. Besides, the proposed method has a simplified feature extraction process, because only the temporal features of the Twitter data are used. Moreover, deep learning techniques are also applied to fake news detection. Experimental results demonstrate that deep learning methods outperformed traditional machine learning models. Specifically, the ensemble-based deep learning classification model achieved top performance.

    Can we predict a riot? Disruptive event detection using Twitter

    In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases.

    Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

    Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which optimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, potentially applicable in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets to the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results which are competitive when compared against existing spammer detection systems that make use of additional, costly user features. Our study is the first that attempts to generalise conclusions on the optimal classifiers and sets of features for social spam detection over different datasets.
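
To make "tweet-inherent features" concrete, the sketch below computes a handful of features from nothing but the text of a single tweet. The abstract does not list the paper's four feature sets, so the specific features and the function name `tweet_features` are illustrative assumptions, not the authors' feature definitions.

```python
import re

# Simple pattern for http/https links; real tweets may need a stricter one.
URL_RE = re.compile(r"https?://\S+")


def tweet_features(text):
    """Extract features available from the tweet text alone.

    No account history or follower-graph data is needed, which is what
    would make such features cheap enough for near real-time use.
    The feature choice here is hypothetical.
    """
    tokens = text.split()
    letters = [c for c in text if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "n_chars": len(text),
        "n_words": len(tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_urls": len(URL_RE.findall(text)),
        "n_exclamations": text.count("!"),
        "frac_upper": upper / len(letters) if letters else 0.0,
    }


example = "WIN a FREE phone!!! #free #win http://example.com @you"
print(tweet_features(example))
```

A dictionary like this can be turned into a numeric vector and passed to any standard classifier, which is the kind of setup that lets different algorithms and feature sets be compared.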

    A Retrospective Analysis of the Fake News Challenge Stance Detection Task

    The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance classification task as a crucial first step towards detecting fake news. To date, there is no in-depth analysis paper to critically discuss FNC-1's experimental setup, reproduce the results, and draw conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1's proposed evaluation metric favors the majority class, which can be easily classified, and thus overestimates the true discriminative power of the methods. Therefore, we propose a new F1-based metric yielding a changed system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems, but is superior in predicting minority classes. To understand the methods' ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps interpret the original FNC-1 scores and understand which features help improve performance and why. Our new dataset and all source code used during the reproduction study are publicly available for future research.
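
The point that a metric can favor the majority class is easy to see on a toy example. The sketch below compares plain accuracy with macro-averaged F1 for a classifier that always predicts the majority label. Accuracy is only a stand-in here, not FNC-1's actual scoring rule, and macro F1 is one common F1-based metric; the abstract does not give the paper's exact definition. The label names and counts are illustrative.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Illustrative imbalanced data: 8 of 10 examples belong to one class.
gold = ["unrelated"] * 8 + ["agree", "disagree"]
pred = ["unrelated"] * 10  # a degenerate majority-class predictor

print(accuracy(gold, pred))            # 0.8
print(round(macro_f1(gold, pred), 2))  # 0.3
```

The degenerate predictor looks strong under accuracy but weak under macro F1, because it scores zero on both minority classes.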

    Web-Based SMS Classification Application Using the Logistic Regression Algorithm (Aplikasi Klasifikasi SMS Berbasis Web Menggunakan Algoritma Logistic Regression)

    SMS spam is a type of unwanted or unsolicited text message sent to users' mobile phones, often for commercial purposes. To address the spam problem, a technique is needed to sort words or sentences into spam and non-spam. This study proposes using machine learning to classify which messages are spam and which are not. The data used in this study consist of 1140 messages, labelled 0 for non-spam messages and 1 for spam messages. The algorithm used for this case is Logistic Regression. The results show that the model classifies messages with an accuracy of 97%. The application developed to deploy the resulting machine learning model takes the form of a simple website built with the Flask framework for Python. The final product of this application is a machine learning model that can be accessed through a website.
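
As a rough illustration of the classification step in this abstract, the sketch below trains a logistic regression model on binary bag-of-words features using plain stochastic gradient descent. It uses the same label convention (0 for non-spam, 1 for spam), but the four training messages, the hyperparameters and all function names are made up for illustration. The study's dataset, preprocessing and Flask front end are not reproduced here.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, minimal preprocessing)."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(messages, labels, epochs=200, lr=0.5):
    """Fit logistic regression on binary word-presence features with SGD."""
    vocab = sorted({tok for msg in messages for tok in tokenize(msg)})
    index = {tok: i for i, tok in enumerate(vocab)}
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for msg, y in zip(messages, labels):
            active = {index[tok] for tok in tokenize(msg)}
            p = sigmoid(bias + sum(weights[i] for i in active))
            error = p - y  # gradient of the log loss w.r.t. the logit
            bias -= lr * error
            for i in active:
                weights[i] -= lr * error
    return index, weights, bias


def predict_proba(model, message):
    """Probability that a message is spam; unseen words are ignored."""
    index, weights, bias = model
    active = {index[tok] for tok in tokenize(message) if tok in index}
    return sigmoid(bias + sum(weights[i] for i in active))


# Hypothetical toy data: 1 = spam, 0 = not spam.
messages = [
    "win free prize now",
    "free cash claim now",
    "see you at lunch",
    "meeting moved to monday",
]
labels = [1, 1, 0, 0]

model = train(messages, labels)
print(predict_proba(model, "claim free prize") > 0.5)
print(predict_proba(model, "lunch on monday") > 0.5)
```

In a web deployment of the kind the abstract describes, a function like `predict_proba` would sit behind a request handler that receives the message text and returns the predicted label.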

    Enhancing prediction of user stance for social networks rumors

    The spread of social media has led to a massive change in the way information is dispersed. It provides organizations and individuals wider opportunities for collaboration. But it has also led to the emergence of malicious users and attention seekers who spread rumors and fake news. Understanding user stances in rumor posts is very important to identify the veracity of the underlying content, because news can go viral within a few seconds, which can lead to mass panic and confusion. In this paper, different machine learning techniques were utilized to enhance user stance prediction through a conversation thread towards a given rumor on the Twitter platform. We utilized both conversation thread features and features related to the users who participated in the conversation, in order to predict the users’ stances, in terms of supporting, denying, querying, or commenting (SDQC), towards the source tweet. Furthermore, different datasets for the stance-prediction task were explored to handle the data imbalance problem, and data augmentation for minority classes was applied to enhance the results. The proposed framework outperforms the state-of-the-art results with a macro F1-score of 0.7233.