2,538 research outputs found

    X-CapsNet For Fake News Detection

    Full text link
    News consumption has significantly increased with the growing popularity and use of web-based forums and social media. This sets the stage for misinforming and confusing people. To help reduce the impact of misinformation on users' potential health-related decisions and other intents, it is desired to have machine learning models to detect and combat fake news automatically. This paper proposes a novel transformer-based model using Capsule neural Networks(CapsNet) called X-CapsNet. This model includes a CapsNet with dynamic routing algorithm paralyzed with a size-based classifier for detecting short and long fake news statements. We use two size-based classifiers, a Deep Convolutional Neural Network (DCNN) for detecting long fake news statements and a Multi-Layer Perceptron (MLP) for detecting short news statements. To resolve the problem of representing short news statements, we use indirect features of news created by concatenating the vector of news speaker profiles and a vector of polarity, sentiment, and counting words of news statements. For evaluating the proposed architecture, we use the Covid-19 and the Liar datasets. The results in terms of the F1-score for the Covid-19 dataset and accuracy for the Liar dataset show that models perform better than the state-of-the-art baselines

    Review Paper on Enhancing COVID-19 Fake News Detection With Transformer Model

    Get PDF
    The growing propagation of disinformation about the COVID-19 epidemic needs powerful fake news detection technologies. This review provides an in-depth examination of existing techniques, including traditional machine learning methods such as Random Forest and Naive Bayes, as well as sophisticated models for deep learning such as Bi- GRU, CNN, and LSTM, RNN, & transformer-based architecture such as BERT and XLM- Roberta, are also available. One noticeable development is the merging of traditional algorithmswith sophisticated transformers, which emphasize the quest of improved accuracy and flexibility.However, important research gaps have been identified. There has been little research on cross- lingual detection algorithms, revealing a substantial gap in multilingual false news detection, which is critical in the global context of COVID-19 information spread. Furthermore, the researchemphasizes the need of flexible methodologies by emphasizing the need for appropriate preprocessing strategies for various content types. Furthermore, the lack of common assessment measures is a barrier, underlining the need of unified frameworks for successfully benchmarking and comparing models. This analysis provides light on the changing COVID-19 false news detection environment, emphasizing the need for novel, adaptive, and internationally relevant approaches to successfully address the ubiquitous dissemination of disinformation during the current pandemic

    FakeStack: Hierarchical Tri-BERT-CNN-LSTM stacked model for effective fake news detection.

    Get PDF
    False news articles pose a serious challenge in today\u27s information landscape, impacting public opinion and decision-making. Efforts to counter this issue have led to research in deep learning and machine learning methods. However, a gap exists in effectively using contextual cues and skip connections within models, limiting the development of comprehensive detection systems that harness contextual information and vital data propagation. Thus, we propose a model of deep learning, FakeStack, in order to identify bogus news accurately. The model combines the power of pre-trained Bidirectional Encoder Representation of Transformers (BERT) embeddings with a deep Convolutional Neural Network (CNN) having skip convolution block and Long Short-Term Memory (LSTM). The model has been trained and tested on English fake news dataset, and various performance metrics were employed to assess its effectiveness. The results showcase the exceptional performance of FakeStack, achieving an accuracy of 99.74%, precision of 99.67%, recall of 99.80%, and F1-score of 99.74%. Our model\u27s performance was extended to two additional datasets. For the LIAR dataset, our accuracy reached 75.58%, while the WELFake dataset showcased an impressive accuracy of 98.25%. Comparative analysis with other baseline models, including CNN, BERT-CNN, and BERT-LSTM, further highlights the superiority of FakeStack, surpassing all models evaluated. This study underscores the potential of advanced techniques in combating the spread of false news and ensuring the dissemination of reliable information

    Fake News Detection

    Get PDF

    Fake news detection and analysis

    Get PDF
    The evolution of technology has led to the development of environments that allow instantaneous communication and dissemination of information. As a result, false news, article manipulation, lack of trust in media and information bubbles have become high-impact issues. In this context, the need for automatic tools that can classify the content as reliable or not and that can create a trustworthy environment is continually increasing. Current solutions do not entirely solve this problem as the degree of difficulty of the task is high and dependent on factors such as type of language, type of news or subject volatility. The main objective of this thesis is the exploration of this crucial problem of Natural Language Processing, namely false content detection and of how it can be solved as a classification problem with automatic learning. A linguistic approach is taken, experimenting with different types of features and models to build accurate fake news detectors. The experiments are structured in the following three main steps: text pre-processing, feature extraction and classification itself. In addition, they are conducted on a real-world dataset, LIAR, to offer a good overview of which model best overcomes day-to-day situations. Two approaches are chosen: multi-class and binary classification. In both cases, we prove that out of all the experiments, a simple feed-forward network combined with fine-tuned DistilBERT embeddings reports the highest accuracy - 27.30% on 6-labels classification and 63.61% on 2-labels classification. These results emphasize that transfer learning bring important improvements in this task. In addition, we demonstrate that classic machine learning algorithms like Decision Tree, NaĂŻve Bayes, and Support Vector Machine act similar with the state-of-the-art solutions, even performing better than some recurrent neural networks like LSTM or BiLSTM. This clearly confirms that more complex solutions do not guarantee higher performance. Regarding features, we confirm that there is a connection between the degree of veracity of a text and the frequency of terms, more powerful than their position or order. Yet, context prove to be the most powerful aspect in the characteristic extraction process. Also, indices that describe the author's style must be carefully selected to provide relevant information

    Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

    Full text link
    COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSN) often misled people from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots that act as a catalyst in the spread and focuses on fake news detection in 'articles shared in posts' rather than the post (textual) content. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building their predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as number of mentions, hashtags and urls in a tweet) to act as additional indicators to detect misinformation. Moreover, we analyzed the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 Million COVID-19 related tweets and used our annotation model to build an extensive and original ground truth dataset for classification purposes. We utilize various machine learning models to accurately detect misinformation and our best classification model achieves precision (82%), recall (96%), and false positive rate (3.58%). Also, our bot analysis indicates that bots generated approximately 10% of misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms

    Evaluation of Fake News Detection with Knowledge-Enhanced Language Models

    Full text link
    Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs). The predominant state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news datasets. However, large-scale PLMs are generally not trained on structured factual data and hence may not possess priors that are grounded in factually accurate knowledge. The use of existing knowledge bases (KBs) with rich human-curated factual information has thus the potential to make fake news detection more effective and robust. In this paper, we investigate the impact of knowledge integration into PLMs for fake news detection. We study several state-of-the-art approaches for knowledge integration, mostly using Wikidata as KB, on two popular fake news datasets - LIAR, a politics-based dataset, and COVID-19, a dataset of messages posted on social media relating to the COVID-19 pandemic. Our experiments show that knowledge-enhanced models can significantly improve fake news detection on LIAR where the KB is relevant and up-to-date. The mixed results on COVID-19 highlight the reliance on stylistic features and the importance of domain specific and current KBs.Comment: To appear in Proceedings of the 16th International AAAI Conference on Web and Social Media (AAAI ICWSM-2022

    DS-Fake : a data stream mining approach for fake news detection

    Full text link
    L’avènement d’internet suivi des réseaux sociaux a permis un accès facile et une diffusion rapide de l’information par toute personne disposant d’une connexion internet. L’une des conséquences néfastes de cela est la propagation de fausses informations appelées «fake news». Les fake news représentent aujourd’hui un enjeu majeur au regard de ces conséquences. De nombreuses personnes affirment encore aujourd’hui que sans la diffusion massive de fake news sur Hillary Clinton lors de la campagne présidentielle de 2016, Donald Trump n’aurait peut-être pas été le vainqueur de cette élection. Le sujet de ce mémoire concerne donc la détection automatique des fake news. De nos jours, il existe un grand nombre de travaux à ce sujet. La majorité des approches présentées se basent soit sur l’exploitation du contenu du texte d’entrée, soit sur le contexte social du texte ou encore sur un mélange entre ces deux types d’approches. Néanmoins, il existe très peu d’outils ou de systèmes efficaces qui détecte une fausse information dans la vie réelle, tout en incluant l’évolution de l’information au cours du temps. De plus, il y a un manque criant de systèmes conçues dans le but d’aider les utilisateurs des réseaux sociaux à adopter un comportement qui leur permettrait de détecter les fausses nouvelles. Afin d’atténuer ce problème, nous proposons un système appelé DS-Fake. À notre connaissance, ce système est le premier à inclure l’exploration de flux de données. Un flux de données est une séquence infinie et dénombrable d’éléments et est utilisée pour représenter des données rendues disponibles au fil du temps. DS-Fake explore à la fois l’entrée et le contenu d’un flux de données. L’entrée est une publication sur Twitter donnée au système afin qu’il puisse déterminer si le tweet est digne de confiance. Le flux de données est extrait à l’aide de techniques d’extraction du contenu de sites Web. Le contenu reçu par ce flux est lié à l’entrée en termes de sujets ou d’entités nommées mentionnées dans le texte d’entrée. DS-Fake aide également les utilisateurs à développer de bons réflexes face à toute information qui se propage sur les réseaux sociaux. DS-Fake attribue un score de crédibilité aux utilisateurs des réseaux sociaux. Ce score décrit la probabilité qu’un utilisateur puisse publier de fausses informations. La plupart des systèmes utilisent des caractéristiques comme le nombre de followers, la localisation, l’emploi, etc. Seuls quelques systèmes utilisent l’historique des publications précédentes d’un utilisateur afin d’attribuer un score. Pour déterminer ce score, la majorité des systèmes utilisent la moyenne. DS-Fake renvoie un pourcentage de confiance qui détermine la probabilité que l’entrée soit fiable. Contrairement au petit nombre de systèmes qui utilisent l’historique des publications en ne prenant pas en compte que les tweets précédents d’un utilisateur, DS-Fake calcule le score de crédibilité sur la base des tweets précédents de tous les utilisateurs. Nous avons renommé le score de crédibilité par score de légitimité. Ce dernier est basé sur la technique de la moyenne Bayésienne. Cette façon de calculer le score permet d’atténuer l’impact des résultats des publications précédentes en fonction du nombre de publications dans l’historique. Un utilisateur donné ayant un plus grand nombre de tweets dans son historique qu’un autre utilisateur, même si les tweets des deux sont tous vrais, le premier utilisateur est plus crédible que le second. Son score de légitimité sera donc plus élevé. À notre connaissance, ce travail est le premier qui utilise la moyenne Bayésienne basée sur l’historique de tweets de toutes les sources pour attribuer un score à chaque source. De plus, les modules de DS-Fake ont la capacité d’encapsuler le résultat de deux tâches, à savoir la similarité de texte et l’inférence en langage naturel hl(en anglais Natural Language Inference). Ce type de modèle qui combine ces deux tâches de TAL est également nouveau pour la problématique de la détection des fake news. DS-Fake surpasse en termes de performance toutes les approches de l’état de l’art qui ont utilisé FakeNewsNet et qui se sont basées sur diverses métriques. Il y a très peu d’ensembles de données complets avec une variété d’attributs, ce qui constitue un des défis de la recherche sur les fausses nouvelles. Shu et al. ont introduit en 2018 l’ensemble de données FakeNewsNet pour résoudre ce problème. Le score de légitimité et les tweets récupérés ajoutent des attributs à l’ensemble de données FakeNewsNet.The advent of the internet, followed by online social networks, has allowed easy access and rapid propagation of information by anyone with an internet connection. One of the harmful consequences of this is the spread of false information, which is well-known by the term "fake news". Fake news represent a major challenge due to their consequences. Some people still affirm that without the massive spread of fake news about Hillary Clinton during the 2016 presidential campaign, Donald Trump would not have been the winner of the 2016 United States presidential election. The subject of this thesis concerns the automatic detection of fake news. Nowadays, there is a lot of research on this subject. The vast majority of the approaches presented in these works are based either on the exploitation of the input text content or the social context of the text or even on a mixture of these two types of approaches. Nevertheless, there are only a few practical tools or systems that detect false information in real life, and that includes the evolution of information over time. Moreover, no system yet offers an explanation to help social network users adopt a behaviour that will allow them to detect fake news. In order to mitigate this problem, we propose a system called DS-Fake. To the best of our knowledge, this system is the first to include data stream mining. A data stream is a sequence of elements used to represent data elements over time. This system explores both the input and the contents of a data stream. The input is a post on Twitter given to the system that determines if the tweet can be trusted. The data stream is extracted using web scraping techniques. The content received by this flow is related to the input in terms of topics or named entities mentioned in the input text. This system also helps users develop good reflexes when faced with any information that spreads on social networks. DS-Fake assigns a credibility score to users of social networks. This score describes how likely a user can publish false information. Most of the systems use features like the number of followers, the localization, the job title, etc. Only a few systems use the history of a user’s previous publications to assign a score. To determine this score, most systems use the average. DS-Fake returns a percentage of confidence that determines how likely the input is reliable. Unlike the small number of systems that use the publication history by taking into account only the previous tweets of a user, DS-Fake calculates the credibility score based on the previous tweets of all users. We renamed the credibility score legitimacy score. The latter is based on the Bayesian averaging technique. This way of calculating the score allows attenuating the impact of the results from previous posts according to the number of posts in the history. A user who has more tweets in his history than another user, even if the tweets of both are all true, the first user is more credible than the second. His legitimacy score will therefore be higher. To our knowledge, this work is the first that uses the Bayesian average based on the post history of all sources to assign a score to each source. DS-Fake modules have the ability to encapsulate the output of two tasks, namely text similarity and natural language inference. This type of model that combines these two NLP tasks is also new for the problem of fake news detection. There are very few complete datasets with a variety of attributes, which is one of the challenges of fake news research. Shu et al. introduce in 2018 the FakeNewsNet dataset to tackle this issue. Our work uses and enriches this dataset. The legitimacy score and the retrieved tweets from named entities mentioned in the input texts add features to the FakeNewsNet dataset. DS-Fake outperforms all state-of-the-art approaches that have used FakeNewsNet and that are based on various metrics
    • …
    corecore