2,353 research outputs found

    Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks

    This paper investigates the potential of semi-supervised Generative Adversarial Networks (GANs) to fine-tune pretrained language models so that they can distinguish fake Bengali reviews from real ones using only a small amount of annotated data. With the rise of social media and e-commerce, the ability to detect fake or deceptive reviews is becoming increasingly important for protecting consumers from being misled by false information. Identifying a fake review is difficult for any machine learning model, especially in a low-resource language like Bengali. We demonstrate that the proposed semi-supervised GAN-LM architecture (a generative adversarial network on top of a pretrained language model) is a viable solution for classifying Bengali fake reviews: the experimental results suggest that, even with only 1024 annotated samples, BanglaBERT with semi-supervised GAN (SSGAN) achieved an accuracy of 83.59% and an F1-score of 84.89%, outperforming other pretrained language models (BanglaBERT generator, Bangla BERT Base and Bangla-Electra) by almost 3%, 4% and 10% respectively in terms of accuracy. The experiments were conducted on a manually labeled food review dataset consisting of a total of 6014 real and fake reviews collected from various social media groups. Researchers who have difficulty not just with recognizing fake reviews but with other classification problems owing to a lack of labeled data may find a solution in our proposed methodology.

    A Framework to Categorize Shill and Normal Reviews by Measuring it’s Linguistic Features

    Shill review detection has attracted significant attention from both business and research communities. Shill reviews are increasingly used to influence, positively or negatively, the reputation of products sold on websites. Spammers may create shill reviews that mislead readers in order to artificially promote or devalue target products or services. Different methods that work on linguistic features have been adopted and implemented effectively. Surprisingly, review manipulation has also been found on reputable e-commerce websites, which is why linguistic-feature-based methods have gained more and more popularity. This study examines the linguistic features of shill reviews and develops a tool for extracting product features from the text of the product review under analysis. Fake reviews, fake comments, fake blogs, fake social network postings and deceptive texts are some forms of shill reviews. By extracting linguistic features such as informativeness, subjectivity and readability, an attempt is made to find the differences between shill and normal reviews. On the basis of these three characteristics, hypotheses are formed and generalized. These hypotheses help to compare shill and normal reviews in analytical terms. The proposed work is based on the polarity of the text (positive or negative), as shill reviewers tend to use a definite polarity, positive or negative, according to their intention.
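As a rough illustration of one of the three linguistic features named above, readability can be approximated with the Flesch Reading Ease score. This is a minimal sketch under stated assumptions, not the tool described in the abstract: the abstract does not say which readability measure is used, the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the vowel-group syllable counter is a crude heuristic that only suits English text.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    review = "Great phone. The battery lasts long. I love it!"
    print(round(flesch_reading_ease(review), 2))
```

A score like this could be computed per review and compared between the shill and normal groups; informativeness and subjectivity would need their own, separately defined measures.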

    Spam Reviews Detection in the Time of COVID-19 Pandemic: Background, Definitions, Methods and Literature Analysis

    This work has been partially funded by projects PID2020-113462RB-I00 (ANIMALICOS), granted by Ministerio Espanol de Economia y Competitividad; projects P18-RT-4830 and A-TIC-608-UGR20 granted by Junta de Andalucia, and project B-TIC-402-UGR18 (FEDER and Junta de Andalucia). During the recent COVID-19 pandemic, people were forced to stay at home to protect their own and others' lives. As a result, remote technology has come to be considered more in all aspects of life. One important example is online reviews, whose number increased rapidly in the last two years according to Statista and Rize reports. People started to depend more on these reviews as a result of the mandatory physical distancing imposed in all countries. With no one to speak to about feedback on products and services, reading and posting online reviews became an important part of discussion and decision-making for individuals and organizations. However, the growth in the use of online reviews also provoked an increase in spam reviews. Spam reviews can be defined as fraudulent, malicious and fake reviews written for the purpose of profit or publicity. A number of spam detection methods have been proposed to solve this problem. As part of this study, we outline the concepts and detection methods of spam reviews, along with their implications in the environment of online reviews. The study addresses all the spam review detection studies for the years 2020 and 2021; in other words, we analyze and examine all works presented during the COVID-19 situation. We then highlight the differences between the works before and after the pandemic in terms of review behavior and research findings. Furthermore, nine different detection approaches are classified in order to investigate their specific advantages, limitations, and ways to improve their performance. Additionally, a literature analysis, discussion, and future directions are presented.

    "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection

    Automatic fake news detection is a challenging problem in deception detection, and it has tremendous real-world political and social impacts. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this paper, we present LIAR: a new, publicly available dataset for fake news detection. We collected 12.8K manually labeled short statements, spanning a decade and various contexts, from PolitiFact.com, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate meta-data with text. We show that this hybrid approach can improve a text-only deep learning model. Comment: ACL 201

    Supervised Machine Learning Models for Fake News Detection

    Fake news, or the distribution of disinformation, has become one of the most challenging issues in society. News and information are churned out across online websites and platforms in real time, with little or no way for the viewing public to determine what is real and what is manufactured. But awareness of what we are consuming online is growing, and efforts are underway to explore how we can separate fake content from genuine and truthful information. The most challenging part of fake news is determining how to spot it, and technology offers ways to help us do this. Supervised machine learning helps us to identify, given a labelled dataset, whether a piece of information is fake or not. However, machine learning can be a black-box tool (a device, system or object which can be viewed in terms of its inputs and outputs) that focuses on one aspect of the problem and, in doing so, does not address the bigger picture. To solve this issue, it is very important to understand how it works. Data pre-processing and dataset labelling are part of this understanding. It is also worth knowing the algorithms' mechanisms in order to choose the best one for the proposed project. Evaluating machine learning models is one way to get better results. Changing paths is not a bad thing if it addresses the limitations of the current approach. In this project we have done just this, changing from sports news detection using the Twitter API to labelled datasets, and as a result we have an original Gofaas dataset, a Gofaas library R package and a Gofaas WebApp. Machine learning is a demanding subject but fascinating at the same time. We hope this modest project helps people to face these challenges and learn from our findings.
