Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks
This paper investigates the potential of semi-supervised Generative
Adversarial Networks (GANs) to fine-tune pretrained language models in order to
distinguish fake Bengali reviews from real ones using only a small amount of annotated data. With
the rise of social media and e-commerce, the ability to detect fake or
deceptive reviews is becoming increasingly important to protect
consumers from being misled by false information. Identifying fake reviews is
difficult for any machine learning model, especially in a low-resource
language like Bengali. We demonstrate that the proposed semi-supervised
GAN-LM architecture (a generative adversarial network on top of a pretrained
language model) is a viable solution for classifying Bengali fake reviews: the
experimental results suggest that, even with only 1024 annotated samples,
BanglaBERT with a semi-supervised GAN (SSGAN) achieved an accuracy of 83.59% and
an F1-score of 84.89%, outperforming other pretrained language models
(BanglaBERT generator, Bangla BERT Base, and Bangla-Electra) by almost 3%, 4%, and
10%, respectively, in terms of accuracy. The experiments were conducted on a
manually labeled food review dataset consisting of a total of 6014 real and fake
reviews collected from various social media groups. Researchers who struggle
not only with recognizing fake reviews but also with other
classification problems caused by a lack of labeled data may find a solution in
our proposed methodology.
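The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes the standard semi-supervised GAN formulation used by GAN-BERT-style models, which may differ from the authors' exact setup: the discriminator (a classifier head on the language model's representations) outputs k + 1 logits, where the first k are the real label classes (here, real review and fake review) and the last is an extra "produced by the generator" class. The function name `ssgan_discriminator_loss` and its inputs are hypothetical; in practice the logits would come from the model.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-8  # keeps log() finite when a probability underflows to 0


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ssgan_discriminator_loss(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    unlabeled_logits: Sequence[Sequence[float]],
    generated_logits: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (supervised, unsupervised) discriminator losses.

    Every logit vector has k + 1 entries: indices 0..k-1 are the real label
    classes (e.g. 0 = real review, 1 = fake review) and the last index is the
    extra "produced by the generator" class.
    """
    # Supervised term: cross-entropy of annotated reviews against their labels.
    sup = -sum(
        math.log(softmax(z)[y] + EPS) for z, y in zip(labeled_logits, labels)
    ) / len(labels)

    # Unsupervised term, real part: every real review (annotated or not)
    # should receive a low "generated" probability.
    real = list(labeled_logits) + list(unlabeled_logits)
    unsup_real = -sum(math.log(1.0 - softmax(z)[-1] + EPS) for z in real) / len(real)

    # Unsupervised term, generated part: generator outputs should be
    # recognised as generated.
    unsup_gen = -sum(
        math.log(softmax(z)[-1] + EPS) for z in generated_logits
    ) / len(generated_logits)

    return sup, unsup_real + unsup_gen
```

Unannotated reviews enter only through the unsupervised term, which is how this kind of setup can exploit data beyond the annotated samples. The generator, omitted here, would be trained to make its outputs look real to this discriminator.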
A Framework to Categorize Shill and Normal Reviews by Measuring Its Linguistic Features
Shill review detection has attracted significant attention from both the business and research communities. Shill reviews are increasingly used to influence, positively or negatively, the reputation of products sold on websites. Spammers may create shill reviews that mislead readers in order to artificially promote or devalue target products or services. Several methods based on linguistic features have been adopted and implemented effectively. Notably, review manipulation has been found even on reputable e-commerce websites, which is why linguistic-feature-based methods have gained more and more popularity. This study examines the linguistic features of shill reviews, and a tool has been developed to extract product features from the text of the review under analysis. Fake reviews, fake comments, fake blogs, fake social network postings, and deceptive texts are some forms of shill reviews. By extracting linguistic features such as informativeness, subjectivity, and readability, we attempt to find differences between shill and normal reviews. Hypotheses are formed and generalized on the basis of these three characteristics, and they allow shill and normal reviews to be compared in analytical terms. The proposed work is based on the polarity of the text (positive or negative), since shill reviewers tend to use a definite polarity, positive or negative, depending on their intention.
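The abstract names three features but not how each is computed. The sketch below shows one plausible way to measure them, under assumptions that are not taken from the paper: informativeness as the share of tokens naming a known product feature, subjectivity as the share of tokens found in a small hypothetical lexicon (`SUBJECTIVE_WORDS`), and readability as the standard Flesch Reading Ease score with a crude vowel-group syllable counter. All names here are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Set

# Hypothetical toy subjectivity lexicon; a real system would use a curated resource.
SUBJECTIVE_WORDS = {"amazing", "awesome", "best", "great", "hate", "love", "terrible", "worst"}


@dataclass
class ReviewFeatures:
    informativeness: float  # share of tokens naming a known product feature
    subjectivity: float     # share of tokens found in the subjectivity lexicon
    readability: float      # Flesch Reading Ease (higher = easier to read)


def count_syllables(word: str) -> int:
    # Crude heuristic: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def extract_features(text: str, product_features: Set[str]) -> ReviewFeatures:
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))
    n_sentences = max(1, len(sentences))
    lowered = [w.lower() for w in words]
    syllables = sum(count_syllables(w) for w in words)
    return ReviewFeatures(
        informativeness=sum(w in product_features for w in lowered) / n_words,
        subjectivity=sum(w in SUBJECTIVE_WORDS for w in lowered) / n_words,
        readability=206.835
        - 1.015 * (n_words / n_sentences)
        - 84.6 * (syllables / n_words),
    )
```

For example, `extract_features("Best battery ever! I love it.", {"battery"})` gives an informativeness of 1/6 and a subjectivity of 2/6, since one of the six tokens is a listed product feature and two are in the toy lexicon. A polarity score could be added to `ReviewFeatures` in the same way.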
Spam Reviews Detection in the Time of COVID-19 Pandemic: Background, Definitions, Methods and Literature Analysis
This work has been partially funded by projects PID2020-113462RB-I00 (ANIMALICOS), granted by Ministerio Espanol de Economia y Competitividad; projects P18-RT-4830 and A-TIC-608-UGR20, granted by Junta de Andalucia; and project B-TIC-402-UGR18 (FEDER and Junta de Andalucia).
During the recent COVID-19 pandemic, people were forced to stay at home to protect
their own and others’ lives. As a result, remote technology is being adopted more widely in all aspects
of life. One important example is online reviews, whose number increased
rapidly over the last two years according to Statista and Rize reports. People started to depend more
on these reviews because of the mandatory physical distancing imposed in all countries. With few
opportunities to discuss products and services face to face, reading and posting online reviews became
an important part of discussion and decision-making for both individuals and organizations.
However, the growth in online review usage has also provoked an increase in spam reviews. Spam
reviews can be defined as fraudulent, malicious, or fake reviews written for profit
or publicity, and a number of detection methods have been proposed to address this problem. In
this study, we outline the concepts and detection methods of spam reviews, along with
their implications for the online review environment. The study covers the spam review
detection studies published in 2020 and 2021; in other words, we analyze and examine the works
presented during the COVID-19 pandemic. We then highlight the differences between works before
and after the pandemic in terms of review behavior and research findings. Furthermore, nine
different detection approaches are classified in order to investigate their specific advantages,
limitations, and ways to improve their performance. Finally, a literature analysis, discussion,
and future directions are presented.
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
Automatic fake news detection is a challenging problem in deception
detection, and it has tremendous real-world political and social impacts.
However, statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets. In this paper, we present
LIAR: a new, publicly available dataset for fake news detection. We collected
12.8K manually labeled short statements in various contexts, spanning a decade, from
PolitiFact.com, which provides a detailed analysis report and links to source
documents for each case. This dataset can be used for fact-checking research as
well. Notably, this new dataset is an order of magnitude larger than the previously
largest public fake news datasets of a similar type. Empirically, we investigate
automatic fake news detection based on surface-level linguistic patterns. We
have designed a novel, hybrid convolutional neural network to integrate
metadata with text. We show that this hybrid approach can improve a text-only
deep learning model.
Comment: ACL 201
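The abstract describes a hybrid CNN that combines metadata with the statement text but does not detail the architecture. The following is a simplified forward-pass sketch under that assumption, not the paper's model: a one-dimensional convolution with max-over-time pooling over word embeddings, whose pooled features are concatenated with a numeric metadata vector (for example, a one-hot encoding of a speaker's party) before a linear output layer. The weights are random and untrained, and the class name `HybridTextMetaModel`, the dimensions, and the six-way output (one score per LIAR truthfulness label) are illustrative choices.

```python
import random
from typing import Dict, List, Sequence

random.seed(0)  # reproducible random (untrained) weights

EMB_DIM, N_FILTERS, WIDTH, N_CLASSES = 8, 4, 3, 6  # illustrative sizes


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class HybridTextMetaModel:
    """Forward pass only: CNN text features concatenated with metadata features."""

    def __init__(self, vocab: Dict[str, int], n_meta: int):
        # vocab maps words to ids 1..len(vocab); id 0 is reserved for unknown words.
        self.vocab = vocab
        self.n_meta = n_meta
        self.emb = rand_matrix(len(vocab) + 1, EMB_DIM)
        # Each filter spans WIDTH consecutive word embeddings, flattened.
        self.filters = rand_matrix(N_FILTERS, WIDTH * EMB_DIM)
        self.out = rand_matrix(N_CLASSES, N_FILTERS + n_meta)

    def text_features(self, tokens: Sequence[str]) -> List[float]:
        vecs = [self.emb[self.vocab.get(t, 0)] for t in tokens]
        # Zero-pad very short statements so at least one window exists.
        while len(vecs) < WIDTH:
            vecs.append([0.0] * EMB_DIM)
        pooled = []
        for f in self.filters:
            # Convolve with ReLU, then keep the maximum over all positions.
            acts = []
            for i in range(len(vecs) - WIDTH + 1):
                window = [x for v in vecs[i : i + WIDTH] for x in v]
                acts.append(max(0.0, sum(w * x for w, x in zip(f, window))))
            pooled.append(max(acts))
        return pooled

    def logits(self, tokens: Sequence[str], meta: Sequence[float]) -> List[float]:
        assert len(meta) == self.n_meta, "metadata vector has the wrong length"
        # Hybrid step: append the metadata features to the pooled text features.
        h = self.text_features(tokens) + list(meta)
        return [sum(w * x for w, x in zip(row, h)) for row in self.out]
```

With `vocab = {w: i + 1 for i, w in enumerate(["tax", "cuts", "create", "jobs"])}` and `model = HybridTextMetaModel(vocab, n_meta=3)`, calling `model.logits(["tax", "cuts", "create", "jobs"], [1.0, 0.0, 0.0])` returns six scores. A real implementation would learn these weights with a deep learning framework rather than use random ones.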
Supervised Machine Learning Models for Fake News Detection
Fake news, or the distribution of disinformation, has become one of the most challenging issues in society. News and information are churned out across websites and online platforms in real time, with little or no way for the viewing public to determine what is real and what is manufactured. However, people are becoming more aware of what they consume online, and efforts are underway to explore how to separate fake content from genuine, truthful information.
The most challenging part of fake news is determining how to spot it, and technology offers ways to help. Supervised machine learning, trained on a labelled dataset, can help identify whether a piece of information is fake. However, machine learning can be a black-box tool (a device, system, or object that can be viewed only in terms of its inputs and outputs) that focuses on one aspect of the problem and, in doing so, fails to address the bigger picture. To solve this issue, it is very important to understand how it works. Data pre-processing and dataset labelling are part of this understanding. It is also worth knowing how the algorithms work in order to choose the best one for a given project.
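To make the supervised setting concrete, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, one of many algorithms that could be trained on a labelled fake news dataset. It is an illustrative sketch in Python, not the project's method: the Gofaas project is an R package and nothing here reflects its API. The class name and the four toy training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc: str, label: str) -> float:
        # log P(label) + sum of log P(word | label); unnormalised log-posterior.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(doc):
            score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Invented toy examples, for illustration only.
train_docs = [
    "shocking secret cure doctors hate",
    "you will not believe this miracle",
    "council approves new budget",
    "minister announces school funding",
]
train_labels = ["fake", "fake", "real", "real"]
model = NaiveBayes().fit(train_docs, train_labels)
```

On this toy data, `model.predict("miracle cure revealed")` returns `"fake"` and `model.predict("new school budget")` returns `"real"`. The smoothing step is one example of the kind of algorithmic mechanism worth understanding before choosing a model: without it, a single unseen word would drive a class's probability to zero.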
Evaluating machine learning models is one way to get better results, and changing direction is not a bad thing if it addresses the limitations of the original approach. In this project we did just that, switching from sports news detection using the Twitter API to labelled datasets; as a result, we produced an original Gofaas dataset, the Gofaas library (an R package), and the Gofaas WebApp. Machine learning is a demanding subject, but fascinating at the same time. We hope this modest project helps people face these challenges and learn from our findings.
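Evaluation usually means comparing a model's predictions on held-out data against the true labels. A minimal sketch of the standard binary metrics (accuracy, precision, recall, and F1) follows; the function name `binary_metrics` and the convention that label 1 means "fake" are assumptions for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class (here 1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 and a precision, recall, and F1 of 2/3 each. Comparing these numbers across candidate algorithms on the same test split is one straightforward way to decide which model to keep.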