A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities
The explosive growth of fake news and its erosion of democracy, justice, and
public trust have increased the demand for fake news detection and intervention.
This survey reviews and evaluates methods that can detect fake news from four
perspectives: (1) the false knowledge it carries, (2) its writing style, (3)
its propagation patterns, and (4) the credibility of its source. The survey
also highlights some potential research tasks based on the review. In
particular, we identify and detail related fundamental theories across various
disciplines to encourage interdisciplinary research on fake news. We hope this
survey can facilitate collaborative efforts among experts in computer and
information sciences, social sciences, political science, and journalism to
research fake news, where such efforts can lead to fake news detection that is
not only efficient but, more importantly, explainable.
Comment: ACM Computing Surveys (CSUR), 37 pages
A framework for fake review detection in online consumer electronics retailers
The impact of online reviews on businesses has grown significantly in recent
years, becoming crucial to business success in a wide array of sectors,
ranging from restaurants and hotels to e-commerce. Unfortunately, some
users use unethical means to improve their online reputation by writing fake
reviews of their businesses or competitors. Previous research has addressed
fake review detection in a number of domains, such as product or business
reviews in restaurants and hotels. However, in spite of its economic
interest, the domain of consumer electronics businesses has not yet been
thoroughly studied. This article proposes a feature framework for detecting
fake reviews that has been evaluated in the consumer electronics domain. The
contributions are fourfold: (i) Construction of a dataset for classifying fake
reviews in the consumer electronics domain in four different cities based on
scraping techniques; (ii) definition of a feature framework for fake review
detection; (iii) development of a fake review classification method based on
the proposed framework and (iv) evaluation and analysis of the results for each
of the cities under study. We reached an 82% F-score on the classification
task, and the AdaBoost classifier proved to be the best performer according to
the Friedman statistical test.
Comment: Information Processing & Management, 11 pages
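The F-score reported above is the harmonic mean of precision and recall for the fake class. As a minimal illustration with toy labels (hypothetical data, not the paper's dataset), the metric can be computed as:

```python
def f_score(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall for the positive (fake) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical labels for eight reviews: 1 = fake, 0 = genuine
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f_score(y_true, y_pred)
```

F-score is preferred over raw accuracy here because fake reviews are usually the minority class, and a classifier that labels everything genuine would still score high on accuracy.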
Combating Fake News: A Survey on Identification and Mitigation Techniques
The proliferation of fake news on social media has opened up new directions
of research for timely identification and containment of fake news, and
mitigation of its widespread impact on public opinion. While much of the
earlier research was focused on identification of fake news based on its
contents or by exploiting users' engagements with the news on social media,
there has been a rising interest in proactive intervention strategies to
counter the spread of misinformation and its impact on society. In this survey,
we describe the modern-day problem of fake news and, in particular, highlight
the technical challenges associated with it. We discuss existing methods and
techniques applicable to both identification and mitigation, with a focus on
the significant advances in each method and their advantages and limitations.
In addition, research has often been limited by the quality of existing
datasets and their specific application contexts. To alleviate this problem, we
comprehensively compile and summarize characteristic features of available
datasets. Furthermore, we outline new directions of research to facilitate
future development of effective and interdisciplinary solutions.
On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection
Humans are the final decision makers in critical tasks that involve ethical
and legal concerns, ranging from recidivism prediction, to medical diagnosis,
to fighting against fake news. Although machine learning models can sometimes
achieve impressive performance in these tasks, these tasks are not amenable to
full automation. To realize the potential of machine learning for improving
human decisions, it is important to understand how assistance from machine
learning models affects human performance and human agency.
In this paper, we use deception detection as a testbed and investigate how we
can harness explanations and predictions of machine learning models to improve
human performance while retaining human agency. We propose a spectrum between
full human agency and full automation, and develop varying levels of machine
assistance along the spectrum that gradually increase the influence of machine
predictions. We find that without showing predicted labels, explanations alone
slightly improve human performance in the end task. In comparison, human
performance is greatly improved by showing predicted labels (>20% relative
improvement) and can be further improved by explicitly suggesting strong
machine performance. Interestingly, when predicted labels are shown,
explanations of machine predictions induce a similar level of accuracy as an
explicit statement of strong machine performance. Our results demonstrate a
tradeoff between human performance and human agency and show that explanations
of machine predictions can moderate this tradeoff.
Comment: 17 pages, 19 figures, in Proceedings of ACM FAT* 2019, dataset & demo
available at https://deception.machineintheloop.co
Voting for Deceptive Opinion Spam Detection
Consumers' purchase decisions are increasingly influenced by user-generated
online reviews. Accordingly, there has been growing concern about the potential
for posting deceptive opinion spam: fictitious reviews deliberately written to
sound authentic in order to deceive readers. Existing
approaches mainly focus on developing automatic supervised learning based
methods to help users identify deceptive opinion spam.
In this work, we use the LSI and Sprinkled LSI techniques to reduce the
dimensionality of the feature space for deception detection. Our contribution
is to demonstrate what LSI captures in the latent semantic space and to reveal
how deceptive opinions can be distinguished automatically from truthful ones.
Finally, we propose a voting scheme that integrates the different approaches
to further improve classification performance.
Comment: arXiv admin note: text overlap with arXiv:1204.2804 by other authors
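The abstract does not spell out the voting scheme; one plausible realization, sketched below with entirely hypothetical detector outputs, is a simple majority vote over the labels produced by the different approaches:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels several detectors assigned to one document
    by taking the most common label (simple majority vote)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three detectors for four reviews
# ("D" = deceptive, "T" = truthful)
per_doc = [
    ["D", "D", "T"],
    ["T", "T", "T"],
    ["D", "T", "D"],
    ["T", "D", "T"],
]
votes = [majority_vote(p) for p in per_doc]
```

An odd number of detectors avoids ties; with an even number, a tie-breaking rule (e.g., preferring the higher-precision detector) would be needed.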
Towards Accurate Deceptive Opinion Spam Detection based on Word Order-preserving CNN
Deep learning is now widely used; in natural language processing, its
flexibility has enabled the analysis of complex semantics. Deceptive opinion
detection is an important application area for deep learning models, and the
related mechanisms have attracted attention and research. Online opinions are
short and varied in both type and content. In
order to effectively identify deceptive opinions, we need to comprehensively
study the characteristics of deceptive opinions, and explore novel
characteristics besides the textual semantics and emotional polarity that have
been widely used in text analysis. Detection mechanisms based on deep learning
adapt well and can effectively identify many kinds of deceptive opinions. In
this paper, we optimize the convolutional neural network model by embedding
word-order characteristics in its convolution and pooling layers, which makes
the network more suitable for text classification and deceptive opinion
detection. TensorFlow-based experiments demonstrate that the proposed
detection mechanism achieves more accurate deceptive opinion detection
results.
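The abstract does not give the paper's exact architecture, but the general idea of a word order-preserving convolution can be sketched with toy numbers (hypothetical embeddings and kernel): concatenating each window of word vectors in sequence order before the dot product retains ordering information that a bag-of-words sum would discard.

```python
def conv1d_max(seq, kernel, width):
    """One convolutional filter over a word sequence: concatenate each window
    of `width` word vectors in order (preserving word order within the window),
    take the dot product with the kernel, then max-pool over positions."""
    feats = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]
        feats.append(sum(k * x for k, x in zip(kernel, window)))
    return max(feats)

# Toy 2-dimensional word embeddings and a width-2 filter (all hypothetical)
sentence = [[1, 0], [0, 1], [1, 1]]
feature = conv1d_max(sentence, kernel=[1, 2, 3, 4], width=2)
```

Swapping two adjacent words changes which kernel weights they meet, so the filter responds differently to reordered text, unlike an order-insensitive pooling over individual word vectors.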
Towards Understanding and Detecting Fake Reviews in App Stores
App stores include an increasing amount of user feedback in form of app
ratings and reviews. Researchers, and recently also tool vendors, have
proposed analytics and data mining solutions that make this feedback useful to
developers and analysts, e.g., for supporting release decisions. Research has
also shown that
positive feedback improves apps' downloads and sales figures and thus their
success. As a side effect, a market for fake, incentivized app reviews emerged
with yet unclear consequences for developers, app users, and app store
operators. This paper studies fake reviews, their providers, characteristics,
and how well they can be automatically detected. We conducted disguised
questionnaires with 43 fake review providers and studied their review policies
to understand their strategies and offers. By comparing 60,000 fake reviews
with 62 million reviews from the Apple App Store we found significant
differences, e.g., between the corresponding apps, reviewers, rating
distribution, and frequency. This inspired the development of a simple
classifier to automatically detect fake reviews in app stores. On a labelled,
imbalanced dataset in which one-tenth of the reviews are fake, a proportion
reported in other domains, our classifier achieved a recall of 91% and an
AUC/ROC value of 98%. We discuss our findings and their impact on software
engineering, app users, and app store operators.
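The AUC/ROC figure quoted above has a convenient rank-based interpretation: the probability that a randomly chosen fake review receives a higher classifier score than a randomly chosen genuine one. A minimal sketch with toy, imbalanced data (hypothetical scores, not the study's):

```python
def roc_auc(y_true, scores):
    """Rank-based ROC AUC: probability that a random positive (fake review)
    outscores a random negative (genuine review); ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy imbalanced set: 2 fake reviews among 10 (hypothetical scores)
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05, 0.5, 0.6, 0.7]
auc = roc_auc(y_true, scores)
```

Because AUC is computed over positive-negative pairs, it is insensitive to the class ratio itself, which is why it is a common companion to recall on imbalanced datasets like this one.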
CSI: A Hybrid Deep Model for Fake News Detection
The topic of fake news has drawn attention both from the public and the
academic communities. Such misinformation has the potential of affecting public
opinion, providing an opportunity for malicious parties to manipulate the
outcomes of public events such as elections. Because such high stakes are at
play, automatically detecting fake news is an important, yet challenging
problem that is not yet well understood. Nevertheless, there are three
generally agreed upon characteristics of fake news: the text of an article, the
user response it receives, and the source users promoting it. Existing work has
largely focused on tailoring solutions to one particular characteristic which
has limited their success and generality. In this work, we propose a model that
combines all three characteristics for a more accurate and automated
prediction. Specifically, we incorporate the behavior of both parties, users
and articles, and the group behavior of users who propagate fake news.
Motivated by the three characteristics, we propose a model called CSI which is
composed of three modules: Capture, Score, and Integrate. The first module is
based on the response and text; it uses a Recurrent Neural Network to capture
the temporal pattern of user activity on a given article. The second module
learns the source characteristic based on the behavior of users, and the two
are integrated with the third module to classify an article as fake or not.
Experimental analysis on real-world data demonstrates that CSI achieves higher
accuracy than existing models, and extracts meaningful latent representations
of both users and articles.
Comment: In Proceedings of the 26th ACM International Conference on
Information and Knowledge Management (CIKM) 201
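CSI's Capture module uses a recurrent network to fold the temporal pattern of user activity on an article into a fixed summary. A minimal single-unit recurrent cell with hypothetical (not learned) weights illustrates how different temporal patterns produce different summaries:

```python
import math

def capture(activity, w_x=0.5, w_h=0.8):
    """Minimal single-unit recurrent cell: folds a sequence of per-interval
    engagement counts into one hidden state summarizing the temporal pattern.
    Weights are hypothetical constants, not trained parameters."""
    h = 0.0
    for x in activity:
        h = math.tanh(w_x * x + w_h * h)  # new state mixes input and memory
    return h

# A bursty response pattern and a steady one yield different summaries
bursty = capture([0, 0, 5, 0])
steady = capture([1, 1, 1, 1])
```

In the full model this hidden state would feed the Integrate module alongside the source score; here it only shows that the recurrence is order- and shape-sensitive, which is what lets an RNN distinguish, say, a coordinated burst of shares from organic steady engagement.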
Catching Loosely Synchronized Behavior in Face of Camouflage
Fraud has severely detrimental impacts on the business of social networks and
other online applications. A user can become a fake celebrity by purchasing
"zombie followers" on Twitter. A merchant can boost his reputation through fake
reviews on Amazon. This phenomenon also conspicuously exists on Facebook, Yelp
and TripAdvisor, etc. In all the cases, fraudsters try to manipulate the
platform's ranking mechanism by faking interactions between the fake accounts
they control and the target customers.
Comment: Submitted to WWW 2019, Oct. 201
FairJudge: Trustworthy User Prediction in Rating Platforms
Rating platforms enable large-scale collection of user opinion about items
(products, other users, etc.). However, many untrustworthy users give
fraudulent ratings for excessive monetary gain. In this paper, we present
FairJudge, a system to identify such fraudulent users. We propose three
metrics: (i) the fairness of a user that quantifies how trustworthy the user is
in rating the products, (ii) the reliability of a rating that measures how
reliable the rating is, and (iii) the goodness of a product that measures the
quality of the product. Intuitively, a user is fair if they provide reliable
ratings that are close to the goodness of the product. We formulate a mutually
recursive definition of these metrics, and further address cold start problems
and incorporate behavioral properties of users and products in the formulation.
We propose an iterative algorithm, FairJudge, to predict the values of the
three metrics. We prove that FairJudge is guaranteed to converge in a bounded
number of iterations, with linear time complexity. By conducting five different
experiments on five rating platforms, we show that FairJudge significantly
outperforms nine existing algorithms in predicting fair and unfair users. We
reported the 100 most unfair users in the Flipkart network to their review
fraud investigators, and 80 users were correctly identified (80% accuracy). The
FairJudge algorithm is already being deployed at Flipkart.
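The mutual recursion among fairness, reliability, and goodness can be sketched as a fixed-point iteration. The update rules below are a simplified illustration under the stated intuition (scores assumed scaled to [-1, 1]), not the paper's exact formulation, and the shill-attack example is hypothetical:

```python
def fair_judge(ratings, iters=25):
    """Iteratively refine user fairness, product goodness, and rating
    reliability until they are mutually consistent. Simplified sketch of
    the mutually recursive definition; ratings are (user, product, score)."""
    users = {u for u, _, _ in ratings}
    products = {p for _, p, _ in ratings}
    fairness = {u: 1.0 for u in users}      # how trustworthy each user is
    goodness = {p: 0.0 for p in products}   # quality of each product
    reliability = {(u, p): 1.0 for u, p, _ in ratings}
    for _ in range(iters):
        # goodness: reliability-weighted mean of the scores a product received
        for p in products:
            rs = [(u, s) for u, q, s in ratings if q == p]
            total = sum(reliability[(u, p)] for u, _ in rs)
            goodness[p] = sum(reliability[(u, p)] * s for u, s in rs) / total
        # reliability: high when the rater is fair and agrees with goodness
        for u, p, s in ratings:
            reliability[(u, p)] = 0.5 * (fairness[u] + 1 - abs(s - goodness[p]) / 2)
        # fairness: mean reliability of the user's own ratings
        for u in users:
            rels = [reliability[(u, p)] for v, p, _ in ratings if v == u]
            fairness[u] = sum(rels) / len(rels)
    return fairness, goodness

# Two honest raters agree on a product; a hypothetical shill rates against them
fairness, goodness = fair_judge(
    [("a", "x", 1.0), ("b", "x", 1.0), ("shill", "x", -1.0)])
```

Each quantity is defined in terms of the other two, so the iteration alternates the three updates until they stabilize; the dissenting rater ends up with lower fairness because their rating sits far from the consensus goodness.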