Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitors' products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.
Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, 2019. Contains supplementary material. arXiv admin note: text
overlap with arXiv:1804.0383
Chiron: A Robust Recommendation System with Graph Regularizer
Recommendation systems have been widely used by commercial service providers
for giving suggestions to users. Collaborative filtering (CF) systems, one of
the most popular recommendation systems, utilize the history of behaviors of
the aggregate user-base to provide individual recommendations and are effective
when almost all users faithfully express their opinions. However, they are
vulnerable to malicious users biasing their inputs in order to change the
overall ratings of a specific group of items. CF systems largely fall into two
categories - neighborhood-based and (matrix) factorization-based - and the
presence of adversarial input can influence recommendations in both categories,
leading to instabilities in estimation and prediction. Although the robustness
of different collaborative filtering algorithms has been extensively studied,
designing an efficient system that is immune to manipulation remains a
significant challenge. In this work we propose a novel "hybrid" recommendation
system with an adaptive graph-based user/item similarity-regularization -
"Chiron". Chiron ties the performance benefits of dimensionality reduction
(through factorization) with the advantage of neighborhood clustering (through
regularization). We demonstrate, using extensive comparative experiments, that
Chiron is resistant to manipulation by large and lethal attacks.
Identifying leading indicators of product recalls from online reviews using positive unlabeled learning and domain adaptation
Consumer protection agencies are charged with safeguarding the public from
hazardous products, but the thousands of products under their jurisdiction make
it challenging to identify and respond to consumer complaints quickly. From the
consumer's perspective, online reviews can provide evidence of product defects,
but manually sifting through hundreds of reviews is not always feasible. In
this paper, we propose a system to mine Amazon.com reviews to identify products
that may pose safety or health hazards. Since labeled data for this task are
scarce, our approach combines positive unlabeled learning with domain
adaptation to train a classifier from consumer complaints submitted to the U.S.
Consumer Product Safety Commission. On a validation set of manually annotated
Amazon product reviews, we find that our approach results in an absolute F1
score improvement of 8% over the best competing baseline. Furthermore, we apply
the classifier to Amazon reviews of known recalled products; the classifier
identifies reviews reporting safety hazards prior to the recall date for 45% of
the products. This suggests that the system may be able to serve as an early
warning system to alert consumers to hazardous products before an official
recall is announced.
Adversarial Variational Embedding for Robust Semi-supervised Learning
Semi-supervised learning is sought for leveraging the unlabelled data when
labelled data is difficult or expensive to acquire. Deep generative models
(e.g., Variational Autoencoder (VAE)) and semi-supervised Generative Adversarial
Networks (GANs) have recently shown promising performance in semi-supervised
classification owing to their excellent discriminative representation ability.
However, the latent code learned by the traditional VAE is not exclusive
(repeatable) for a specific input sample, which prevents it from achieving
excellent classification performance. In particular, the learned latent
representation depends on a non-exclusive component which is stochastically
sampled from the prior distribution. Moreover, semi-supervised GAN models
generate data from a pre-defined distribution (e.g., Gaussian noise) that is
independent of the input data distribution, which may obstruct convergence and
makes it difficult to control the distribution of the generated data. To
address the aforementioned
issues, we propose a novel Adversarial Variational Embedding (AVAE) framework
for robust and effective semi-supervised learning to leverage both the
advantage of GAN as a high quality generative model and VAE as a posterior
distribution learner. The proposed approach first produces an exclusive latent
code by the model which we call VAE++, and meanwhile, provides a meaningful
prior distribution for the generator of GAN. The proposed approach is evaluated
over four different real-world applications and we show that our method
outperforms the state-of-the-art models, which confirms that the combination of
VAE++ and GAN can provide significant improvements in semi-supervised
classification.
Comment: 9 pages, Accepted by Research Track in KDD 201
False News On Social Media: A Data-Driven Survey
In the past few years, the research community has dedicated growing interest
to the issue of false news circulating on social networks. The widespread
attention on detecting and characterizing false news has been motivated by
the considerable real-world repercussions of this threat. As a matter of
fact, social media platforms exhibit peculiar characteristics, compared with
traditional news outlets, which have been particularly favorable to the
proliferation of deceptive information. They also present unique challenges for
all kinds of potential interventions on the subject. As this issue becomes of
global concern, it is also gaining more attention in academia. The aim of this
survey is to offer a comprehensive study on the recent advances in terms of
detection, characterization and mitigation of false news that propagate on
social media, as well as the challenges and the open questions that await
future research in the field. We use a data-driven approach, focusing on a
classification of the features that are used in each study to characterize
false information and on the datasets used for training classification
methods. At the end of the survey, we highlight emerging approaches that look
most promising for addressing false news.
Detecting Sockpuppets in Deceptive Opinion Spam
This paper explores the problem of sockpuppet detection in deceptive opinion
spam using authorship attribution and verification approaches. Two methods are
explored. The first is a feature subsampling scheme that uses the KL-Divergence
on stylistic language models of an author to find discriminative features. The
second is a transduction scheme, spy induction, that leverages the diversity of
authors in the unlabeled test set by sending a set of spies (positive samples)
from the training set to retrieve hidden samples in the unlabeled test set
using nearest and farthest neighbors. Experiments using ground truth sockpuppet
data show the effectiveness of the proposed schemes.
Comment: 18 pages, Accepted at CICLing 2017, 18th International Conference on
Intelligent Text Processing and Computational Linguistic