Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that with generalized features
- advertising speak and writing complexity scores - deception detection
performance can be further improved by adding additional deceptive reviews from
assorted domains in training. Finally, reviewer-level evaluation gives an
interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing (RANLP) 201
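As a rough illustration of what such generalized features might look like, here is a minimal Python sketch; the advertising-speak lexicon and the complexity formula below are invented for illustration and are not the paper's actual feature definitions:

```python
import re

# Hypothetical "advertising speak" lexicon; the paper's actual word list is not given.
AD_WORDS = {"amazing", "best", "perfect", "guaranteed", "must-have", "incredible"}

def ad_speak_score(text):
    """Fraction of tokens drawn from the advertising-speak lexicon."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in AD_WORDS for t in tokens) / len(tokens)

def complexity_score(text):
    """Crude writing-complexity proxy: mean word length times mean sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z'-]+", text.lower())
    if not sentences or not tokens:
        return 0.0
    mean_word_len = sum(len(t) for t in tokens) / len(tokens)
    mean_sent_len = len(tokens) / len(sentences)
    return mean_word_len * mean_sent_len

review = "This is the best, most amazing blender ever. Guaranteed perfect results!"
print(ad_speak_score(review) > 0)  # an ad-heavy review scores above zero
```

In a cross-domain setup, features like these would feed a standard classifier trained on reviews pooled from assorted product domains.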
Enforcing public data archiving policies in academic publishing: A study of ecology journals
To improve the quality and efficiency of research, groups within the
scientific community seek to exploit the value of data sharing. Funders,
institutions, and specialist organizations are developing and implementing
strategies to encourage or mandate data sharing within and across disciplines,
with varying degrees of success. Academic journals in ecology and evolution
have adopted several types of public data archiving policies requiring authors
to make data underlying scholarly manuscripts freely available. Yet anecdotes
from the community and studies evaluating data availability suggest that these
policies have not achieved the desired effects, in terms of either the quantity
or the quality of available datasets. We conducted a qualitative, interview-based
study with journal editorial staff and other stakeholders in the academic
publishing process to examine how journals enforce data archiving policies. We
specifically sought to establish who editors and other stakeholders perceive as
responsible for ensuring data completeness and quality in the peer review
process. Our analysis revealed little consensus with regard to how data
archiving policies should be enforced and who should hold authors accountable
for dataset submissions. Themes in interviewee responses included hopefulness
that reviewers would take the initiative to review datasets and trust in
authors to ensure the completeness and quality of their datasets. We highlight
problematic aspects of these thematic responses and offer potential starting
points for improvement of the public data archiving process.
Comment: 35 pages, 1 figure, 1 table
Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitor's products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.
Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.0383
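To make the underlying machinery concrete, here is a plain unsupervised CP decomposition via alternating least squares over a hypothetical reviewer × product × rating tensor; the paper's semi-supervised binary multi-target extension and its natural-gradient inference are omitted from this sketch:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(X, Y):
    """Column-wise Kronecker product of two factor matrices."""
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def cp_als(T, rank, n_iter=50, seed=0):
    """Plain unsupervised CP decomposition by alternating least squares.
    The paper's model adds semi-supervision from known abusive entities
    on top of the decomposition; that part is left out here."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Solve each factor in turn with the other two held fixed.
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Sellers or reviewers whose factor rows load heavily on a component dominated by known abusive entities would then be natural candidates for flagging.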
Diverse Weighted Bipartite b-Matching
Bipartite matching, where agents on one side of a market are matched to
agents or items on the other, is a classical problem in computer science and
economics, with widespread application in healthcare, education, advertising,
and general resource allocation. A practitioner's goal is typically to maximize
a matching market's economic efficiency, possibly subject to some fairness
requirements that promote equal access to resources. A natural balancing act
exists between fairness and efficiency in matching markets, and has been the
subject of much research.
In this paper, we study a complementary goal---balancing diversity and
efficiency---in a generalization of bipartite matching where agents on one side
of the market can be matched to sets of agents on the other. Adapting a
classical definition of the diversity of a set, we propose a quadratic
programming-based approach to solving a supermodular minimization problem that
balances diversity and total weight of the solution. We also provide a scalable
greedy algorithm with theoretical performance bounds. We then define the price
of diversity, a measure of the efficiency loss due to enforcing diversity, and
give a worst-case theoretical bound. Finally, we demonstrate the efficacy of
our methods on three real-world datasets, and show that the price of diversity
is modest in practice.
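A toy greedy sketch of the diversity/weight trade-off for a single agent follows; it scores diversity simply as the number of distinct item types covered, a stand-in for the classical diversity measure the paper adapts, and the function name and inputs are invented for illustration:

```python
def greedy_diverse_match(candidates, b, lam):
    """Greedily pick up to b items for one agent, trading off edge weight
    against diversity, scored here as the number of distinct types covered.
    candidates: list of (item, weight, type) triples; lam: diversity weight.
    A toy sketch; the paper solves the full market with a quadratic program
    plus a greedy algorithm with theoretical bounds."""
    chosen, types = [], set()
    pool = list(candidates)
    while pool and len(chosen) < b:
        # Marginal gain of each remaining item: its weight, plus a diversity
        # bonus if it introduces a type not yet covered.
        def gain(c):
            _, w, t = c
            return w + (lam if t not in types else 0.0)
        best = max(pool, key=gain)
        if gain(best) <= 0:
            break
        pool.remove(best)
        chosen.append(best[0])
        types.add(best[2])
    return chosen
```

The price of diversity can then be estimated empirically as the ratio between the total weight of the maximum-weight solution and that of the diversity-augmented one.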
How Can We Change Our Habits If We Don’t Talk About Them?
For the late nineteenth-century pragmatists, habits were of great interest. Habits, and the habit of changing habits, they believed, reflected if not defined human rationality, leading William James to describe habit as "the enormous fly-wheel of society." What the pragmatists did not adequately address (at least for us) is the role of power relations in the process of changing habits. In this article we discuss our experience of attempting to engage critique and reflection on habitual practices in music teacher education, offering the reader an article within an article. That is, we reflect on our failure to publish a critical article in a widely read practitioner journal by sharing the original manuscript and its reviews, with the hope that our experience might shed additional light on social reproduction and efforts aimed at change.
Quantifying the quality of peer reviewers through Zipf's law
This paper presents a statistical analysis of peer reviewers, approaching
their "quality" through quantitative measures and thereby deriving quality
metrics. Peer reviewer reports for the Journal of the
Serbian Chemical Society are examined. The text of each report has first to be
adapted to word-counting software in order to avoid jargon-induced confusion
in word-frequency counts: e.g., "C" must be disambiguated depending on whether
it means carbon or Celsius. Thus, every report has to be carefully
"rewritten". Thereafter, the quantity, variety and distribution of words are
examined in each report and compared to the whole set. Two separate months,
distinguished according to when reports came in, are compared to check for any
hidden spurious effects; the two sets are found to be coherent. An empirical distribution is
searched for through a Zipf-Pareto rank-size law. It is observed that peer
review reports are very far from usual texts in this respect. Deviations from
the usual (first) Zipf's law are discussed. A theoretical suggestion for the
"best (or worst) report" and by extension "good (or bad) reviewer", within this
context, is provided from an entropy argument, through the concept of "distance
to average" behavior. Another entropy-based measure also allows one to assess
the journal's reviews (and hence reviewers) for further comparison with other
journals through their own reviewer reports.
Comment: 28 pages; 8 Tables; 9 Figures; 39 references; prepared for and to be published in Scientometrics
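A minimal sketch of the rank-size fit described above: estimate a report's Zipf exponent by regressing log frequency on log rank, so that texts whose exponent strays far from 1 deviate from the (first) Zipf law. The function and its tokenization are illustrative, not the paper's procedure:

```python
import math
from collections import Counter

def zipf_exponent(text):
    """Least-squares slope of log(frequency) versus log(rank) over the
    word-frequency distribution of a text; s close to 1 matches the
    classical (first) Zipf law, larger deviations flag atypical texts."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf exponent s, from f(r) ~ C / r**s
```

Ranking reports by such a fitted exponent, or by an entropy-style "distance to average," would give the kind of per-report comparison against the whole set that the abstract describes.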