Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that with generalized features
- advertising speak and writing complexity scores - deception detection
performance can be further improved by adding additional deceptive reviews from
assorted domains in training. Finally, reviewer-level evaluation gives an
interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing (RANLP) 201
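As a rough illustration of what such generalized features might look like, here is a minimal Python sketch; the advertising-speak lexicon and the complexity formula below are invented for illustration and are not the paper's actual feature definitions:

```python
import re

# Hypothetical "advertising speak" lexicon; the paper's actual word list is not given.
AD_WORDS = {"amazing", "best", "perfect", "guaranteed", "must-have", "incredible"}

def ad_speak_score(text):
    """Fraction of tokens drawn from the advertising-speak lexicon."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in AD_WORDS for t in tokens) / len(tokens)

def complexity_score(text):
    """Crude writing-complexity proxy: mean word length times mean sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z'-]+", text.lower())
    if not sentences or not tokens:
        return 0.0
    mean_word_len = sum(len(t) for t in tokens) / len(tokens)
    mean_sent_len = len(tokens) / len(sentences)
    return mean_word_len * mean_sent_len

review = "This is the best, most amazing blender ever. Guaranteed perfect results!"
print(ad_speak_score(review) > 0)  # an ad-heavy review scores above zero
```

In a cross-domain setup, features like these would feed a standard classifier trained on reviews pooled from assorted product domains.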
Enforcing public data archiving policies in academic publishing: A study of ecology journals
To improve the quality and efficiency of research, groups within the
scientific community seek to exploit the value of data sharing. Funders,
institutions, and specialist organizations are developing and implementing
strategies to encourage or mandate data sharing within and across disciplines,
with varying degrees of success. Academic journals in ecology and evolution
have adopted several types of public data archiving policies requiring authors
to make data underlying scholarly manuscripts freely available. Yet anecdotes
from the community and studies evaluating data availability suggest that these
policies have not achieved the desired effects, in terms of either the quantity
or the quality of available datasets. We conducted a qualitative, interview-based
study with journal editorial staff and other stakeholders in the academic
publishing process to examine how journals enforce data archiving policies. We
specifically sought to establish who editors and other stakeholders perceive as
responsible for ensuring data completeness and quality in the peer review
process. Our analysis revealed little consensus with regard to how data
archiving policies should be enforced and who should hold authors accountable
for dataset submissions. Themes in interviewee responses included hopefulness
that reviewers would take the initiative to review datasets and trust in
authors to ensure the completeness and quality of their datasets. We highlight
problematic aspects of these thematic responses and offer potential starting
points for improvement of the public data archiving process.
Comment: 35 pages, 1 figure, 1 table
Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitor's products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.
Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.0383
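To make the underlying machinery concrete, here is a plain unsupervised CP decomposition via alternating least squares over a hypothetical reviewer × product × rating tensor; the paper's semi-supervised binary multi-target extension and its natural-gradient inference are omitted from this sketch:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(X, Y):
    """Column-wise Kronecker product of two factor matrices."""
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def cp_als(T, rank, n_iter=50, seed=0):
    """Plain unsupervised CP decomposition by alternating least squares.
    The paper's model adds semi-supervision from known abusive entities
    on top of the decomposition; that part is left out here."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Solve each factor in turn with the other two held fixed.
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Sellers or reviewers whose factor rows load heavily on a component dominated by known abusive entities would then be natural candidates for flagging.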
Diverse Weighted Bipartite b-Matching
Bipartite matching, where agents on one side of a market are matched to
agents or items on the other, is a classical problem in computer science and
economics, with widespread application in healthcare, education, advertising,
and general resource allocation. A practitioner's goal is typically to maximize
a matching market's economic efficiency, possibly subject to some fairness
requirements that promote equal access to resources. A natural balancing act
exists between fairness and efficiency in matching markets, and has been the
subject of much research.
In this paper, we study a complementary goal---balancing diversity and
efficiency---in a generalization of bipartite matching where agents on one side
of the market can be matched to sets of agents on the other. Adapting a
classical definition of the diversity of a set, we propose a quadratic
programming-based approach to solving a supermodular minimization problem that
balances diversity and total weight of the solution. We also provide a scalable
greedy algorithm with theoretical performance bounds. We then define the price
of diversity, a measure of the efficiency loss due to enforcing diversity, and
give a worst-case theoretical bound. Finally, we demonstrate the efficacy of
our methods on three real-world datasets, and show that the price of diversity
is modest in practice.
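A toy greedy sketch of the diversity/weight trade-off for a single agent follows; it scores diversity simply as the number of distinct item types covered, a stand-in for the classical diversity measure the paper adapts, and the function name and inputs are invented for illustration:

```python
def greedy_diverse_match(candidates, b, lam):
    """Greedily pick up to b items for one agent, trading off edge weight
    against diversity, scored here as the number of distinct types covered.
    candidates: list of (item, weight, type) triples; lam: diversity weight.
    A toy sketch; the paper solves the full market with a quadratic program
    plus a greedy algorithm with theoretical bounds."""
    chosen, types = [], set()
    pool = list(candidates)
    while pool and len(chosen) < b:
        # Marginal gain of each remaining item: its weight, plus a diversity
        # bonus if it introduces a type not yet covered.
        def gain(c):
            _, w, t = c
            return w + (lam if t not in types else 0.0)
        best = max(pool, key=gain)
        if gain(best) <= 0:
            break
        pool.remove(best)
        chosen.append(best[0])
        types.add(best[2])
    return chosen
```

The price of diversity can then be estimated empirically as the ratio between the total weight of the maximum-weight solution and that of the diversity-augmented one.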
How Can We Change Our Habits If We Don’t Talk About Them?
For the late nineteenth-century pragmatists, habits were of great interest. Habits, and the habit of changing habits, they believed, reflected if not defined human rationality, leading William James to describe habit as "the enormous fly-wheel of society." What the pragmatists did not adequately address (at least for us) is the role of power relations in the process of changing habits. In this article we discuss our experience of attempting to engage critique and reflection on habitual practices in music teacher education, offering the reader an article within an article. That is, we reflect on our failure to publish a critical article in a widely read practitioner journal by sharing the original manuscript and its reviews, with the hope that our experience might shed additional light on social reproduction and efforts aimed at change.
Quantifying the quality of peer reviewers through Zipf's law
This paper presents a statistical analysis of peer reviewers, approaching
their "quality" through quantitative measures and thereby deriving quality
metrics. Peer reviewer reports for the Journal of the
Serbian Chemical Society are examined. The text of each report has first to be
adapted to word-counting software in order to avoid jargon-induced confusion
in word-frequency counts: e.g., "C" must be disambiguated depending on whether
it means carbon or Celsius. Thus, every report has to be carefully
"rewritten". Thereafter, the quantity, variety and distribution of words are
examined in each report and compared to the whole set. Two separate months,
distinguished according to when reports came in, are compared to check for any
hidden spurious effects; the two sets are found to be coherent. An empirical distribution is
searched for through a Zipf-Pareto rank-size law. It is observed that peer
review reports are very far from usual texts in this respect. Deviations from
the usual (first) Zipf's law are discussed. A theoretical suggestion for the
"best (or worst) report" and by extension "good (or bad) reviewer", within this
context, is provided from an entropy argument, through the concept of "distance
to average" behavior. Another entropy-based measure also allows one to assess
the journal's reviews (and hence reviewers) for further comparison with other
journals through their own reviewer reports.
Comment: 28 pages; 8 Tables; 9 Figures; 39 references; prepared for and to be published in Scientometrics
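A minimal sketch of the rank-size fit described above: estimate a report's Zipf exponent by regressing log frequency on log rank, so that texts whose exponent strays far from 1 deviate from the (first) Zipf law. The function and its tokenization are illustrative, not the paper's procedure:

```python
import math
from collections import Counter

def zipf_exponent(text):
    """Least-squares slope of log(frequency) versus log(rank) over the
    word-frequency distribution of a text; s close to 1 matches the
    classical (first) Zipf law, larger deviations flag atypical texts."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf exponent s, from f(r) ~ C / r**s
```

Ranking reports by such a fitted exponent, or by an entropy-style "distance to average," would give the kind of per-report comparison against the whole set that the abstract describes.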