Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that with generalized features
- advertising speak and writing complexity scores - deception detection
performance can be further improved by adding additional deceptive reviews from
assorted domains in training. Finally, reviewer level evaluation gives an
interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages, accepted to Recent Advances in Natural Language Processing
(RANLP) 201
Man vs machine – Detecting deception in online reviews
This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models, information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material
Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation
Electronic version of an article published as International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 25, 2, 2017, 151-174. DOI: 10.1142/S0218488517400165 © World Scientific Publishing Company. https://www.worldscientific.com/worldscinet/ijufks
Online opinions play an important role for customers and companies, who increasingly rely on them to make purchase and business decisions. One consequence is a growing tendency to post fake reviews in order to change purchase decisions and opinions about products and services. It is therefore important to filter out deceptive comments from the retrieved opinions. In this paper we propose character n-grams in tokens, an efficient and effective variant of the traditional character n-grams model, which we use to obtain a low-dimensionality representation of opinions. A Support Vector Machines classifier was used to evaluate our proposal on available corpora with reviews of hotels, doctors and restaurants. To study the performance of our model, we run experiments on intra-domain and cross-domain cases. The aim of the latter is to evaluate our approach in a realistic cross-domain scenario where deceptive opinions are available in one domain but not in another. After comparing our method with state-of-the-art ones, we conclude that character n-grams in tokens yield competitive results with a low-dimensionality representation.
This publication was made possible by NPRP grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Cagnina, L.; Rosso, P. (2017). Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 25(2):151-174. https://doi.org/10.1142/S0218488517400165
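As a rough illustration of the "character n-grams in tokens" idea named in the abstract above, the sketch below counts character n-grams that stay inside individual tokens, so no n-gram spans a word boundary. This is only one plausible reading and not the authors' implementation: the function name `char_ngrams_in_tokens`, the whitespace tokenization, the lowercasing, and the handling of tokens shorter than `n` are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams_in_tokens(text, n=3):
    """Count character n-grams drawn from inside each token.

    Hypothetical helper: tokens are split on whitespace (an assumption),
    and no n-gram is allowed to cross a token boundary.
    """
    counts = Counter()
    for token in text.lower().split():
        if len(token) < n:
            # Assumption: keep tokens shorter than n as a single feature.
            counts[token] += 1
            continue
        # Slide a window of width n over the characters of this token only.
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


# Usage: the resulting counts could be fed to any bag-of-features classifier.
features = char_ngrams_in_tokens("great hotel, great stay")
```

Because n-grams are restricted to token interiors, the feature space is smaller than with n-grams taken over the raw character stream, which is consistent with the low-dimensionality claim in the abstract.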
Automated Crowdturfing Attacks and Defenses in Online Review Systems
Malicious crowdsourcing forums are gaining traction as sources of spreading
misinformation online, but are limited by the costs of hiring and managing
human workers. In this paper, we identify a new class of attacks that leverage
deep learning language models (Recurrent Neural Networks or RNNs) to automate
the generation of fake online reviews for products and services. Not only are
these attacks cheap and therefore more scalable, but they can control rate of
content output to eliminate the signature burstiness that makes crowdsourced
campaigns easy to detect.
Using Yelp reviews as an example platform, we show how a two phased review
generation and customization attack can produce reviews that are
indistinguishable by state-of-the-art statistical detectors. We conduct a
survey-based user study to show these reviews not only evade human detection,
but also score high on "usefulness" metrics by users. Finally, we develop novel
automated defenses against these attacks, by leveraging the lossy
transformation introduced by the RNN training and generation cycle. We consider
countermeasures against our mechanisms, show that they produce unattractive
cost-benefit tradeoffs for attackers, and that they can be further curtailed by
simple constraints imposed by online service providers