
    Network Spam: A Framework for Detecting Spam in Online Public Reviews

    The ability for anyone to leave a comment offers a golden opportunity for spammers to write spam reviews of products and services to promote a variety of interests. Identifying spammers and spam reviews is an active research topic, and although many studies have recently been conducted for this purpose, the methodologies proposed so far barely detect spam reviews, and none has demonstrated the importance of each type of extracted feature. In this study, we propose a new framework, called Network Spam, that uses spam features to model review datasets as heterogeneous information networks and to map the spam detection procedure to a classification problem in those networks. Using the importance of spam features helps the framework perform better in terms of various metrics tested on real-world review datasets from Yelp and Amazon. The results show that Network Spam outperforms existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first type works better than the other categories.
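    The weighted-feature idea can be sketched in a few lines; all feature names, values, and weights below are illustrative assumptions, not the paper's actual features or learned importances:

```python
def spamicity(feature_values, importance):
    """Combine per-feature spam scores (0..1) using per-feature importance weights."""
    total_weight = sum(importance[f] for f in feature_values)
    return sum(feature_values[f] * importance[f] for f in feature_values) / total_weight

# Hypothetical feature values for one review, one from each of the four
# categories mentioned above.
features = {
    "burstiness": 0.9,         # review-behavioral
    "reviewer_activity": 0.6,  # user-behavioral
    "caps_ratio": 0.3,         # review-linguistic
    "avg_word_len_dev": 0.2,   # user-linguistic
}
# Behavioral features get higher weight here, echoing the finding that
# they outperform the linguistic categories.
weights = {"burstiness": 0.4, "reviewer_activity": 0.3,
           "caps_ratio": 0.2, "avg_word_len_dev": 0.1}

score = spamicity(features, weights)
```

    In the full framework these weights would be learned from the network structure rather than fixed by hand.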

    Detecting Spam Game Reviews on Steam with a Semi-Supervised Approach

    The potential value of online reviews has led to more and more spam reviews appearing on the web. These spam reviews are widely distributed, harmful, and difficult to identify manually. In this paper, we explore and implement generalised approaches for identifying deceptive online spam game reviews from Steam. We analyse spam game reviews and present and validate several techniques to detect them. In addition, we aim to identify the unique features of game reviews and to create a labelled game review dataset based on different features. We were able to create a labelled dataset that can be used to identify spam game reviews in future research. Our method resulted in 5,021 of the 33,450 unlabelled Steam reviews being labelled as spam reviews, or approximately 15%. This falls within the expected range of 10-20% and is consistent with figures for Yelp, where an estimated 14-20% of reviews are spam.
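    One common semi-supervised pattern for this kind of labelling task is self-training: fit on a small labelled seed set, then fold high-confidence predictions from the unlabelled pool back into the training data. The centroid-based scorer and thresholds below are simplifications assumed for illustration, not the authors' method:

```python
def self_train(labeled, unlabeled, threshold=0.75, rounds=3):
    """labeled: list of (feature_vector, is_spam); unlabeled: list of feature vectors."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        spam = [x for x, y in labeled if y]
        ham = [x for x, y in labeled if not y]
        spam_c, ham_c = centroid(spam), centroid(ham)
        keep = []
        for x in pool:
            d_spam, d_ham = dist(x, spam_c), dist(x, ham_c)
            conf = d_ham / (d_spam + d_ham)  # nearer the spam centroid -> higher conf
            if conf >= threshold:
                labeled.append((x, True))    # confident spam: promote to training set
            elif conf <= 1 - threshold:
                labeled.append((x, False))   # confident non-spam
            else:
                keep.append(x)               # stays unlabelled this round
        pool = keep
    return labeled, pool

def centroid(vs):
    n = len(vs)
    return tuple(sum(v[i] for v in vs) / n for i in range(len(vs[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Tiny illustrative run: two labelled seeds, three unlabelled reviews.
labeled_out, still_unlabeled = self_train(
    [((0.0, 0.0), False), ((1.0, 1.0), True)],
    [(0.9, 0.9), (0.1, 0.0), (0.5, 0.5)],
)
```

    Points near either seed get labelled; the ambiguous midpoint remains in the unlabelled pool, mirroring how only a fraction of the 33,450 reviews end up labelled as spam.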

    Detecting Fake Reviews: Just a Matter of Data

    Along with the ever-increasing portfolio of products online, the incentive for market participants to write fake reviews to gain a competitive edge has increased as well. This article demonstrates the effectiveness of using different combinations of spam-detection feature sets, beyond the review-based features typically used, to detect fake reviews. Using a spectrum of feature sets identifies fake reviews more accurately than using review-based features alone, and varying the machine learning algorithm used for classification and the number of feature sets further elucidates the differences in performance. Benchmarking results show that a technique prioritizing feature importance benefits from drawing features from multiple feature sets, and that creating feature sets based on review, reviewer, and product data can achieve the greatest accuracy.
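    A minimal sketch of the multi-source idea, assuming hypothetical field names: merge review-, reviewer-, and product-based feature sets into one vector per review, then rank features by a crude importance proxy (the gap between spam and non-spam means):

```python
def build_features(review, reviewer, product):
    """Merge the three feature sets, prefixing each key by its source."""
    merged = {}
    for prefix, feats in (("rev", review), ("usr", reviewer), ("prod", product)):
        for name, value in feats.items():
            merged[f"{prefix}_{name}"] = value
    return merged

def rank_by_importance(rows, labels):
    """Rank features by |mean(spam) - mean(non-spam)|, a simple importance proxy."""
    scored = []
    for name in rows[0]:
        spam_vals = [r[name] for r, y in zip(rows, labels) if y]
        ham_vals = [r[name] for r, y in zip(rows, labels) if not y]
        gap = abs(sum(spam_vals) / len(spam_vals) - sum(ham_vals) / len(ham_vals))
        scored.append((gap, name))
    return [name for _, name in sorted(scored, reverse=True)]

# Illustrative merged vector for one review.
example = build_features(
    review={"length": 42, "rating_deviation": 2.5},
    reviewer={"review_count": 310, "account_age_days": 12},
    product={"avg_rating": 4.1},
)

# Toy dataset: spam reviews are short and come from hyperactive accounts.
rows = [
    {"rev_length": 10, "usr_review_count": 300},
    {"rev_length": 12, "usr_review_count": 290},
    {"rev_length": 80, "usr_review_count": 5},
    {"rev_length": 95, "usr_review_count": 8},
]
labels = [True, True, False, False]
ranking = rank_by_importance(rows, labels)
```

    On this toy data the reviewer-based feature ranks above the review-based one, loosely mirroring the finding that combining feature sets beats review-based features alone.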

    Online Deception Detection Refueled by Real World Data Collection

    The lack of large realistic datasets presents a bottleneck in online deception detection studies. In this paper, we apply a data collection method based on social network analysis to quickly identify high-quality deceptive and truthful online reviews from Amazon. The dataset contains more than 10,000 deceptive reviews and is diverse in product domains and reviewers. Using this dataset, we explore effective general features for online deception detection that perform well across domains. We demonstrate that with generalized features - advertising speak and writing complexity scores - deception detection performance can be further improved by adding deceptive reviews from assorted domains to the training data. Finally, a reviewer-level evaluation gives interesting insight into different deceptive reviewers' writing styles. Comment: 10 pages, accepted to Recent Advances in Natural Language Processing (RANLP) 201
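    The two generalized feature families can be approximated with very simple scores; the promotional vocabulary and the complexity formula below are assumptions of this sketch, not the paper's definitions:

```python
# Hypothetical promotional vocabulary for the "advertising speak" signal.
AD_SPEAK = {"buy", "deal", "discount", "guarantee", "free", "best"}

def ad_speak_rate(text):
    """Fraction of tokens drawn from the promotional vocabulary."""
    tokens = text.lower().split()
    return sum(t.strip(".,!?") in AD_SPEAK for t in tokens) / len(tokens)

def complexity_score(text):
    """Crude writing-complexity proxy: average word length in characters."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    return sum(len(t) for t in tokens) / len(tokens)

promo = "Best deal ever! Buy now, free shipping guarantee."
ad_rate = ad_speak_rate(promo)        # high for ad-heavy text
complexity = complexity_score(promo)
```

    Because neither score depends on domain-specific vocabulary beyond the small ad list, both generalize across product domains, which is the property the paper exploits.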

    Detecting Sockpuppets in Deceptive Opinion Spam

    This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-divergence between stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction, which leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground-truth sockpuppet data show the effectiveness of the proposed schemes. Comment: 18 pages, accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistics
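    The KL-divergence idea can be illustrated with smoothed unigram language models; the smoothing constant and token granularity are assumptions of this sketch:

```python
import math
from collections import Counter

def distribution(tokens, vocab, eps=1e-6):
    """Smoothed unigram distribution over a fixed shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Illustrative author sample versus a background sample.
author = "great great great product would buy again".split()
background = "this product is fine i guess nothing special".split()
vocab = set(author) | set(background)
divergence = kl_divergence(distribution(author, vocab),
                           distribution(background, vocab))
```

    Tokens whose per-term contribution to the divergence is largest are the stylistically discriminative features the subsampling scheme would keep.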