
    Is writing style predictive of scientific fraud?

    The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: fraudulent papers, for example, contain less comparison, as well as different types of hedging and different ways of presenting logical reasoning.
    Comment: To appear in the Proceedings of the Workshop on Stylistic Variation 2017 (EMNLP), 6 pages
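
    The evaluation point is easy to reproduce in miniature: when several documents share an author, leave-one-out lets near-duplicates of a test document remain in the training set, while author-grouped folds do not. The sketch below illustrates that comparison; the toy corpus, the character n-gram model, and the author grouping are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: leave-one-out vs. author-grouped cross-validation.
# Everything here (corpus, labels, authors, model) is invented for
# illustration; it is not the paper's data or classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

docs = [
    "we hedge our claims and report mixed results",
    "we hedge carefully and compare against strong baselines",
    "our groundbreaking method definitively proves the effect",
    "our method definitively establishes a striking novel effect",
    "results are compared against prior work in detail",
    "the striking effect is proven beyond any reasonable doubt",
]
labels = np.array([0, 0, 1, 1, 0, 1])    # 1 = fraudulent/retracted
authors = np.array([0, 0, 1, 1, 2, 3])   # author 1 wrote two fraudulent texts

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)

# Leave-one-out: texts by the same author can sit in both train and test.
loo_acc = cross_val_score(model, docs, labels, cv=LeaveOneOut()).mean()
# Grouped folds: all of an author's texts are held out together.
grp_acc = cross_val_score(model, docs, labels, groups=authors,
                          cv=GroupKFold(n_splits=3)).mean()
print(f"leave-one-out accuracy:  {loo_acc:.2f}")
print(f"author-grouped accuracy: {grp_acc:.2f}")
```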

    Detecting Singleton Review Spammers Using Semantic Similarity

    Online reviews have increasingly become a very important resource for consumers when making purchases, though it is becoming more and more difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior work on the opinion spam problem mostly considered classifying fake reviews using behavioral user patterns, focusing on prolific users who write more than a couple of reviews and discarding one-time reviewers. The number of singleton reviewers, however, is expected to be high for many review websites. While behavioral patterns are effective when dealing with elite users, for one-time reviewers the review text needs to be exploited. In this paper we tackle the problem of detecting fake reviews written by the same person under multiple names, with each review posted under a different name. We propose two methods to detect similar reviews and show that the results generally outperform the vectorial similarity measures used in prior work. The first method extends the semantic similarity between words to the review level. The second method is based on topic modeling and exploits the similarity of the reviews' topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset (800 reviews).
    Comment: 6 pages, WWW 2015
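
    As a rough illustration of the second, topic-based method, each review can be represented as an LDA topic mixture, with pairs flagged when their mixtures are nearly identical. The sketch below uses scikit-learn's LDA and the Jensen-Shannon distance; the model choice, the 0.2 threshold, and the toy reviews are assumptions, and the bag-of-opinion-phrases variant from the paper is not reproduced here.

```python
# Hedged sketch: flag near-duplicate reviews via topic-distribution
# similarity. LDA + Jensen-Shannon distance are illustrative stand-ins.
from itertools import combinations
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "great room, friendly staff, would stay again",
    "the room was great and the staff were friendly, will stay again",
    "terrible food, cold soup, and the waiter ignored us all night",
    "lovely spa, quick check-in, very clean pool area",
]

counts = CountVectorizer().fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)          # per-review topic distributions

for i, j in combinations(range(len(reviews)), 2):
    d = jensenshannon(theta[i], theta[j])  # 0 means identical mixtures
    if d < 0.2:                            # illustrative duplicate threshold
        print(f"reviews {i} and {j} may share an author (JS = {d:.3f})")
```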

    SRC Model to Identify Beguiling Reviews

    Today, e-commerce sites give a huge number of customers a platform on which they can express their views, state their opinions, and post reviews about products online. Such user-contributed content is available to other customers and to manufacturers as a significant source of information, and it is helpful in making important business decisions. Although this information influences the buying decisions of customers, quality control over this user-generated data is not guaranteed, as the review section is an open platform available to all: anyone can write anything on the web, which may include reviews that are not genuine. As the popularity of e-commerce sites grows enormously, the quality of reviews is deteriorating day by day, thereby influencing customers' buying decisions. This has become a serious social problem. For many years, email spam and web spam were the two main highlighted social issues, but nowadays, because of the popularity of online shopping and customers' reliance on online reviews, review spammers have made it a major target, misleading customers by writing sham reviews for target products. To the best of our knowledge, comparatively little study has been reported regarding this issue of the reliability of online reviews; the first paper on review spam detection was published in 2007 by Nitin Jindal and Bing Liu. In the past few years, a variety of techniques have been proposed by researchers to deal with this problem. This paper introduces the Suspicious Review Classifier (SRC) model for identifying suspicious reviews, review spammers, and their groups.

    Unsupervised user behavior representation for fraud review detection with cold-start problem

    Detecting fraudulent reviews is becoming extremely important for providing reliable information in cyberspace. Handling the cold-start problem, however, is a critical and urgent challenge, since a cold-start fraudulent review rarely provides sufficient information for assessing its authenticity. Existing work on detecting cold-start cases relies on the limited contents of the review posted by the user and on a traditional classifier to make the decision. However, simply modeling the review is not reliable, since reviews can be easily manipulated, and it is hard to obtain high-quality labeled data for training the classifier. In this paper, we tackle the cold-start problem by (1) using a user's behavior representation, rather than review contents, to measure authenticity, (2) further considering the user's social relations with other existing users when posting reviews, and (3) doing so in a completely unsupervised manner. Comprehensive experiments on Yelp datasets demonstrate that our method significantly outperforms the state-of-the-art methods.
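
    A toy stand-in for the behavior-based, unsupervised idea: summarize each user with a few behavioral features and rank anomalies with an isolation forest. The features and the model below are illustrative assumptions only; the paper's representation additionally encodes the user's social relations, which this sketch omits.

```python
# Hedged sketch: unsupervised ranking of suspicious users from simple
# behavioral features. Features and model are invented for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

# rows: [review count, mean rating deviation from venue average,
#        fraction of reviews posted within a single one-day burst]
users = np.array([
    [12, 0.3, 0.10],
    [40, 0.2, 0.05],
    [1,  2.8, 1.00],   # cold-start user: one extreme, bursty review
    [8,  0.5, 0.20],
    [2,  2.5, 1.00],
])

scores = IsolationForest(random_state=0).fit(users).score_samples(users)
for u, s in sorted(enumerate(scores), key=lambda t: t[1]):
    print(f"user {u}: anomaly score {s:.3f} (lower = more suspicious)")
```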

    Review Spam Detection Using Machine Learning Techniques

    Nowadays, with the increasing popularity of the internet, online marketing is becoming more and more popular, because a lot of products and services are easily available online. Hence, reviews of all these products and services are very important for customers as well as organizations. Unfortunately, driven by the desire for profit or promotion, fraudsters produce fake reviews, which prevent customers and organizations from reaching accurate conclusions about the products. Hence, fake reviews, or review spam, must be detected and eliminated to keep potential customers from being deceived. In our work, supervised and semi-supervised learning techniques have been applied to detect review spam, using the most suitable datasets in the research area of review spam detection. For supervised learning, we obtain feature sets from different automated approaches, such as LIWC, POS tagging, and n-grams, that can best distinguish spam from non-spam reviews; sentiment analysis, data mining, and opinion mining techniques have also been applied alongside these features. For semi-supervised learning, the PU-learning algorithm is used with six different classifiers (decision tree, naive Bayes, support vector machine, k-nearest neighbor, random forest, logistic regression) to detect review spam in the available dataset. Finally, the proposed technique is compared with some existing review spam detection techniques.
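
    For readers unfamiliar with PU-learning, a common two-step variant works as follows: treat the unlabeled set as negative, keep only the unlabeled examples the first model is confident are negative, then retrain on positives versus those "reliable negatives". The sketch below is a generic illustration with synthetic features, a naive Bayes base classifier, and an arbitrary 50% quantile cutoff; it is not necessarily the exact PU-learning variant or data used in the paper.

```python
# Hedged sketch of two-step PU-learning (positive + unlabeled data).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_pos = rng.normal(2.0, 1.0, size=(30, 5))    # known spam reviews (P)
X_unl = rng.normal(0.0, 1.5, size=(100, 5))   # unlabeled reviews (U)

# Step 1: train on P vs. all of U treated as negative, then keep the
# unlabeled examples the model is most confident are truly negative.
X = np.vstack([X_pos, X_unl])
y = np.array([1] * len(X_pos) + [0] * len(X_unl))
step1 = GaussianNB().fit(X, y)
p_spam = step1.predict_proba(X_unl)[:, 1]
reliable_neg = X_unl[p_spam < np.quantile(p_spam, 0.5)]

# Step 2: retrain on P vs. the reliable negatives only.
X2 = np.vstack([X_pos, reliable_neg])
y2 = np.array([1] * len(X_pos) + [0] * len(reliable_neg))
final = GaussianNB().fit(X2, y2)
print("flagged as spam:", int(final.predict(X_unl).sum()), "of", len(X_unl))
```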

    Detecting collusive spamming activities in community question answering

    Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) have proven hard to control. In a troubling direction, the widespread growth of crowdsourcing websites has created a large-scale, potentially difficult-to-detect workforce that produces malicious content in CQA. Crowd workers who join the same crowdsourcing task about a promotion campaign in CQA collusively craft deceptive Q&As to promote a target (product or service), and the collusive spamming group can fully control the sentiment around the target. How can the structure and the attributes be utilized to detect manipulated Q&As? How can the collusive group be detected, and how can the group information be leveraged for the detection task? To shed light on these research questions, we propose a unified framework to tackle the challenge of detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model that detects deceptive Q&As simultaneously by combining two independent factor graphs. On a large-scale practical dataset, we find that the proposed framework can detect deceptive contents at an early stage and outperforms a number of competitive baselines.
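
    The second step, grouping contents posted within a short duration, can be sketched as linking answers that are both textually similar and close in time; candidate collusive groups then emerge from the connected pairs. The TF-IDF cosine similarity, the 0.4 threshold, and the 24-hour window below are illustrative assumptions rather than the paper's tuned settings.

```python
# Hedged sketch: candidate collusive answer pairs = high text similarity
# + posted within a short time window. Thresholds are illustrative.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

answers = [  # (text, posting time in hours)
    ("BrandX cured my skin in days, highly recommend!", 0),
    ("BrandX fixed my skin within days, highly recommended!", 5),
    ("Try a gentle cleanser and see a dermatologist.", 2),
    ("BrandX works in days for skin issues, recommend it!", 30),
]
texts = [t for t, _ in answers]
hours = [h for _, h in answers]

sim = cosine_similarity(TfidfVectorizer().fit_transform(texts))
pairs = [
    (i, j)
    for i, j in combinations(range(len(answers)), 2)
    if sim[i, j] > 0.4 and abs(hours[i] - hours[j]) <= 24
]
print("candidate collusive pairs:", pairs)  # connected pairs form groups
```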

    Turning the Tide: Curbing Deceptive Yelp Behaviors

    The popularity and influence of reviews make sites like Yelp ideal targets for malicious behavior. We present Marco, a novel system that exploits the unique combination of social, spatial, and temporal signals gleaned from Yelp to detect venues whose ratings are impacted by fraudulent reviews. Marco increases the cost and complexity of attacks by imposing a tradeoff on fraudsters between their ability to impact venue ratings and their ability to remain undetected. We contribute a new dataset to the community, consisting of both ground-truth and gold-standard data. We show that Marco significantly outperforms state-of-the-art approaches, achieving 94% accuracy in classifying reviews as fraudulent or genuine, and 95.8% accuracy in classifying venues as deceptive or legitimate. Marco successfully flagged 244 deceptive venues in our large dataset of 7,435 venues, 270,121 reviews, and 195,417 users. Among the San Francisco car repair and moving companies that we analyzed, almost 10% exhibit fraudulent behavior.
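
    One temporal signal in the spirit of Marco can be approximated by checking whether a venue's reviews arrive in unusually dense bursts, since a fraudster who wants to move a rating quickly must post many reviews in a short span; this is exactly the tradeoff between impact and stealth described above. The sliding window and threshold below are illustrative assumptions, not Marco's actual detector, which combines social, spatial, and temporal evidence.

```python
# Hedged sketch: burst detection as a stand-in for a temporal signal.
# Window length and threshold are illustrative, not Marco's settings.
import numpy as np

def max_reviews_in_window(days: np.ndarray, window: float = 7.0) -> int:
    """Largest number of reviews falling inside any `window`-day span."""
    days = np.sort(days)
    # for each review, count how many reviews land within the next window
    return max(
        int(np.searchsorted(days, d + window, side="right") - i)
        for i, d in enumerate(days)
    )

organic = np.array([1, 15, 40, 90, 150, 200, 260], dtype=float)
bursty = np.array([1, 100, 101, 101, 102, 103, 250], dtype=float)

for name, days in [("organic", organic), ("bursty", bursty)]:
    peak = max_reviews_in_window(days)
    flag = " -> suspicious" if peak >= 4 else ""
    print(f"{name}: peak {peak} reviews in a 7-day window{flag}")
```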