5 research outputs found

    Identification of Opinion Spammers using Reviewer Reputation and Clustering Analysis

    Online reviews have become an important resource for consumers making purchasing decisions. Unfortunately, malicious sellers try to game the system by hiring a person or team (called spammers) to fabricate fake reviews that improve their reputation. Existing methods mainly treat the problem as general binary classification or rely on heuristic rules. However, supervised learning methods rely heavily on a large number of examples of deceptive and truthful opinions labeled by domain experts, and most features used in the heuristic strategies ignore the group organization among spammers. This paper proposes a method for identifying opinion spammers. First, suspected spammers are detected by unsupervised learning based on reviewer reputation. We believe that a reviewer’s reputation is directly related to the quality of their reviews: in general, a review written by a user with lower reputation is of lower quality and more likely to be fake. The model therefore assigns a reputation score to each reviewer, using content-based factors and reviewer activeness. Then, over all suspected spammers, a k-center clustering algorithm is applied to further identify spammers, based on the observation that their reviews are released in bursts. Experimental results on an Amazon dataset are encouraging and indicate that the approach achieves high accuracy and recall.
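
    The clustering stage described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the k-center variant, the distance measure, or any thresholds, so everything below is an assumption: it uses the standard greedy farthest-first heuristic on one-dimensional review timestamps, and the names `k_center_1d`, `burst_groups`, and `min_size` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def k_center_1d(points: Sequence[float], k: int) -> List[int]:
    """Greedy farthest-first k-center on 1-D values.

    Returns, for each point, the index of its nearest chosen center.
    This is the classic 2-approximation heuristic, not necessarily
    the variant used in the paper.
    """
    if not points or k <= 0:
        return []
    centers = [points[0]]  # arbitrary first center
    # Distance from every point to its nearest center chosen so far.
    dist = [abs(p - centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: dist[i])
        if dist[far] == 0:  # every point already coincides with a center
            break
        centers.append(points[far])
        dist = [min(d, abs(p - points[far])) for d, p in zip(dist, points)]
    return [
        min(range(len(centers)), key=lambda c: abs(p - centers[c]))
        for p in points
    ]


def burst_groups(
    reviewers: Sequence[str], labels: Sequence[int], min_size: int
) -> List[List[str]]:
    """Keep clusters with at least `min_size` reviewers (hypothetical threshold)."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for reviewer, label in zip(reviewers, labels):
        groups[label].append(reviewer)
    return [members for members in groups.values() if len(members) >= min_size]


# Toy data: review release times in hours for six suspected reviewers.
times = [0, 1, 2, 100, 101, 250]
names = ["a", "b", "c", "d", "e", "f"]
labels = k_center_1d(times, k=3)
bursts = burst_groups(names, labels, min_size=2)
```

    On the toy data, reviewers whose reviews were released within a few hours of each other end up in the same cluster, and clusters below the size threshold are dropped. In practice the input would be the suspected spammers produced by the reputation stage, which is not sketched here.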

    Evaluation of data mining features, features taxonomies and their applications

    Over the last couple of decades, the World Wide Web has brought enormous improvements to people’s lives. E-commerce emerged during this period and has changed the traditional approaches to selling products and services. It uses different techniques to discover market trends and analyze competitors’ activities by exploiting the information in reviews. Potential customers, in turn, use online opinions to make their purchase decisions. Opinion mining and sentiment analysis are among the most critical and fundamental domains of data mining, and are useful for a variety of sub-domains such as opinion summarization, recommendation systems, and opinion spam detection. Opinion mining and all its sub-branches can be performed efficiently only when there is a comprehensive understanding of the most effective features applied in those domains. To achieve the best results, the most appropriate set of features must be chosen for each case study, whether for classification or clustering. To the best of our knowledge, there is no extensive study and taxonomy of the wide range of features and their applications in opinion mining. In this paper, we carry out a comprehensive investigation of the various types of features exploited in the sub-branches of the opinion mining domain. We present the most frequent feature sets, including structural, linguistic, and relation-based features, as a complete reference for further opinion mining research. The results show that using multiple types of features improves the accuracy of opinion mining applications.

    Understanding the impacts and motivations of duplicate reviews on TripAdvisor

    This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the projects UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS and UIDB/04470/2020 - Centro de Investigação, Desenvolvimento e Inovação em Turismo - CiTUR. TripAdvisor is a popular review platform where users post reviews for the same place, including duplicate reviews. This duplication can skew research results and visitors’ perceptions. To address this issue, we analyze TripAdvisor reviews in 3 languages from 20 attractions in 2 UNESCO heritage-listed cities. We identify 3 types of motivations for multiple reviews: hedonic, utilitarian, and publishing issues. Our study recommends that online review platforms implement strategies to mitigate duplication and advises researchers on how to handle duplicate reviews in their research.