    Survey of review spam detection using machine learning techniques

    Social Media Fake Account Detection for Afan Oromo Language using Machine Learning

    A social networking service serves as a platform for building social networks or social relations among people who share interests, activities, backgrounds, or real-life connections. A social network service is generally offered to participants who register on the site with a unique representation (often a profile) and their social links. Most social network services are web-based and provide means for users to interact over the Internet (M. Smruthi, February 2019). Online social networking sites have become an important part of daily life: millions of users register and share personal information with others. Because of the rapid expansion of social networks, people may exploit them for unprincipled and illegitimate activities. As a result, privacy threats and the disclosure of personal information have become the most important issues for users of social networking sites. Fake profiles are created with malicious intent and have an adverse effect, and such identities and their malicious content are difficult to detect without appropriate research. Existing research on detecting malicious content has primarily considered the characteristics of the user profile, and most existing techniques lack comprehensive evaluation. In this work we propose a new model using machine learning and NLP (Natural Language Processing) techniques to improve the accuracy of detecting fake identities in online social networks. We would like to apply this approach to Facebook by extracting features such as time, date of publication, language, and geo-position (Srinivas Rao Pulluri, A Comprehensive Model for Detecting Fake Profiles in Online Social Networks, 2017). DOI: 10.7176/NMMC/90-01 Publication date: May 31st 2020
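
    The abstract names time, date of publication, language, and geo-position as features. The sketch that follows shows one way such per-post attributes might be turned into per-profile features for a downstream classifier. It is an illustration under assumptions, not the authors' model: the post schema (`timestamp`, `language`, `lat`, `lon`) and the aggregate features (`night_ratio`, `n_languages`, `geo_ratio`) are hypothetical, and the classifier itself is not shown.

```python
from datetime import datetime, timezone

def post_features(post):
    """Turn one post record into a flat feature dict.

    Hypothetical schema: 'timestamp' is an ISO 8601 string WITH a UTC offset
    (a naive timestamp would be read as local time by astimezone), 'language'
    is an ISO 639 code or None, 'lat' / 'lon' are floats or absent.
    """
    ts = datetime.fromisoformat(post["timestamp"]).astimezone(timezone.utc)
    return {
        "hour_utc": ts.hour,                  # time of publication
        "weekday": ts.weekday(),              # date feature: 0 = Monday ... 6 = Sunday
        "language": post.get("language") or "unknown",
        "has_geo": post.get("lat") is not None and post.get("lon") is not None,
    }

def profile_features(posts):
    """Aggregate per-post features into hypothetical per-profile statistics."""
    feats = [post_features(p) for p in posts]
    n = len(feats)
    night = sum(1 for f in feats if f["hour_utc"] < 6)   # posts between 00:00 and 05:59 UTC
    langs = {f["language"] for f in feats}
    geo = sum(1 for f in feats if f["has_geo"])
    return {
        "n_posts": n,
        "night_ratio": night / n if n else 0.0,
        "n_languages": len(langs),
        "geo_ratio": geo / n if n else 0.0,
    }
```

    Aggregating per profile, rather than classifying single posts, matches the abstract's goal of flagging fake identities; which aggregates are actually useful would have to be established on labeled data.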

    Sentiment Classification Using Supervised and Unsupervised Approach

    In the past few years, the amount of data available on the Internet has multiplied at an alarming rate. Tweets, reviews, blogs, and comments on social media have been a major contributor to this growth. Because these datasets are highly unstructured and high-dimensional, sentiment classification becomes a very tiresome task. Sentiment analysis is used to estimate user opinion on various issues; it mines the attitudes and perspectives of users on particular topics. It is a multistep process in which selecting and extracting features is an indispensable step that governs the performance of the sentiment classifier. In this paper we used three supervised techniques, namely SVM, Decision Tree, and Naive Bayes, and three unsupervised techniques, namely DE (Differential Evolution), PSO (Particle Swarm Optimization), and K-Means. The results are validated on three different labeled benchmark datasets and on different feature sets. We also performed feature selection using a genetic algorithm (GA) and validated the results using the features selected by the GA. Experimental results show that unsupervised techniques outperformed supervised techniques on one dataset, while supervised techniques outperformed unsupervised techniques on the other two datasets.
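
    Naive Bayes is one of the three supervised techniques listed above. As a minimal sketch, assuming whitespace tokenization and raw word counts (the paper's actual preprocessing and feature sets are not described in the abstract), a multinomial Naive Bayes classifier with add-one smoothing can be written from scratch:

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for doc, y in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, c):
        """Unnormalized log P(c | doc) = log P(c) + sum of log P(word | c)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for tok in doc.lower().split():
            if tok in self.vocab:                    # words never seen in training are skipped
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))
```

    Skipping unseen words, rather than giving them a smoothed probability, is one common choice among several; it keeps out-of-vocabulary tokens from shifting the scores.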

    A Framework to Categorize Shill and Normal Reviews by Measuring it’s Linguistic Features

    Shill review detection has attracted significant attention from both the business and research communities. Shill reviews are increasingly used to influence the reputation of products sold on websites in a positive or negative manner. Spammers may create shill reviews that mislead readers in order to artificially promote or devalue target products or services. Various methods based on linguistic features have been adopted and implemented effectively. Surprisingly, review manipulation has been found even on reputable e-commerce websites; this is why linguistic-feature-based methods have gained more and more popularity. This study examines the linguistic features of shill reviews, and a tool has been developed for extracting product features from the text of the review under analysis. Fake reviews, fake comments, fake blogs, fake social network postings, and deceptive texts are some forms of shill reviews. By extracting linguistic features such as informativeness, subjectivity, and readability, we attempt to find differences between shill and normal reviews. On the basis of these three characteristics, hypotheses are formed and generalized; these hypotheses help compare shill and normal reviews in analytical terms. The proposed work is based on the polarity of the text (positive or negative), as shill reviewers tend to use a definite polarity, positive or negative, depending on their intention.
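
    Readability, subjectivity, and polarity can each be reduced to a number per review. This sketch uses the Flesch Reading Ease formula for readability and small word lists for subjectivity and polarity; both are assumptions swapped in for illustration, since the abstract does not specify the measures. Informativeness, which depends on the authors' product-feature extraction tool, is omitted. The lexicons and the vowel-group syllable counter are hypothetical and deliberately rough.

```python
import re

# Hypothetical toy lexicons; a real system would use a curated resource.
POSITIVE = {"great", "amazing", "best", "love", "perfect", "excellent"}
NEGATIVE = {"bad", "worst", "terrible", "hate", "awful", "poor"}
SUBJECTIVE = POSITIVE | NEGATIVE | {"i", "feel", "think", "really", "very"}

def count_syllables(word):
    """Rough heuristic: number of groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def review_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent, n_words = max(1, len(sentences)), max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    lower = [w.lower() for w in words]
    pos = sum(w in POSITIVE for w in lower)
    neg = sum(w in NEGATIVE for w in lower)
    return {
        # Flesch Reading Ease: higher scores mean easier-to-read text
        "flesch": 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (syllables / n_words),
        # share of words found in the subjectivity lexicon
        "subjectivity": sum(w in SUBJECTIVE for w in lower) / n_words,
        # +1 = only positive words, -1 = only negative words, 0 = balanced or none
        "polarity": (pos - neg) / max(1, pos + neg),
    }
```

    A polarity close to +1 or -1 corresponds to the "definite polarity" the abstract associates with shill reviewers; whether that separates shill from normal reviews is exactly what the paper's hypotheses test.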

    Goal-oriented Email Stream Classifier with A Multi-agent System Approach

    Nowadays, email is one of the most widely used means of communication despite the rise of other methods such as instant messaging and social networks. The need to automate email stream management is growing, for purposes such as multi-folder categorization and spam email classification. Existing content-based solutions can take into account aspects such as the subjective nature of the text and the adverse effects of concept drift, among others. This paper presents an email stream classifier with a goal-oriented approach for client and server environments. The i* language was the basis for designing the proposed email stream classifier: the email environment was represented with the early requirements model and the proposed classifier with the late requirements model. The classifier was implemented following a multi-agent system approach supported by the JADE agent platform and the Implementation_JADE pattern. The behavior of the agents was taken from an existing classifier. The multi-agent classifier was evaluated using functional, efficacy, and performance tests comparing the existing classifier with the multi-agent approach. The results obtained were satisfactory in all tests. The multi-agent approach performed better than the existing classifier owing to its use of multiple threads. This work was performed as part of the Smart University Project financed by the University of Alicante.
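
    The abstract attributes the multi-agent classifier's speed-up to multithreading. The paper's implementation runs on the JADE platform (Java); this Python sketch only mimics the idea, with each hypothetical "agent" running as a thread that pulls messages from a shared queue. The keyword rule in `classify` is a stand-in assumption for the existing classifier's behavior, not the paper's method.

```python
import queue
import threading

SPAM_WORDS = {"winner", "free", "prize", "urgent"}   # hypothetical stand-in rule

def classify(email):
    """Stand-in for the classifier behavior each agent would wrap."""
    words = set(email.lower().split())
    return "spam" if len(words & SPAM_WORDS) >= 2 else "inbox"

def classifier_agent(inbox, results, lock):
    """Worker 'agent': pull (id, text) pairs until it receives a None sentinel."""
    while True:
        item = inbox.get()
        if item is None:
            inbox.task_done()
            break
        msg_id, text = item
        label = classify(text)
        with lock:                      # guard the shared results dict
            results[msg_id] = label
        inbox.task_done()

def classify_stream(emails, n_agents=4):
    """Distribute a batch of emails over n_agents threads; return labels in input order."""
    inbox, results, lock = queue.Queue(), {}, threading.Lock()
    agents = [threading.Thread(target=classifier_agent, args=(inbox, results, lock))
              for _ in range(n_agents)]
    for a in agents:
        a.start()
    for item in enumerate(emails):
        inbox.put(item)
    for _ in agents:
        inbox.put(None)                 # one stop signal per agent
    for a in agents:
        a.join()
    return [results[i] for i in range(len(emails))]
```

    In CPython the global interpreter lock means threads mainly help when the per-message work waits on I/O, such as fetching mail from a server; CPU-bound classification would typically use processes instead.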

    Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition

    Product reviews and ratings on e-commerce websites provide customers with detailed insights into various aspects of a product, such as quality and usefulness. Since they influence customers' buying decisions, product reviews have become a fertile ground for abuse by sellers (colluding with reviewers) to promote their own products or to tarnish the reputation of competitors' products. In this paper, our focus is on detecting such abusive entities (both sellers and reviewers) by applying tensor decomposition to the product review data. While tensor decomposition is mostly unsupervised, we formulate our problem as a semi-supervised binary multi-target tensor decomposition, to take advantage of currently known abusive entities. We empirically show that our multi-target semi-supervised model achieves higher precision and recall in detecting abusive entities than unsupervised techniques. Finally, we show that our proposed stochastic partial natural gradient inference for our model empirically achieves faster convergence than stochastic gradient and Online-EM with sufficient statistics. Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.0383
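
    Tensor decomposition factors a multi-way array of interactions into per-entity latent vectors. This sketch shows plain, unsupervised CP (CANDECOMP/PARAFAC) decomposition fitted by full-batch gradient descent on squared reconstruction error; it does not reproduce the paper's semi-supervised binary multi-target formulation or its stochastic partial natural gradient inference. The tensor layout is a hypothetical assumption: a seller x reviewer x week array with X[s][r][t] = 1 when reviewer r reviewed seller s's products in week t.

```python
import random

def cp_decompose(X, rank, steps=500, lr=0.02, seed=0):
    """Rank-`rank` CP decomposition of a dense 3-way tensor (nested lists, I x J x K)
    by full-batch gradient descent on the squared reconstruction error.

    Returns factor matrices A (I x R), B (J x R), C (K x R) and the loss history.
    """
    rng = random.Random(seed)
    I, J, K = len(X), len(X[0]), len(X[0][0])
    A = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(I)]
    B = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(J)]
    C = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(K)]
    history = []
    for _ in range(steps):
        # residual E[i][j][k] = model entry - observed entry
        E = [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank)) - X[i][j][k]
               for k in range(K)] for j in range(J)] for i in range(I)]
        history.append(sum(e * e for plane in E for row in plane for e in row))
        # partial derivatives of the loss, all computed from the same residuals
        gA = [[2 * sum(E[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K))
               for r in range(rank)] for i in range(I)]
        gB = [[2 * sum(E[i][j][k] * A[i][r] * C[k][r] for i in range(I) for k in range(K))
               for r in range(rank)] for j in range(J)]
        gC = [[2 * sum(E[i][j][k] * A[i][r] * B[j][r] for i in range(I) for j in range(J))
               for r in range(rank)] for k in range(K)]
        # simultaneous gradient step on all three factor matrices
        for M, G in ((A, gA), (B, gB), (C, gC)):
            for row, grow in zip(M, G):
                for r in range(rank):
                    row[r] -= lr * grow[r]
    return A, B, C, history
```

    A component whose seller and reviewer loadings are both large for the same small group points to a candidate collusion block; the semi-supervised variant described in the abstract would additionally exploit entities already known to be abusive.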

    A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift

    Twitter has changed the way people get information by allowing them to express their opinions and comments on daily tweets. Unfortunately, its high popularity has made Twitter very attractive to spammers. Twitter spam, unlike other types of spam, has become a serious issue in the last few years. The large number of users and the high volume of information shared on Twitter play an important role in accelerating the spread of spam. To protect users, Twitter and the research community have been developing spam detection systems using various machine learning techniques. However, a recent study showed that current machine-learning-based detection systems cannot detect spam accurately because the characteristics of spam tweets vary over time. This issue is called “Twitter Spam Drift”. In this paper, a semi-supervised learning approach (SSLA) is proposed to tackle it. The new approach uses unlabeled data to learn the structure of the domain. Experiments were performed on English and Arabic datasets to evaluate the proposed approach, and the results show that SSLA can reduce the effect of Twitter spam drift and outperform existing techniques.
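
    The abstract does not say which semi-supervised algorithm SSLA uses. Self-training is one classic way to let unlabeled data shape a model, and this sketch uses it, with a nearest-centroid base classifier, purely as a swapped-in illustration: unlabeled points that lie clearly closer to one class centroid than to any other are pseudo-labeled and folded back in before the centroids are refit. The two-dimensional feature vectors and the `margin` threshold are hypothetical.

```python
import math

def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def self_train(labeled, labels, unlabeled, margin=0.5, max_rounds=10):
    """Self-training: pseudo-label unlabeled points whose nearest centroid beats
    the runner-up by at least `margin`, add them to the training set, refit."""
    X, Y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(X, Y)
        confident, rest = [], []
        for x in pool:
            dists = sorted((math.dist(x, c), y) for y, c in cents.items())
            gap = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
            (confident if gap >= margin else rest).append((x, dists[0][1]))
        if not confident:               # nothing confident left: stop early
            break
        for x, y in confident:
            X.append(x)
            Y.append(y)
        pool = [x for x, _ in rest]
    return centroids(X, Y)

def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: math.dist(x, cents[y]))
```

    Under drift, one could rerun `self_train` on each new batch of unlabeled tweets so the centroids follow the shifting distribution; whether this resembles the paper's mechanism is not stated in the abstract.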