210 research outputs found

    Prepare for VoIP Spam

    A survey on opinion spam detection methods

    Over the past decade, fake reviews, also known as opinion spam, have plagued the e-commerce sector around the world. Opinion spam is considered extremely harmful because it can be used to control the sentiment around a product or service, which in turn can damage the sales and reputation of a company. Over the years, extensive research has used natural language processing to extract textual features and combine them with various machine learning algorithms for opinion spam detection. The majority of the reviewed literature has focused on supervised learning techniques using artificially crafted datasets. The purpose of this paper is twofold: to analyze the various machine learning techniques that have been proposed in the extant literature for detecting opinion spam and compare their accuracies, and to provide further insights for future researchers in the field of opinion spam detection. This survey concludes that semi-supervised techniques using multi-aspect features of reviews, reviewers, and products can provide better results in spam detection. Furthermore, the lack of accurately labeled datasets presents a major challenge in the field of fake review detection.

    Fake Account Identification Using Machine Learning Approaches Integrated with Adaptive Particle Swarm Optimization

    Humans, bots, and other automated systems commonly create new user accounts using stolen or otherwise fraudulent personal information. These accounts are employed in deceitful activities such as phishing and identity theft, as well as in spreading damaging rumors. Someone with malevolent intent may generate a substantial number of counterfeit accounts, ranging from hundreds to thousands, with the aim of reaching as many authentic users as possible with their harmful actions. Users can get a wealth of knowledge from social networks, and malicious individuals are readily tempted to exploit this vast collection of social media information. These cybercriminals fabricate fictitious identities and disseminate meaningless content. Discerning counterfeit profiles is therefore an essential aspect of using social media networks. This study presents a machine learning approach to detect fraudulent Instagram profiles. The approach employed attribute selection using adaptive particle swarm optimization and recursive feature elimination (RFE). The results indicate that the suggested adaptive particle swarm optimization method surpasses RFE in terms of accuracy, recall, and F-measure.

    Graph Mining for Cybersecurity: A Survey

    The explosive growth of cyber attacks, such as malware, spam, and intrusions, has had severe consequences for society. Securing cyberspace has become an utmost concern for organizations and governments. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. In recent years, with the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance. It is imperative to summarize existing graph-based cybersecurity solutions to provide a guide for future studies. Therefore, as a key contribution of this paper, we provide a comprehensive review of graph mining for cybersecurity, including an overview of cybersecurity tasks, the typical graph mining techniques, and the general process of applying them to cybersecurity, as well as various solutions for different cybersecurity tasks. For each task, we probe into relevant methods and highlight the graph types, graph approaches, and task levels in their modeling. Furthermore, we collect open datasets and toolkits for graph-based cybersecurity. Finally, we outline potential directions of this field for future research.

    Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach

    Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques (Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT) and four classifiers: Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF) and Logistic Regression (LR). Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with an F1 score of 0.953 and an accuracy of 94.6%, while for the Spanish dataset, TF-IDF with NB yields an F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and a Spanish spam email in 2 ms and 2.2 ms on average, respectively.
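    As a rough illustration of the TF-IDF representation named in the abstract above, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the toy emails, the whitespace tokenizer, the function names, and the smoothed IDF formula are all assumptions made for the example, and the classifier stage (LR, NB, and so on) is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not described in the abstract.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vectors, idf) for a list of document strings.

    Each vector is a dict mapping a term to its TF-IDF weight.
    IDF uses a smoothed form, ln((1 + N) / (1 + df)) + 1, chosen
    here for illustration only.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the count normalized by document length.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


# Hypothetical toy messages, not drawn from SPEMC-15K-E or SPEMC-15K-S.
emails = [
    "win a free prize now",
    "free loan offer now",
    "verify your bank account now",
]
vectors, idf = tfidf_vectors(emails)
```

    In this sketch a term that occurs in every message ("now") receives the lowest IDF, while a term unique to one message ("prize") receives the highest, which is the property that lets a downstream classifier focus on topic-specific vocabulary.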

    Review Manipulation: Literature Review, and Future Research Agenda

    Background: The phenomenon of review manipulation and fake reviews has gained Information Systems (IS) scholars’ attention in recent years. Scholarly research in this domain has delved into the causes and consequences of review manipulation. However, we find that the findings are diverse, and the studies do not portray a systematic approach. This study synthesizes the findings from a multidisciplinary perspective and presents an integrated framework to understand the mechanism of review manipulation. Method: The study reviews 88 relevant articles on review manipulation spanning a decade and a half. We adopted an iterative coding approach to synthesize the literature on concepts and categorized them independently into potential themes. Results: We present an integrated framework that shows the linkages between the different themes, namely, the prevalence of manipulation, impact of manipulation, conditions and choice for manipulation decision, characteristics of fake reviews, models for detecting spam reviews, and strategies to deal with manipulation. We also present the characteristics of review manipulation and cover both operational and conceptual issues associated with the research on this topic. Conclusions: Insights from the study will guide future research on review manipulation and fake reviews. The study presents a holistic view of the phenomenon of review manipulation. It informs various online platforms on how to address fake reviews towards building a healthy and sustainable environment.

    Online Misinformation: Challenges and Future Directions

    Misinformation has become a common part of our digital media environments and it is compromising the ability of our societies to form informed opinions. It generates misperceptions, which have affected decision-making processes in many domains, including the economy, health, environment, and elections, among others. Misinformation and its generation, propagation, impact, and management are being studied through a variety of lenses (computer science, social science, journalism, psychology, etc.) since it widely affects multiple aspects of society. In this paper we analyse the phenomenon of misinformation from a technological point of view. We study the current socio-technical advancements towards addressing the problem, identify some of the key limitations of current technologies, and propose some ideas to target such limitations. The goal of this position paper is to reflect on the current state of the art and to stimulate discussions on the future design and development of algorithms, methodologies, and applications.

    A Multilingual Spam Reviews Detection Based on Pre-Trained Word Embedding and Weighted Swarm Support Vector Machines

    Online reviews are important information that customers seek when deciding to buy products or services. Organizations also benefit from these reviews as essential feedback on their products or services. Such information requires reliability, especially during the COVID-19 pandemic, which saw a massive increase in online reviews due to quarantine and staying at home. Not only did the number of reviews increase, but the context and preferences also changed during the pandemic. Spam reviewers have therefore adapted to these changes and improved their deception techniques. Spam reviews usually consist of misleading, fake, or fraudulent reviews that tend to deceive customers for the purpose of making money or causing harm to competitors. Hence, this work presents a Weighted Support Vector Machine (WSVM) and Harris Hawks Optimization (HHO) for spam review detection. The HHO works as an algorithm for optimizing hyperparameters and feature weighting. Three different language corpora have been used as datasets, namely English, Spanish, and Arabic, in order to solve the multilingual problem in spam reviews. Moreover, pre-trained word embedding (BERT) has been applied alongside three word representation methods (NGram-3, TF-IDF, and one-hot encoding). Four experiments have been conducted, each focused on solving and demonstrating different aspects. In all experiments, the proposed approach showed excellent results compared with other state-of-the-art algorithms. In other words, the WSVM-HHO achieved an accuracy of 88.163%, 71.913%, 89.565%, and 84.270% for the English, Spanish, Arabic, and Multilingual datasets, respectively. Further, a deep analysis has been conducted to investigate the context of reviews before and after the COVID-19 situation. In addition, a new dataset with statistical features has been generated and merged with the previous textual features to improve detection performance. Funding: Projects TED2021-129938B-I0, PID2020-113462RB-I00, PDC2022-133900-I00, and PID2020-115570GB-C22, granted by Ministerio Español de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033) and NextGenerationEU/PRT