Search CORE

991 research outputs found

Comparative between optimization feature selection by using classifiers algorithms on spam email

Author: Bakar Zuriana Binti Abu
Mamat Rabiei
Rahim Noor Hafhizah Abd
Rawashdeh Ghada
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2019
Field of study

Spam mail has become a rising phenomenon in a world that has recently witnessed high growth in the volume of emails. This indicates the need to develop an effective spam filter. At the present time, Classification algorithms for text mining are used for the classification of emails. This paper provides a description and evaluation of the effectiveness of three popular classifiers using optimization feature selections, such as Genetic algorithm, Harmony search, practical swarm optimization, and simulating annealing. The research focuses on a comparison of the effect of classifiers using K-nearest Neighbor (KNN), Naïve Bayesian (NB), and Support Vector Machine (SVM) on spam classifiers (without using feature selection) also enhances the reliability of feature selection by proposing optimization feature selection to reduce number of features that are not important

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Email classification using data reduction method

Author: Islam Rafiqul
Xiang Yang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Classifying user emails correctly from penetration of spam is an important research issue for anti-spam researchers. This paper has presented an effective and efficient email classification technique based on data filtering method. In our testing we have introduced an innovative filtering technique using instance selection method (ISM) to reduce the pointless data instances from training model and then classify the test data. The objective of ISM is to identify which instances (examples, patterns) in email corpora should be selected as representatives of the entire dataset, without significant loss of information. We have used WEKA interface in our integrated classification model and tested diverse classification algorithms. Our empirical studies show significant performance in terms of classification accuracy with reduction of false positive instances.<br /

Deakin Research Online

Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

Author: Biggio Battista
Roli Fabio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed to understand the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.Comment: Accepted for publication on Pattern Recognition, 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova

A Survey of Existing E-mail Spam Filtering Methods Considering Machine Learning Techniques

Author: Jinat Ara
Publication venue: Global Journals Inc. (US)
Publication date: 15/03/2018
Field of study

E-mail is one of the most secure medium for online communication and transferring data or messages through the web. An overgrowing increase in popularity, the number of unsolicited data has also increased rapidly. To filtering data, different approaches exist which automatically detect and remove these untenable messages. There are several numbers of email spam filtering technique such as Knowledge-based technique, Clustering techniques, Learningbased technique, Heuristic processes and so on. This paper illustrates a survey of different existing email spam filtering system regarding Machine Learning Technique (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. However, here we present the classification, evaluation and comparison of different email spam filtering system and summarize the overall scenario regarding accuracy rate of different existing approache

Global Journal of Computer Science and Technology (GJCST)

Detecting Spam Email With Machine Learning Optimized With Bio-Inspired Metaheuristic Algorithms

Author: Gibson Simran
Issac Biju
Jacob Seibu Mary
Zhang Li
Publication venue: IEEE
Publication date: 13/10/2020
Field of study

Electronic mail has eased communication methods for many organisations as well as individuals. This method is exploited for fraudulent gain by spammers through sending unsolicited emails. This article aims to present a method for detection of spam emails with machine learning algorithms that are optimized with bio-inspired methods. A literature review is carried to explore the efficient methods applied on different datasets to achieve good results. An extensive research was done to implement machine learning models using Naïve Bayes, Support Vector Machine, Random Forest, Decision Tree and Multi-Layer Perceptron on seven different email datasets, along with feature extraction and pre-processing. The bio-inspired algorithms like Particle Swarm Optimization and Genetic Algorithm were implemented to optimize the performance of classifiers. Multinomial Naïve Bayes with Genetic Algorithm performed the best overall. The comparison of our results with other machine learning and bio-inspired models to show the best suitable model is also discussed

Northumbria Research Link

Teeside University's Research Repository

A review of spam email detection: analysis of spammer strategies and the dataset shift problem

Author: Alaiz Rodríguez Rocío
Alegre Gutiérrez Enrique
González Castro Víctor
Jáñez-Martino Francisco
López Fidalgo Eduardo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/06/2022
Field of study

.Spam emails have been traditionally seen as just annoying and unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity for the users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%.SIPublicación en abierto financiada por el Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), con cargo al Programa Operativo 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Actuación:20007-CL - Apoyo Consorcio BUCL

Leon University (Spain)