5 research outputs found

    E-mail Spam Filtering by A New Hybrid Feature Selection Method Using Chi2 as Filter and Random Tree as Wrapper

    This research presents a machine learning approach for improving the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used. The features are drawn from the body text of the messages. The proposed system combines two feature selection models, filter and wrapper, using a Chi-squared (Chi2) filter and a Random Tree wrapper as feature selectors. Multinomial Naïve Bayes (MNB), Discriminative Multinomial Naïve Bayes (DMNB), Support Vector Machine (SVM), and Random Forest classifiers are used for classification. Finally, the outputs of these classifiers and feature selection methods are examined, and the best design is selected and compared with other similar works across different parameters. The best accuracy achieved by the proposed system is 99%.
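As a rough illustration of the filter stage only, the sketch below scores terms with the standard Chi-squared statistic for a 2x2 term/class contingency table and ranks them. The toy messages, labels, and the function names `chi2_score` and `rank_terms` are assumptions made for this example and do not come from the paper; the Random Tree wrapper stage and the classifiers are not shown.

```python
from collections import Counter


def chi2_score(a, b, c, d):
    """Chi-squared statistic for a 2x2 term/class contingency table.

    a: spam messages containing the term
    b: legitimate messages containing the term
    c: spam messages without the term
    d: legitimate messages without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term (or class) is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels):
    """Return (term, score) pairs, highest Chi-squared score first."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()
    for text, is_spam in zip(docs, labels):
        # Document frequency: count each term once per message.
        for term in set(text.lower().split()):
            (spam_df if is_spam else ham_df)[term] += 1
    scores = {}
    for term in set(spam_df) | set(ham_df):
        a, b = spam_df[term], ham_df[term]
        scores[term] = chi2_score(a, b, n_spam - a, n_ham - b)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: 1 = spam, 0 = legitimate.
DOCS = [
    "win free prize now",
    "free offer win cash",
    "meeting agenda for monday",
    "please review the agenda now",
]
LABELS = [1, 1, 0, 0]

ranked = rank_terms(DOCS, LABELS)
top_terms = [term for term, _ in ranked[:3]]
```

In a filter-then-wrapper design, a ranking like this would typically be cut to the top-scoring terms before a wrapper evaluates candidate subsets with a learner.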

    An Efficient feature selection algorithm for the spam email classification

    Existing spam email classification systems suffer from low accuracy due to the high dimensionality of the associated feature selection (FS) process. As a global optimization process in machine learning, FS mainly aims to reduce redundancy in the dataset in order to produce acceptable and accurate results. This study combines the Chaotic Particle Swarm Optimization (PSO) algorithm with Artificial Bee Colony (ABC) to reduce feature dimensionality and thereby improve spam email classification accuracy. The features for each particle were represented in binary form, obtained by applying a sigmoid function. Feature selection was based on a fitness function that depended on the accuracy obtained with an SVM. The proposed system was evaluated on both the performance of the classifier and the dimension of the selected feature vectors that served as the classifier's input. The evaluation used the Spambase dataset, and the results show that the PSO-ABC classifier performed well in terms of FS even with a small set of selected features.
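The binary representation mentioned above can be sketched with the sigmoid transfer rule commonly used in binary PSO: each real-valued velocity is mapped to a probability, and the corresponding feature bit is set to 1 with that probability. This is an assumption about the general technique, not the paper's exact update; the names `sigmoid`, `binarize`, and `selected_indices` are hypothetical, and the chaotic PSO, ABC, and SVM-based fitness steps are not shown.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(velocities, rng):
    """Map real-valued velocities to a 0/1 feature mask.

    Bit i is 1 (feature kept) with probability sigmoid(velocities[i]).
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocities]


def selected_indices(mask):
    """Indices of the features that the mask keeps."""
    return [i for i, bit in enumerate(mask) if bit == 1]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mask = binarize([50.0, -50.0, 50.0], rng)
kept = selected_indices(mask)
```

A fitness function would then train a classifier on only the kept columns and return its accuracy for that particle.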

    Hybrid Spam Filtering using Monarch Butterfly Optimization Algorithm with Self-Adaptive Population

    Spam causes bottlenecks and congestion, reducing speed, processing power, available memory, and bandwidth. Existing spam email classification methods need to be more accurate because of the large dimensionality of hybrid spam datasets. This creates a need for a feature dimensionality reduction technique that uses only the features relevant to the problem instead of all features in the dataset. This paper presents a feature selection method based on the monarch butterfly optimization (MBO) algorithm that emphasizes low complexity and few features. The method is efficient and produces a more accurate classification. To further improve the performance of the standard MBO algorithm, the population sizes of subpopulations 1 and 2 are allowed to vary dynamically, in a linear way, as the algorithm proceeds. With the self-adaptive and greedy strategies modified in this manner, the self-adaptive population monarch butterfly optimization (SPMBO) method is introduced, in which newly generated individuals pass to the next generation only if they are better than the earlier ones. The paper then proposes an email classification system based on k-nearest neighbors (k-NN) with two distance metrics, Euclidean and Manhattan, that also uses the SPMBO technique. This system seeks to determine whether a hybrid email is spam. The efficiency of the proposed SPMBO algorithm was compared with standard MBO on three datasets: Dredze, Image Spam Hunter, and Spambase. The SPMBO results were superior to those reported by other authors in the relevant fields.
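A minimal sketch of k-NN classification with the two distance metrics named above, Euclidean and Manhattan. The two-dimensional feature vectors, the labels, and the function names are made up for illustration; the SPMBO feature selection step is not shown, and this is not the paper's implementation.

```python
from collections import Counter


def euclidean(p, q):
    """Straight-line (L2) distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(train, query, k, distance):
    """Majority vote among the k training points closest to `query`.

    train: list of (feature_vector, label) pairs
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two already-selected features per email.
TRAIN = [
    ((0.9, 0.8), "spam"),
    ((0.8, 0.9), "spam"),
    ((0.1, 0.2), "ham"),
    ((0.2, 0.1), "ham"),
    ((0.15, 0.15), "ham"),
]

query = (0.85, 0.75)
label_l2 = knn_predict(TRAIN, query, k=3, distance=euclidean)
label_l1 = knn_predict(TRAIN, query, k=3, distance=manhattan)
```

On this toy data both metrics agree; on real, higher-dimensional data the two can rank neighbours differently, which is one reason to compare them.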

    Email Filtering Using Hybrid Feature Selection Model


    Hybrid Email Spam Detection Model Using Artificial Intelligence

    The growing volume of spam emails has created the need for a more precise anti-spam filter to detect unsolicited emails. One of the most common representations used in spam filters is the Bag-of-Words (BOW). Although BOW is very effective for classifying emails, it has a number of weaknesses. In this paper, we present a hybrid approach to spam filtering based on the neural network model Paragraph Vector-Distributed Memory (PV-DM). We use PV-DM to build a compact representation of the context of an email and of its pertinent features. This methodology yields a more comprehensive filter for classifying emails. Furthermore, we conducted an empirical experiment using the Enron spam and Ling spam datasets; the results indicate that our proposed filter outperforms the PV-DM and BOW email classification methods.
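The abstract does not say which BOW weaknesses it has in mind. One widely known limitation, shown in the sketch below, is that a bag of words discards word order, so two messages with different meanings can map to the same representation. The helper `bag_of_words` and the example sentences are assumptions for illustration; PV-DM itself is not implemented here.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-cased term counts; word order is discarded."""
    return Counter(text.lower().split())


# Two messages with opposite meanings but the same multiset of words.
first = bag_of_words("this is not spam it is good")
second = bag_of_words("this is spam it is not good")

same_bow = first == second
```

Here `same_bow` is `True`, so a classifier that sees only these counts cannot tell the two messages apart.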