Email classification using data reduction method

Authors: Islam Rafiqul, Xiang Yang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Correctly separating legitimate user emails from spam is an important research problem for anti-spam researchers. This paper presents an effective and efficient email classification technique based on a data filtering method. In our testing we introduce an innovative filtering technique that uses an instance selection method (ISM) to remove uninformative data instances from the training model before classifying the test data. The objective of ISM is to identify which instances (examples, patterns) in email corpora should be selected as representatives of the entire dataset, without significant loss of information. We used the WEKA interface in our integrated classification model and tested diverse classification algorithms. Our empirical studies show significant performance in terms of classification accuracy, with a reduction in false positive instances.
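The abstract does not say which instance selection rule the authors use, so the following is only a minimal sketch of the general idea, using Hart-style condensed nearest neighbour selection as a stand-in. The feature vectors, labels, and function names are hypothetical and are not taken from the paper.

```python
from math import dist


def nearest_label(point, store):
    """Return the label of the stored instance closest to `point` (1-NN)."""
    return min(store, key=lambda item: dist(point, item[0]))[1]


def condense(instances):
    """Condensed nearest neighbour selection (a stand-in for the paper's ISM).

    `instances` is a list of (feature_vector, label) pairs. An instance is
    added to the reduced set only if the instances kept so far would
    misclassify it; passes repeat until nothing new is added.
    """
    store = [instances[0]]
    changed = True
    while changed:
        changed = False
        for features, label in instances:
            if (features, label) in store:
                continue
            if nearest_label(features, store) != label:
                store.append((features, label))
                changed = True
    return store


# Hypothetical toy data: (spam-word ratio, link count) -> label
training = [
    ((0.9, 5.0), "spam"), ((0.8, 4.0), "spam"), ((0.85, 6.0), "spam"),
    ((0.1, 0.0), "ham"), ((0.2, 1.0), "ham"), ((0.15, 0.0), "ham"),
]
selected = condense(training)
```

On this toy data the reduced set keeps one representative per class, and a 1-NN classifier built on it still labels every original instance correctly, which is the "without significant loss of information" property in miniature.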

Deakin Research Online

E-mail Spam Filtering by A New Hybrid Feature Selection Method Using Chi2 as Filter and Random Tree as Wrapper

Author: Pourhashemi Seyed Mostafa
Publication venue: 'Faculty of Engineering, Chulalongkorn University'
Publication date: 10/07/2014

The purpose of this research is to present a machine learning approach that improves the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used. The features used in these systems are drawn from the body text of messages. The proposed system combines two feature selection models, filter and wrapper, using a Chi-squared (Chi2) filter and a Random Tree wrapper as feature selectors. In addition, Multinomial Naïve Bayes (MNB), Discriminative Multinomial Naïve Bayes (DMNB), Support Vector Machine (SVM), and Random Forest classifiers are used for classification. Finally, the outputs of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar works across different parameters. The optimal accuracy of the proposed system is evaluated at 99%.
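As a rough illustration of the filter half of such a hybrid selector, the sketch below scores each term with the chi-squared statistic of a 2x2 table (term present or absent versus spam or ham) and keeps the top-scoring terms. The corpus, labels, and function names are hypothetical, and the Random Tree wrapper stage is not shown.

```python
def chi2_score(docs, labels, term):
    """Chi-squared statistic for term presence vs. the spam/ham label."""
    n = len(docs)
    pairs = list(zip(docs, labels))
    a = sum(1 for d, y in pairs if term in d and y == "spam")      # term, spam
    b = sum(1 for d, y in pairs if term in d and y != "spam")      # term, ham
    c = sum(1 for d, y in pairs if term not in d and y == "spam")  # no term, spam
    e = sum(1 for d, y in pairs if term not in d and y != "spam")  # no term, ham
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    if denom == 0:
        return 0.0
    return n * (a * e - b * c) ** 2 / denom


def top_terms(docs, labels, k):
    """Return the k terms with the highest chi-squared score."""
    vocabulary = sorted(set().union(*docs))  # sorted first so ties are stable
    return sorted(vocabulary, key=lambda t: -chi2_score(docs, labels, t))[:k]


# Hypothetical toy corpus: each message body as a set of tokens
docs = [
    {"win", "money", "now"}, {"win", "prize"}, {"free", "money"},
    {"meeting", "now"}, {"project", "meeting"}, {"lunch", "now"},
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
selected_terms = top_terms(docs, labels, 3)
```

Terms that appear in only one class ("win", "money", "meeting") score highest here, while "now", which occurs in both classes, scores low and would be dropped before any wrapper stage runs.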

Crossref

Engineering Journal (Faculty of Engineering, Chulalongkorn University, Bangkok)

Email classification via intention-based segmentation

Authors: Adiyarta Krisna, Agarwal Sonali, Sonbhadra Sanjay Kumar, Syafrullah Muhammad
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 23/11/2020

Email is the most popular means of personal and official communication among people and organizations. Because the virtual environment is untrusted, email systems may face frequent attacks such as malware, spamming, and social engineering. Spamming is the most common malicious activity: unsolicited emails are sent in bulk, and these spam emails can carry malware, waste resources, and thus degrade productivity. In spam filter development, the most important challenge is to find the correlation between the nature of spam and the interests of users, because those interests are dynamic. This paper proposes a novel dynamic spam filter model that accounts for changes in users' interests over time while handling spam activities. It uses intention-based segmentation to compare different segments of text documents instead of comparing the documents as a whole. The proposed spam filter is a multi-tier approach. First, the email content is divided into segments with the help of part-of-speech (POS) tagging based on voices and tenses. Next, the segments are clustered using hierarchical clustering and compared using the vector space model. In the third stage, concept drift is detected in the clusters to identify changes in the interests of the user. In the last stage, ham emails are classified into various categories. The Enron dataset is used for the experiments, and the results obtained are promising.
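The segment-level comparison can be sketched in a few lines, under loud assumptions: sentence splitting stands in for the POS-based segmentation, raw term counts stand in for whatever weighting the authors use, and the clustering and concept drift stages are omitted. All names and the sample email are hypothetical.

```python
import math
import re
from collections import Counter


def segment(text):
    """Split text into sentence-like segments (stand-in for POS-based segmentation)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def vectorize(segment_text):
    """Represent a segment as a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z']+", segment_text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical email body
email = ("Please review the project report. The project report is attached. "
         "Claim your free prize now!")
segments = segment(email)
vectors = [vectorize(s) for s in segments]
```

Comparing segments rather than whole messages lets the two project-related sentences score as similar to each other while the promotional sentence shares no terms with either, a distinction that a single whole-document vector would blur.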

Proceeding of the Electrical Engineering Computer Science and Informatics