Using online linear classifiers to filter spam Emails

Author: Jones Gareth J.F.
Wang Bin
Wenfeng Pan
Publication venue: Springer Verlag
Publication date: 01/11/2007

The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored for two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are very easily updated adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
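To make the two online update rules named in the abstract concrete, here is a minimal sketch. It is not the authors' implementation: the binary bag-of-words features, the toy data, the learning rate, the Winnow threshold of half the feature count, and the promote/demote factor `alpha` are all assumptions chosen for illustration, and the Winnow variant shown (multiply on a missed spam, divide on a false alarm) is only one of several.

```python
from typing import List, Sequence, Tuple

# (binary feature vector, label) with 1 = spam, 0 = legitimate
Example = Tuple[Sequence[int], int]


def dot(w: Sequence[float], x: Sequence[int]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(data: List[Example], n_features: int,
                     epochs: int = 5, lr: float = 1.0):
    """Additive update: on a mistake, move the weights toward the true class."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if dot(w, x) + b > 0 else 0
            if pred != y:
                sign = 1 if y == 1 else -1
                for i, xi in enumerate(x):
                    w[i] += lr * sign * xi
                b += lr * sign
    return w, b


def predict_perceptron(w, b, x) -> int:
    return 1 if dot(w, x) + b > 0 else 0


def train_winnow(data: List[Example], n_features: int,
                 epochs: int = 5, alpha: float = 2.0):
    """Multiplicative update: promote or demote only the active features."""
    w = [1.0] * n_features
    theta = n_features / 2  # assumed threshold, not taken from the paper
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if dot(w, x) >= theta else 0
            if pred == 0 and y == 1:      # missed spam: promote
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            elif pred == 1 and y == 0:    # false alarm: demote
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, theta


def predict_winnow(w, theta, x) -> int:
    return 1 if dot(w, x) >= theta else 0


# Hypothetical features: ["free", "money", "meeting", "report"]
TOY_DATA: List[Example] = [
    ([1, 1, 0, 0], 1), ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1),
    ([0, 0, 1, 1], 0), ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0),
]

if __name__ == "__main__":
    pw, pb = train_perceptron(TOY_DATA, 4)
    ww, wt = train_winnow(TOY_DATA, 4)
    print([predict_perceptron(pw, pb, x) for x, _ in TOY_DATA])
    print([predict_winnow(ww, wt, x) for x, _ in TOY_DATA])
```

Both learners touch each example once per pass and update only on mistakes, which is consistent with the abstract's remark that they are cheap to train and easy to update adaptively.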

Irish Universities

DCU Online Research Access Service

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

Author: Androutsopoulos Ion
Karkaletsis Vangelis
Paliouras Georgios
Sakkis Georgios
Spyropoulos Constantine D.
Stamatopoulos Panagiotis
Publication date: 01/01/2000

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method to automatically construct anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. At the same time, we compare the performance of the Naive Bayesian filter to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.
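The abstract does not spell out its cost-sensitive evaluation measures. As an illustration of the general idea only, the sketch below computes a weighted accuracy in which each legitimate message counts `lam` times as much as a spam message, so blocking legitimate mail is penalized more heavily than letting spam through. The function name, the label encoding, and the default weight are assumptions, not values taken from the paper.

```python
from typing import Sequence


def weighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      lam: float = 9.0) -> float:
    """Accuracy where each legitimate message (label 0) counts `lam` times.

    Labels: 1 = spam, 0 = legitimate. Assumes a non-empty evaluation set.
    """
    legit_total = sum(1 for t in y_true if t == 0)
    spam_total = len(y_true) - legit_total
    legit_ok = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    spam_ok = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return (lam * legit_ok + spam_ok) / (lam * legit_total + spam_total)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    blocks_legit = [0, 1, 1, 1]   # one legitimate message wrongly blocked
    misses_spam = [0, 0, 1, 0]    # one spam message let through
    print(weighted_accuracy(truth, blocks_legit))
    print(weighted_accuracy(truth, misses_spam))
```

With `lam = 1` the measure reduces to plain accuracy and the two filters in the usage example score the same; raising `lam` separates them in favor of the filter that never blocks legitimate mail.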

arXiv.org e-Print Archive

CiteSeerX

Feature extraction and classification of spam emails

Author: Hassan Muhammad Ali
Mtetwa Nhamo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/05/2019

Crossref

ResearchOnline@GCU

On the use of Locality for Improving SVM-Based Spam Filtering

Author: Longe O. B.
Ojo F. O.
Okesola J. O.
Publication date: 01/01/2015

Recent growth in the use of email for communication and the corresponding growth in the volume of email received have made automatic processing of emails desirable. In tandem is the prevailing problem of advance fee fraud e-mails that pervade inboxes globally. These e-mails solicit financial transactions and funds transfers from unsuspecting users. Most modern mail-reading software packages provide some form of programmable automatic filtering, typically in the form of sets of rules that file or otherwise dispose of mail based on keywords detected in the headers or message body. Unfortunately, programming these filters is an arcane and sometimes inefficient process. An adaptive mail system that can learn its users’ mail sorting preferences would therefore be more desirable. Building on the work of Blanzieri & Bryl (2007), we propose a framework dedicated to the phenomenon of locality in the analysis of advance fee fraud e-mails. The framework uses a Support Vector Machine (SVM) classifier to build local decision rules into the classification process of a spam filter designed for this genre of e-mail.

Covenant University Repository

An Approach to Email Classification Using Bayesian Theorem

Author: Denil Vira
Pradeep Raja
Publication venue: Global Journals Inc. (US)
Publication date: 07/06/2012

Email classifiers based on the Bayesian theorem have been very effective in spam filtering due to their strong categorization ability and high precision. This paper proposes an algorithm for email classification based on the Bayesian theorem. The purpose is to automatically classify mails into predefined categories. The algorithm assigns an incoming mail to its appropriate category by checking its textual contents. The experimental results show that the proposed algorithm is a reasonable and effective method for email classification.
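As a rough illustration of assigning a mail to one of several predefined categories from its text, the sketch below uses a multinomial Naive Bayes model with add-one (Laplace) smoothing. This is a generic textbook formulation, not the algorithm proposed in the paper; the whitespace tokenizer, the category names, and the toy training mails are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs: List[Tuple[str, str]]):
    """Count documents per category and word occurrences per category."""
    class_counts: Counter = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(text: str, model) -> str:
    """Return the category with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)          # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue                               # ignore unseen words
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training mails and categories.
TRAIN_MAILS = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("project meeting agenda", "work"),
    ("quarterly report meeting", "work"),
    ("family dinner sunday", "personal"),
    ("birthday party sunday", "personal"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TRAIN_MAILS)
    print(classify("buy cheap money", model))
    print(classify("meeting report", model))
    print(classify("sunday dinner", model))
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once mails are longer than a few words.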

Global Journal of Computer Science and Technology (GJCST)

Learning to Filter Text in Forum Malay Message using Naive Bayesian Technique

Author: Ab. Halim Norhadila
Publication venue: Universiti Teknologi Petronas
Publication date: 01/01/2006

Applying the basic filtering technique in a forum application has been discussed in [1]. That paper explains the use of the basic naive Bayesian algorithm to classify forum messages as clean or bad, where a clean message has no bad words, while a bad message contains at least one bad word. In this Final Year Project paper, the application of the algorithm to filtering forum messages is discussed in an attempt to apply learning to filter forum messages.

UTPedia