261 research outputs found
Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers - the Perceptron and Littlestone's Winnow - is explored for two anti-spam filtering benchmark corpora - PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers are very low, and they are very easily adaptively updated. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering
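For readers unfamiliar with the two classifiers named in this abstract, the following is a minimal sketch of their textbook mistake-driven updates, not the paper's exact configuration. The function names, the learning rate `lr`, the threshold `theta`, and the factor `alpha` are illustrative assumptions, and features are assumed to be binary (word present or absent).

```python
from typing import List, Tuple


def perceptron_update(w: List[float], b: float, x: List[int], y: int,
                      lr: float = 1.0) -> Tuple[List[float], float]:
    """One online Perceptron step; y is +1 (spam) or -1 (legitimate)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    pred = 1 if score > 0 else -1
    if pred != y:
        # Additive update, applied only when the prediction is wrong.
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b


def winnow_update(w: List[float], x: List[int], y: int,
                  theta: float, alpha: float = 2.0) -> List[float]:
    """One online (positive) Winnow step; y is 1 (spam) or 0 (legitimate)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    pred = 1 if score > theta else 0
    if pred == 0 and y == 1:
        # Promotion: multiply the weights of the active features.
        w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
    elif pred == 1 and y == 0:
        # Demotion: divide the weights of the active features.
        w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w
```

Both rules touch the weights only on a mistake and only for the features present in the message, which is consistent with the abstract's remark that the classifiers are cheap to train and easy to update adaptively.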
Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach
We investigate the performance of two machine learning algorithms in the
context of anti-spam filtering. The increasing volume of unsolicited bulk
e-mail (spam) has generated a need for reliable anti-spam filters. Filters of
this type have so far been based mostly on keyword patterns that are
constructed by hand and perform poorly. The Naive Bayesian classifier has
recently been suggested as an effective method to construct automatically
anti-spam filters with superior performance. We investigate thoroughly the
performance of the Naive Bayesian filter on a publicly available corpus,
contributing towards standard benchmarks. At the same time, we compare the
performance of the Naive Bayesian filter to an alternative memory-based
learning approach, after introducing suitable cost-sensitive evaluation
measures. Both methods achieve very accurate spam filtering, outperforming
clearly the keyword-based filter of a widely used e-mail reader
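The abstract does not spell out its cost-sensitive measures. As an illustration only, one common form is a weighted accuracy in which each legitimate message counts as `lam` messages, so that blocking legitimate mail is penalised more heavily than letting spam through. The function name and all parameter names below are assumptions for this sketch.

```python
def weighted_accuracy(legit_correct: int, n_legit: int,
                      spam_correct: int, n_spam: int,
                      lam: float = 1.0) -> float:
    """Accuracy in which every legitimate message is weighted lam times.

    With lam = 1 this reduces to plain accuracy; a larger lam makes
    misclassified legitimate mail more costly than missed spam.
    """
    if n_legit < 0 or n_spam < 0 or (n_legit + n_spam) == 0:
        raise ValueError("need at least one message")
    return (lam * legit_correct + spam_correct) / (lam * n_legit + n_spam)


# Example: 9 of 10 legitimate and 8 of 10 spam messages classified correctly.
plain = weighted_accuracy(9, 10, 8, 10, lam=1.0)
strict = weighted_accuracy(9, 10, 8, 10, lam=9.0)
```

Under this weighting the single blocked legitimate message dominates the score as `lam` grows, which is the behaviour a cost-sensitive comparison of filters is meant to expose.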
On the use of Locality for Improving SVM-Based Spam Filtering
Recent growths in the use of email for communication and the corresponding growths in the volume of email received have made automatic processing of emails desirable. In tandem is the prevailing problem of Advance Fee fraud E-mails that pervades inboxes globally. These genres of e-mails solicit for financial transactions and funds transfers from unsuspecting users. Most modern mail-reading software packages provide some forms of programmable automatic filtering, typically in the form of sets of rules that file or otherwise dispose of mails based on keywords detected in the headers or message body. Unfortunately programming these filters is an arcane and sometimes inefficient process. An adaptive mail system which can learn its users' mail sorting preferences would therefore be more desirable. Premised on the work of Blanzieri & Bryl (2007), we propose a framework dedicated to the phenomenon of locality in email data analysis of advance fee fraud e-mails which engages Support Vector Machines (SVM) classifier for building local decision rules into the classification process of the spam filter design for this genre of e-mails
An Approach to Email Classification Using Bayesian Theorem
Email classifiers based on Bayes' theorem have been very effective in spam filtering due to their strong categorization ability and high precision. This paper proposes an algorithm for email classification based on Bayes' theorem. The purpose is to automatically classify mails into predefined categories. The algorithm assigns an incoming mail to its appropriate category by checking its textual contents. The experimental results show that the proposed algorithm is a reasonable and effective method for email classification
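The abstract gives no algorithmic detail, so the following is only a generic sketch of how a Bayes-theorem classifier can assign a mail to one of several predefined categories from its text. It assumes a multinomial Naive Bayes model with add-one smoothing; the function names and the toy categories are hypothetical, not the paper's.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, category) pairs. Returns the fitted counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Return the category with the highest log posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data with hypothetical categories.
model = train_nb([
    (["meeting", "agenda"], "work"),
    (["invoice", "payment"], "finance"),
    (["meeting", "notes"], "work"),
])
```

Calling `classify_nb(model, ["payment", "invoice"])` picks the category whose prior and word likelihoods best explain the message, which mirrors the abstract's description of assigning a mail by checking its textual contents.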
Learning to Filter Text in Forum Malay Message using Naive Bayesian Technique
Applying the basic filtering technique in forum application has been discussed in [1]. The
paper explains the use of the basic naive Bayesian algorithm to classify forum
messages as clean or bad, where a clean message has no bad words, while a bad
message contains at least one bad word. In this Final Year Project paper, the application
of the algorithm in filtering forum messages will be discussed in the attempt to apply
learning to filter forum messages