Authors: Eirinaios Michelakis; Georgios Paliouras; I. Androutsopoulos

Abstract

We present a thorough investigation of the use of machine learning to construct effective personalized anti-spam filters. The investigation covers four learning algorithms (Naive Bayes, Flexible Bayes, LogitBoost, and Support Vector Machines) and four datasets constructed from the mailboxes of different users. We discuss the model and search biases of the learning algorithms, along with worst-case computational complexity figures, and observe how the latter relate to experimental measurements. We study how classification accuracy is affected when using attributes that represent sequences of tokens, as opposed to single tokens, and explore the effect of the size of the attribute set and of the training set, all within a cost-sensitive framework. Furthermore, we describe the architecture of a fully implemented learning-based anti-spam filter and present an analysis of its behavior in real use over a period of seven months. Information is also provided on other available learning-based anti-spam filters and on alternative filtering approaches.
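
As a rough illustration of two ideas named in the abstract, attributes built from token sequences and cost-sensitive classification, the following is a minimal sketch and not the authors' implementation. It assumes a multinomial Naive Bayes model with add-one smoothing and one common cost-sensitive decision rule: a message is flagged as spam only when its estimated spam probability exceeds cost_ratio / (1 + cost_ratio), where cost_ratio expresses how many times worse it is to block a legitimate message than to let a spam message through. All names (ngram_attributes, NaiveBayesFilter, cost_ratio) and the toy messages are hypothetical.

```python
import math
from collections import Counter


def ngram_attributes(text, max_n=2):
    """Return all token n-grams (n = 1..max_n) of a message as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


class NaiveBayesFilter:
    """Toy multinomial Naive Bayes over token n-grams (illustrative only)."""

    LABELS = ("spam", "legit")

    def __init__(self, max_n=2):
        self.max_n = max_n
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = Counter()
        self.vocab = set()

    def train(self, text, label):
        attrs = ngram_attributes(text, self.max_n)
        self.counts[label].update(attrs)
        self.vocab.update(attrs)
        self.docs[label] += 1

    def spam_probability(self, text):
        # Attributes never seen in training are ignored.
        attrs = [a for a in ngram_attributes(text, self.max_n)
                 if a in self.vocab]
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in self.LABELS:
            total = sum(self.counts[label].values())
            # Add-one smoothing on the class prior and on attribute counts.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for a in attrs:
                score += math.log((self.counts[label][a] + 1)
                                  / (total + len(self.vocab)))
            log_score[label] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["legit"])

    def is_spam(self, text, cost_ratio=9.0):
        # Assumed decision rule: a higher cost_ratio demands more
        # confidence before a message is blocked.
        threshold = cost_ratio / (1.0 + cost_ratio)
        return self.spam_probability(text) > threshold


if __name__ == "__main__":
    f = NaiveBayesFilter(max_n=2)
    f.train("buy cheap pills now", "spam")
    f.train("cheap pills online buy now", "spam")
    f.train("meeting agenda for monday", "legit")
    f.train("project meeting notes attached", "legit")
    for ratio in (1.0, 9.0, 999.0):
        print(ratio, f.is_spam("buy cheap pills", cost_ratio=ratio))
```

Setting max_n=1 restricts the attributes to single tokens, which allows a side-by-side comparison with the sequence-based attributes on the same training messages. Raising cost_ratio makes the filter more conservative, so a message that is blocked at a low ratio may be let through at a high one.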

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.76.21...

Last time updated on 22/10/2014