7,851 research outputs found

Improving spam filtering by combining Naive Bayes with simple k-nearest neighbor searches

Author: Etzold Daniel
Publication date: 01/01/2003

Using naive Bayes for email classification has become very popular within the last few months. Naive Bayes classifiers are quite easy to implement and very efficient. In this paper we present empirical results of email classification using a combination of naive Bayes and k-nearest neighbor searches. Using this technique, we show that the accuracy of a Bayes filter can be improved slightly for a high number of features and significantly for a small number of features.

arXiv.org e-Print Archive

CiteSeerX

A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

Authors: Kasabov Nikola, Tu Enmei, Yang Jie, Zhang Yaqian, Zhu Lin
Publication date: 03/06/2016

k Nearest Neighbors (kNN) is one of the most widely used supervised learning algorithms to classify Gaussian distributed data, but it does not achieve good results when it is applied to nonlinear manifold distributed data, especially when only a very limited number of labeled samples is available. In this paper, we propose a new graph-based kNN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an R-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on a local neighborhood reconstruction. Comparison experiments are conducted on both synthetic data sets and real-world data sets to demonstrate the validity of the proposed new kNN algorithm and its improvements over other versions of kNN algorithms. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kNN algorithm, the proposed manifold version of kNN shows promising potential for classifying manifold-distributed data.

Comment: 32 pages, 12 figures, 7 tables
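
The labeling step described in this abstract (pick the nearest neighbors by similarity, then sum their weights per class) can be illustrated with a minimal sketch. It assumes the similarities between the query and the labeled samples are already available as a plain list, standing in for one row of the TRW matrix; how that matrix is computed is not shown here, and the function name `weighted_knn_label` and the toy values are hypothetical.

```python
from collections import defaultdict

def weighted_knn_label(similarities, labels, k):
    """Label a query by summing neighbor weights per class.

    similarities[i] is the similarity between the query and labeled
    sample i (assumed given; a stand-in for one row of a TRW matrix).
    labels[i] is the class of labeled sample i.
    """
    if not 0 < k <= len(labels):
        raise ValueError("k must be between 1 and the number of labeled samples")
    # The k nearest neighbors are the k samples with the largest similarity.
    nearest = sorted(range(len(labels)),
                     key=lambda i: similarities[i], reverse=True)[:k]
    # Accumulate the similarity weight of each neighbor under its class.
    totals = defaultdict(float)
    for i in nearest:
        totals[labels[i]] += similarities[i]
    # The class with the largest total weight wins.
    return max(totals, key=totals.get)

# Toy values (illustrative only).
sims = [0.9, 0.4, 0.35, 0.3, 0.05]
labs = ["a", "b", "b", "b", "a"]
print(weighted_knn_label(sims, labs, k=3))
```

With k=3 the two "b" neighbors outnumber the single "a" neighbor, yet "a" carries more total weight, so a weight-sum vote and a plain majority vote can disagree.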

arXiv.org e-Print Archive

AUT Scholarly Commons

Stacking classifiers for anti-spam filtering of e-mail

Authors: Androutsopoulos I., Karkaletsis V., Paliouras G., Sakkis G., Spyropoulos C. D., Stamatopoulos P.
Publication date: 01/01/2001

We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, or "spam", floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the efficiency of automatically induced anti-spam filters, and that such filters can be used in real-life applications.
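
As a rough illustration of stacked generalization in general (not the classifiers, features, or corpus used in this paper), the sketch below trains simple level-0 learners, collects their out-of-fold predictions, and fits a level-1 learner on those predictions. The threshold "stump" learners, the lookup-table meta-learner, the function names, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_stump(X, y, feature):
    """Level-0 learner: best single-feature threshold rule (1 = spam)."""
    best = (-1, 0)
    for t in sorted({row[feature] for row in X}):
        correct = sum((row[feature] >= t) == bool(label)
                      for row, label in zip(X, y))
        best = max(best, (correct, t))
    threshold = best[1]
    return lambda row: int(row[feature] >= threshold)

def train_meta(meta_X, y):
    """Level-1 learner: majority label per tuple of level-0 predictions."""
    votes = defaultdict(Counter)
    for preds, label in zip(meta_X, y):
        votes[preds][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen tuples
    return lambda preds: table.get(preds, default)

def train_stacked(X, y, features, folds=3):
    # Level-1 training data comes from out-of-fold level-0 predictions,
    # so the meta-learner never sees predictions made on training rows.
    meta_X = [None] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        bases = [train_stump([X[i] for i in train], [y[i] for i in train], j)
                 for j in features]
        for i in range(f, len(X), folds):
            meta_X[i] = tuple(b(X[i]) for b in bases)
    meta = train_meta(meta_X, y)
    # Refit the level-0 learners on all data for use at prediction time.
    final = [train_stump(X, y, j) for j in features]
    return lambda row: meta(tuple(b(row) for b in final))

# Toy data (illustrative only): two hypothetical count features per
# message; label 1 = spam, 0 = legitimate.
X = [(4, 2), (0, 0), (5, 3), (1, 0), (3, 1), (0, 1),
     (6, 4), (1, 1), (4, 3), (0, 0), (5, 2), (1, 0)]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
clf = train_stacked(X, y, features=[0, 1])
print(clf((5, 3)), clf((0, 0)))
```

Fitting the meta-learner on held-out predictions rather than on in-sample ones is the design choice that distinguishes stacking from simple voting over the same base classifiers.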

arXiv.org e-Print Archive

CiteSeerX