95,707 research outputs found
On the use of Locality for Improving SVM-Based Spam Filtering
Recent growths in the use of email for communication and the corresponding growths in the volume of email received have made automatic processing of emails desirable. In tandem is the prevailing problem of Advance Fee fraud E-mails that pervades inboxes globally. These genres of e-mails solicit for financial transactions and funds transfers from unsuspecting users. Most modern mail-reading software packages provide some forms of programmable automatic filtering, typically in the form of sets of rules that file or otherwise dispose mails based on keywords detected in the headers or message body. Unfortunately programming these filters is an arcane and sometimes inefficient process. An adaptive mail system which can learn its users’ mail sorting preferences would therefore be more desirable. Premised on the work of Blanzieri & Bryl (2007), we proposes a framework dedicated to the phenomenon of locality in email data analysis of advance fee fraud e-mails which engages Support Vector Machines (SVM) classifier for building local decision rules into the classification process of the spam filter design for this genre of e-mails
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks, learning stock market indicators, to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the \mbox{Tornado} framework that
implements a reservoir of diverse classifiers, together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The
experimental evaluation confirms that the current 'best' (classifier, detector)
pair is not only heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
\mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion
while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure
Variable selection for the multicategory SVM via adaptive sup-norm regularization
The Support Vector Machine (SVM) is a popular classification paradigm in
machine learning and has achieved great success in real applications. However,
the standard SVM can not select variables automatically and therefore its
solution typically utilizes all the input variables without discrimination.
This makes it difficult to identify important predictor variables, which is
often one of the primary goals in data analysis. In this paper, we propose two
novel types of regularization in the context of the multicategory SVM (MSVM)
for simultaneous classification and variable selection. The MSVM generally
requires estimation of multiple discriminating functions and applies the argmax
rule for prediction. For each individual variable, we propose to characterize
its importance by the supnorm of its coefficient vector associated with
different functions, and then minimize the MSVM hinge loss function subject to
a penalty on the sum of supnorms. To further improve the supnorm penalty, we
propose the adaptive regularization, which allows different weights imposed on
different variables according to their relative importance. Both types of
regularization automate variable selection in the process of building
classifiers, and lead to sparse multi-classifiers with enhanced
interpretability and improved accuracy, especially for high dimensional low
sample size data. One big advantage of the supnorm penalty is its easy
implementation via standard linear programming. Several simulated examples and
one real gene data analysis demonstrate the outstanding performance of the
adaptive supnorm penalty in various data settings.Comment: Published in at http://dx.doi.org/10.1214/08-EJS122 the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Spam on the Internet: can it be eradicated or is it here to stay?
A discussion of the rise in unsolicited bulk e-mail, its effect on tertiary education, and some of the methods being used or developed to combat it. Includes an examination of block listing, protocol change, economic and computational solutions, e-mail aliasing, sender warranted e-mail, collaborative filtering, rule-based and statistical solutions, and legislation
Model Order Selection Rules For Covariance Structure Classification
The adaptive classification of the interference covariance matrix structure
for radar signal processing applications is addressed in this paper. This
represents a key issue because many detection architectures are synthesized
assuming a specific covariance structure which may not necessarily coincide
with the actual one due to the joint action of the system and environment
uncertainties. The considered classification problem is cast in terms of a
multiple hypotheses test with some nested alternatives and the theory of Model
Order Selection (MOS) is exploited to devise suitable decision rules. Several
MOS techniques, such as the Akaike, Takeuchi, and Bayesian information criteria
are adopted and the corresponding merits and drawbacks are discussed. At the
analysis stage, illustrating examples for the probability of correct model
selection are presented showing the effectiveness of the proposed rules
- …