82 research outputs found
Robust Classification for Imprecise Environments
In real-world environments it usually is difficult to specify target
operating conditions precisely, for example, target misclassification costs.
This uncertainty makes building robust classification systems problematic. We
show that it is possible to build a hybrid classifier that will perform at
least as well as the best available classifier for any target conditions. In
some cases, the performance of the hybrid actually can surpass that of the best
known classifier. This robust performance extends across a wide variety of
comparison frameworks, including the optimization of metrics such as accuracy,
expected cost, lift, precision, recall, and workforce utilization. The hybrid
also is efficient to build, to store, and to update. The hybrid is based on a
method for the comparison of classifier performance that is robust to imprecise
class distributions and misclassification costs. The ROC convex hull (ROCCH)
method combines techniques from ROC analysis, decision analysis and
computational geometry, and adapts them to the particulars of analyzing learned
classifiers. The method is efficient and incremental, minimizes the management
of classifier performance data, and allows for clear visual comparisons and
sensitivity analyses. Finally, we point to empirical evidence that a robust
hybrid classifier indeed is needed for many real-world problems.Comment: 24 pages, 12 figures. To be published in Machine Learning Journal.
For related papers, see http://www.hpl.hp.com/personal/Tom_Fawcett/ROCCH
Feature Selection for MAUC-Oriented Classification Systems
Feature selection is an important pre-processing step for many pattern
classification tasks. Traditionally, feature selection methods are designed to
obtain a feature subset that can lead to high classification accuracy. However,
classification accuracy has recently been shown to be an inappropriate
performance metric of classification systems in many cases. Instead, the Area
Under the receiver operating characteristic Curve (AUC) and its multi-class
extension, MAUC, have been proved to be better alternatives. Hence, the target
of classification system design is gradually shifting from seeking a system
with the maximum classification accuracy to obtaining a system with the maximum
AUC/MAUC. Previous investigations have shown that traditional feature selection
methods need to be modified to cope with this new objective. These methods most
often are restricted to binary classification problems only. In this study, a
filter feature selection method, namely MAUC Decomposition based Feature
Selection (MDFS), is proposed for multi-class classification problems. To the
best of our knowledge, MDFS is the first method specifically designed to select
features for building classification systems with maximum MAUC. Extensive
empirical results demonstrate the advantage of MDFS over several compared
feature selection methods.Comment: A journal length pape
Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output relations, to what extent
can the general mapping be learned that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to come to a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner
- …