Noise Tolerance under Risk Minimization
In this paper we explore noise tolerant learning of classifiers. We formulate
the problem as follows. We assume that there is an ideal
training set which is noise-free. The actual training set given to the learning
algorithm is obtained from this ideal data set by corrupting the class label of
each example. The probability that the class label of an example is corrupted
is a function of the feature vector of the example. This would account for most
kinds of noisy data one encounters in practice. We say that a learning method
is noise tolerant if the classifiers learnt with the ideal noise-free data and
with noisy data, both have the same classification accuracy on the noise-free
data. In this paper we analyze the noise tolerance properties of risk
minimization (under different loss functions), which is a generic method for
learning classifiers. We show that risk minimization under the 0-1 loss function
has impressive noise tolerance properties and that under the squared error loss it is
tolerant only to uniform noise; risk minimization under other loss functions is
not noise tolerant. We conclude the paper with a discussion of the implications
of these theoretical results.
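The noise-tolerance notion defined above is easy to simulate. The sketch below is a minimal illustration, assuming a hypothetical one-dimensional threshold family and uniform label noise (a special case of the feature-dependent noise model considered here); it is not the paper's experiment. Empirical risk minimization under 0-1 loss recovers essentially the same classifier, and hence the same clean-data accuracy, from the noisy labels:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    x = rng.uniform(-1.0, 1.0, n)
    y = (x > 0.1).astype(int)                    # ideal, noise-free labels

    eta = 0.3                                    # uniform noise rate (< 1/2)
    flip = rng.random(n) < eta
    y_noisy = np.where(flip, 1 - y, y)           # labels given to the learner

    thresholds = np.linspace(-1.0, 1.0, 201)

    def erm_01(labels):
        # empirical risk minimization under 0-1 loss over threshold classifiers
        errors = [np.mean((x > t).astype(int) != labels) for t in thresholds]
        return thresholds[int(np.argmin(errors))]

    t_clean, t_noisy = erm_01(y), erm_01(y_noisy)
    clean_acc = lambda t: np.mean((x > t).astype(int) == y)
    print(clean_acc(t_clean), clean_acc(t_noisy))   # nearly identical accuracies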
Rates of convergence in active learning
We study the rates of convergence in generalization error achievable by
active learning under various types of label noise. Additionally, we study the
general problem of model selection for active learning with a nested hierarchy
of hypothesis classes and propose an algorithm whose error rate provably
converges to the best achievable error among classifiers in the hierarchy at a
rate adaptive to both the complexity of the optimal classifier and the noise
conditions. In particular, we state sufficient conditions for these rates to be
dramatically faster than those achievable by passive learning.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOS843.
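To make the active-learning setting concrete, here is a hedged sketch of disagreement-based querying (in the spirit of the CAL algorithm) for noiseless one-dimensional thresholds; the paper's setting, with label noise and a nested hierarchy of hypothesis classes, is far more general, and all names below are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    true_t = 0.37                                # unknown target threshold
    pool = rng.uniform(0.0, 1.0, 10_000)         # unlabeled pool
    label = lambda x: int(x > true_t)            # the (costly) labeling oracle

    lo, hi = 0.0, 1.0                            # region of disagreement
    queries = 0
    for x in pool:
        if lo < x < hi:                          # consistent classifiers disagree here
            queries += 1
            if label(x):
                hi = x                           # target threshold lies below x
            else:
                lo = x                           # target threshold lies at or above x
    print(queries, (lo + hi) / 2)                # roughly O(log n) labels vs n for passive

Only points inside the current region of disagreement are labeled, which is the mechanism behind the improved rates over passive learning.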
Optimal rates of aggregation in classification under low noise assumption
In the same spirit as Tsybakov (2003), we define the optimality of an
aggregation procedure in the problem of classification. Using an aggregate with
exponential weights, we obtain an optimal rate of convex aggregation for the
hinge risk under the margin assumption. Moreover, we obtain an optimal rate of
model selection aggregation under the margin assumption for the excess Bayes
risk.
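For concreteness, an aggregate with exponential weights over a finite dictionary of classifiers f_1, ..., f_M typically takes the following form; the temperature n and the empirical hinge risk \hat A_n are our illustrative choices, not necessarily the exact tuning used in the paper:

    \[
      \tilde f_n = \sum_{j=1}^{M} w_j f_j,
      \qquad
      w_j = \frac{\exp\bigl(-n\,\hat A_n(f_j)\bigr)}{\sum_{k=1}^{M}\exp\bigl(-n\,\hat A_n(f_k)\bigr)},
      \qquad
      \hat A_n(f) = \frac{1}{n}\sum_{i=1}^{n}\max\bigl(0,\,1 - Y_i f(X_i)\bigr).
    \]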
Simultaneous adaptation to the margin and to complexity in classification
We consider the problem of adaptation to the margin and to complexity in
binary classification. We suggest an exponential weighting aggregation scheme.
We use this aggregation procedure to construct classifiers which adapt
automatically to margin and complexity. Two main examples are worked out in
which adaptivity is achieved in frameworks proposed by Steinwart and Scovel
[Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer,
Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004)
135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a
minimization step. This is not the case for our procedure.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/009053607000000055.
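The following is a minimal sketch of a generic exponential-weighting aggregation over a finite candidate set, assuming a hypothetical temperature beta and empirical 0-1 risks; the paper's construction and its adaptivity guarantees are more refined. As the abstract stresses, the weights are computed by averaging alone, with no minimization step:

    import numpy as np

    def exponential_weighting(preds, y, beta=1.0):
        """preds: (n, M) array of {-1,+1} predictions of M candidate classifiers
        on a sample; y: (n,) labels in {-1,+1}. Returns aggregation weights."""
        risks = np.mean(preds != y[:, None], axis=0)   # empirical 0-1 risks
        w = np.exp(-beta * len(y) * risks)             # no minimization step needed
        return w / w.sum()

    # aggregate new predictions by a weighted majority vote
    vote = lambda new_preds, w: np.sign(new_preds @ w)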
Lasso type classifiers with a reject option
We consider the problem of binary classification where one can, for a
particular cost, choose not to classify an observation. We present a simple
proof of an oracle inequality for the excess risk of structural risk
minimizers using a lasso-type penalty.
Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-EJS058.
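The loss in this setting charges a fixed cost d in (0, 1/2) for abstaining and cost 1 for a misclassification. Below is a minimal sketch, assuming a score-based classifier and Chow's classical rejection rule; it illustrates the quantity being controlled, not the paper's lasso-penalized estimator:

    import numpy as np

    def predict_with_reject(scores, d):
        # Chow-style rule: reject when the estimated posterior is too close to 1/2
        p = 1.0 / (1.0 + np.exp(-scores))            # estimated P(Y = +1 | X)
        reject = np.abs(p - 0.5) < 0.5 - d
        return np.where(reject, 0, np.sign(p - 0.5))  # 0 encodes "reject"

    def reject_risk(pred, y, d):
        # pay d for each rejection, 1 for each misclassification, 0 otherwise
        return np.mean(np.where(pred == 0, d, (pred != y).astype(float)))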
Robust Classification for Imprecise Environments
In real-world environments it usually is difficult to specify target
operating conditions precisely, for example, target misclassification costs.
This uncertainty makes building robust classification systems problematic. We
show that it is possible to build a hybrid classifier that will perform at
least as well as the best available classifier for any target conditions. In
some cases, the performance of the hybrid actually can surpass that of the best
known classifier. This robust performance extends across a wide variety of
comparison frameworks, including the optimization of metrics such as accuracy,
expected cost, lift, precision, recall, and workforce utilization. The hybrid
also is efficient to build, to store, and to update. The hybrid is based on a
method for the comparison of classifier performance that is robust to imprecise
class distributions and misclassification costs. The ROC convex hull (ROCCH)
method combines techniques from ROC analysis, decision analysis and
computational geometry, and adapts them to the particulars of analyzing learned
classifiers. The method is efficient and incremental, minimizes the management
of classifier performance data, and allows for clear visual comparisons and
sensitivity analyses. Finally, we point to empirical evidence that a robust
hybrid classifier indeed is needed for many real-world problems.
Comment: 24 pages, 12 figures. To be published in Machine Learning Journal. For related papers, see http://www.hpl.hp.com/personal/Tom_Fawcett/ROCCH
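A compact sketch of the geometric core of the ROCCH method follows, assuming each classifier is summarized by a single (FPR, TPR) point; the function names and cost parameters below are ours:

    def rocch(points):
        """points: iterable of (fpr, tpr), one per classifier. Returns the upper
        convex hull, including the trivial all-negative and all-positive classifiers."""
        pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
        hull = []
        for p in pts:
            while len(hull) >= 2:
                (x1, y1), (x2, y2) = hull[-2], hull[-1]
                # pop the middle point if it lies on or below the segment hull[-2] -> p
                if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                    hull.pop()
                else:
                    break
            hull.append(p)
        return hull

    def best_operating_point(hull, p_pos, cost_fp, cost_fn):
        # expected cost of a hull vertex under a given class prior and error costs
        cost = lambda v: cost_fp * (1.0 - p_pos) * v[0] + cost_fn * p_pos * (1.0 - v[1])
        return min(hull, key=cost)

Hull vertices are exactly the classifiers that are optimal for some class prior and cost setting, which is what lets the hybrid match the best available classifier under any target conditions.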
PAC-Bayesian Majority Vote for Late Classifier Fusion
A lot of attention has been devoted to multimedia indexing over the past few
years. In the literature, two kinds of fusion schemes are often considered:
early fusion and late fusion. In this paper we focus on late classifier
fusion, where one combines the scores of each modality at the decision level.
To tackle this problem, we investigate a recent, elegant, and well-founded
quadratic program named MinCq that comes from PAC-Bayes theory in machine
learning. MinCq looks for the weighted combination, over a set of real-valued
functions seen as voters, leading to the lowest misclassification rate, while
making use of the voters' diversity. We provide evidence that this method is
naturally adapted to the late fusion procedure. We propose an extension of MinCq
that adds an order-preserving pairwise loss for ranking, which helps to improve
the Mean Average Precision measure. We confirm the good behavior of the
MinCq-based fusion approaches with experiments on a real image benchmark.
Comment: 7 pages, research report.
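To make the MinCq idea concrete, here is a simplified sketch of a MinCq-style quadratic program for late fusion, assuming labels in {-1, +1}, voter scores in [-1, 1], and the cvxpy solver; the simplex constraint and the margin parameter mu are simplifications (the actual MinCq restricts the weights to a quasi-uniform family, and the ranking extension is omitted):

    import cvxpy as cp
    import numpy as np

    def mincq_like_fusion(H, y, mu=0.05):
        """H: (n, M) matrix of real-valued voter/modality scores in [-1, 1];
        y: (n,) labels in {-1, +1}. Returns fusion weights over the M voters."""
        n, M = H.shape
        q = cp.Variable(M, nonneg=True)
        first_moment = (y @ H) / n                 # first moment of the margin per voter
        # fix the first margin moment at mu and minimize the second moment
        problem = cp.Problem(cp.Minimize(cp.sum_squares(H @ q) / n),
                             [first_moment @ q == mu, cp.sum(q) == 1])
        problem.solve()
        return q.value                             # fusion weights over the modalities

Fixing the first moment of the margin and minimizing its second moment minimizes the PAC-Bayesian C-bound 1 - mu_1^2 / mu_2 on the majority-vote risk, which is the quantity MinCq controls while exploiting the voters' diversity.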