Learning with Symmetric Label Noise: The Importance of Being Unhinged
Convex potential minimisation is the de facto approach to binary
classification. However, Long and Servedio [2010] proved that under symmetric
label noise (SLN), minimisation of any convex potential over a linear function
class can result in classification performance equivalent to random guessing.
This ostensibly shows that convex losses are not SLN-robust. In this paper, we
propose a convex, classification-calibrated loss and prove that it is
SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of
being negatively unbounded. The loss is a modification of the hinge loss, where
one does not clamp at zero; hence, we call it the unhinged loss. We show that
the optimal unhinged solution is equivalent to that of a strongly regularised
SVM, and is the limiting solution for any convex potential; this implies that
strong l2 regularisation makes most standard learners SLN-robust. Experiments
confirm the SLN-robustness of the unhinged loss.
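
To make the construction concrete, here is a minimal Python sketch of the loss the abstract describes, together with the closed-form minimiser of the l2-regularised unhinged risk over linear scorers; the function names and the toy regulariser value are illustrative assumptions, not the authors' code.

    import numpy as np

    def hinge_loss(scores, labels):
        # Standard hinge loss: max(0, 1 - y * f(x)), clamped at zero.
        return np.maximum(0.0, 1.0 - labels * scores)

    def unhinged_loss(scores, labels):
        # Unhinged loss: 1 - y * f(x). Dropping the clamp makes the loss
        # negatively unbounded, which is how it escapes the Long and
        # Servedio [2010] impossibility result.
        return 1.0 - labels * scores

    def fit_unhinged(X, y, lam=1.0):
        # Minimising the mean unhinged loss plus lam * ||w||^2 over w is a
        # quadratic with gradient -mean(y_i * x_i) + 2 * lam * w, so the
        # optimum is the (scaled) mean of the label-weighted inputs.
        return (X * y[:, None]).mean(axis=0) / (2.0 * lam)

Because the minimiser is proportional to the mean of y_i x_i, flipping each label independently with probability rho < 1/2 only rescales it by (1 - 2 rho) in expectation, which gives an elementary view of the SLN-robustness claim.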
Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier
The data functions that are studied in the course of functional data analysis
are assembled from discrete data, and the level of smoothing that is used is
generally that which is appropriate for accurate approximation of the
conceptually smooth functions that were not actually observed. Existing
literature shows that this approach is effective, and even optimal, when using
functional data methods for prediction or hypothesis testing. However, in the
present paper we show that this approach is not effective in classification
problems. There, a useful rule of thumb is that undersmoothing is often
desirable, but there are several surprising qualifications to that approach.
First, the effect of smoothing the training data can be more significant than
that of smoothing the new data set to be classified; second, undersmoothing is
not always the right approach, and in fact in some cases using a relatively
large bandwidth can be more effective; and third, these perverse results are
the consequence of very unusual properties of error rates, expressed as
functions of smoothing parameters. For example, the orders of magnitude of
optimal smoothing parameter choices depend on the signs and sizes of terms in
an expansion of error rate, and those signs and sizes can vary dramatically
from one setting to another, even for the same classifier.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1158 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
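
As a concrete reference point for the pipeline the abstract analyses, the following Python sketch smooths discretely observed curves with a Gaussian kernel and classifies by distance to class mean curves; the Nadaraya-Watson smoother and the centroid classifier are illustrative assumptions, not the paper's estimators.

    import numpy as np

    def smooth_curve(t_obs, y_obs, t_grid, h):
        # Nadaraya-Watson smoother: convert one discretely observed curve
        # (t_obs, y_obs) into function values on t_grid. The bandwidth h is
        # the smoothing parameter whose choice the paper studies.
        w = np.exp(-0.5 * ((t_grid[:, None] - t_obs[None, :]) / h) ** 2)
        return (w @ y_obs) / w.sum(axis=1)

    def centroid_classify(train_curves, train_labels, new_curve):
        # Assign the new (smoothed) curve to the class whose mean curve is
        # closest in discretised L2 distance.
        classes = np.unique(train_labels)
        dists = [np.mean((train_curves[train_labels == c].mean(axis=0)
                          - new_curve) ** 2) for c in classes]
        return classes[int(np.argmin(dists))]

Calling smooth_curve with different bandwidths for the training curves and for the new curve makes it possible to probe the abstract's first point, that smoothing the training data can matter more than smoothing the new observation.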