Learning with Symmetric Label Noise: The Importance of Being Unhinged
Convex potential minimisation is the de facto approach to binary
classification. However, Long and Servedio [2010] proved that under symmetric
label noise (SLN), minimisation of any convex potential over a linear function
class can result in classification performance equivalent to random guessing.
This ostensibly shows that convex losses are not SLN-robust. In this paper, we
propose a convex, classification-calibrated loss and prove that it is
SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of
being negatively unbounded. The loss is a modification of the hinge loss, where
one does not clamp at zero; hence, we call it the unhinged loss. We show that
the optimal unhinged solution is equivalent to that of a strongly regularised
SVM, and is the limiting solution for any convex potential; this implies that
strong l2 regularisation makes most standard learners SLN-robust. Experiments
confirm the SLN-robustness of the unhinged loss.
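As a rough illustration of why the unhinged loss tolerates symmetric label noise, the NumPy sketch below (not the authors' code; the toy data, the 0.3 noise rate, and the function names are illustrative) implements the loss l(v) = 1 - v, i.e. the hinge loss without the clamp at zero. Because the loss is linear in the weights, minimising its mean with an l2 penalty has a closed form proportional to the mean of y_i * x_i, and flipping each label with probability rho < 1/2 only rescales that mean by (1 - 2*rho) without changing its direction, which is the intuition behind SLN-robustness.

```python
import numpy as np

def unhinged_loss(margin):
    """Unhinged loss l(v) = 1 - v: the hinge loss without clamping at zero."""
    return 1.0 - margin

def fit_unhinged(X, y, lam=1.0):
    """Minimise mean unhinged loss + (lam/2) * ||w||^2 over linear scorers.

    The loss is linear in w, so setting the gradient to zero gives the
    closed form w = (1 / (lam * n)) * sum_i y_i * x_i: a scaled
    class-centroid classifier.
    """
    n = X.shape[0]
    return (y @ X) / (lam * n)

# Toy check: two Gaussian blobs, each label flipped with probability 0.3.
rng = np.random.default_rng(0)
n = 2000
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, 2))
X[:, 0] += 2.0 * y                              # class-dependent mean shift
y_noisy = np.where(rng.random(n) < 0.3, -y, y)  # symmetric label noise

w = fit_unhinged(X, y_noisy)
print("mean unhinged loss:", unhinged_loss(y_noisy * (X @ w)).mean())
print("clean-label accuracy:", np.mean(np.sign(X @ w) == y))
```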
Cross-validation in nonparametric regression with outliers
A popular data-driven method for choosing the bandwidth in standard kernel
regression is cross-validation. Even when there are outliers in the data,
robust kernel regression can be used to estimate the unknown regression curve
[Robust and Nonlinear Time Series Analysis. Lecture Notes in Statist. (1984) 26
163--184]. However, under these circumstances standard cross-validation is no
longer a satisfactory bandwidth selector because it is unduly influenced by
extreme prediction errors caused by these outliers. A more robust method is
proposed here: a cross-validation criterion that discounts extreme prediction
errors. In large samples the robust method chooses
consistent bandwidths, and the consistency of the method is practically
independent of the form in which extreme prediction errors are discounted.
Additionally, evaluation of the method's finite sample behavior in a simulation
demonstrates that the proposed method performs favorably. The method can also
be applied to other problems that require cross-validation, for example model
selection.

Comment: Published at http://dx.doi.org/10.1214/009053605000000499 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org/).
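The sketch below is a generic rendering of this idea, not the paper's exact procedure: it pairs a robust Nadaraya-Watson fit (a local M-estimate computed by iteratively reweighting a kernel-weighted mean with Huber weights) with a leave-one-out criterion that passes prediction errors through Huber's rho, so a few gross errors cannot dominate the score. The Gaussian kernel, Huber constant c = 1.345, MAD-based scale, and all function names are illustrative assumptions; per the abstract, the precise form of the discounting should matter little.

```python
import numpy as np

def huber_rho(r, c=1.345):
    """Huber's rho: quadratic for small residuals, linear for large ones,
    so extreme prediction errors are discounted rather than squared."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * (a - 0.5 * c))

def robust_nw(x0, x, y, h, c=1.345, iters=10):
    """Robust Nadaraya-Watson estimate at x0: iteratively reweight a
    Gaussian-kernel-weighted mean with Huber weights (local M-estimation)."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel weights
    m = np.sum(k * y) / np.sum(k)                 # plain kernel fit as a start
    for _ in range(iters):
        r = y - m
        s = np.median(np.abs(r)) / 0.6745 + 1e-12 # robust residual scale (MAD)
        u = np.abs(r) / (c * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)      # Huber weights
        m = np.sum(k * w * y) / np.sum(k * w)
    return m

def robust_cv_bandwidth(x, y, bandwidths, c=1.345):
    """Pick the bandwidth minimising a leave-one-out score built from
    Huber's rho of the prediction errors."""
    # One common noise scale for all bandwidths, from first differences of y,
    # so that scores are comparable across h.
    s = np.median(np.abs(np.diff(y))) / (0.6745 * np.sqrt(2)) + 1e-12
    scores = []
    for h in bandwidths:
        errs = np.array([y[i] - robust_nw(x[i], np.delete(x, i),
                                          np.delete(y, i), h)
                         for i in range(len(x))])
        scores.append(np.mean(huber_rho(errs / s, c)))
    return bandwidths[int(np.argmin(scores))]

# Demo: smooth curve plus noise, contaminated with 10% gross outliers.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=100)
y[rng.random(100) < 0.1] += 8.0
print("selected bandwidth:", robust_cv_bandwidth(x, y, np.linspace(0.02, 0.3, 8)))
```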