Making Risk Minimization Tolerant to Label Noise
In many applications, the training data, from which one needs to learn a
classifier, is corrupted with label noise. Many standard algorithms such as SVM
perform poorly in the presence of label noise. In this paper we investigate the
robustness of risk minimization to label noise. We prove a sufficient condition
on a loss function for the risk minimization under that loss to be tolerant to
uniform label noise. We show that the 0-1 loss, sigmoid loss, ramp loss and
probit loss satisfy this condition though none of the standard convex loss
functions satisfy it. We also prove that, by choosing a sufficiently large
value of a parameter in the loss function, the sigmoid loss, ramp loss and
probit loss can also be made tolerant to non-uniform label noise, provided the
classes are separable under the noise-free data distribution. Through
extensive empirical studies, we show that risk minimization under the 0-1
loss, the sigmoid loss and the ramp loss has much better robustness to label
noise when compared to the SVM algorithm.
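
The abstract does not state the sufficient condition itself. A minimal sketch, assuming the condition is the symmetry property l(z) + l(-z) = constant for every margin z = y f(x), which is how it is usually described for this line of work: the check below evaluates that sum on a grid of margins. The loss definitions, the parameter name beta, and the helper symmetry_spread are illustrative choices, not taken from the paper.

```python
import math

def sigmoid_loss(z, beta=1.0):
    # Sigmoid loss on the margin z = y * f(x); beta is an assumed steepness parameter.
    return 1.0 / (1.0 + math.exp(beta * z))

def ramp_loss(z):
    # Ramp loss written as a hinge loss clipped to the range [0, 2].
    return min(2.0, max(0.0, 1.0 - z))

def probit_loss(z, beta=1.0):
    # Probit loss: 1 - Phi(beta * z), with Phi the standard normal CDF.
    return 1.0 - 0.5 * (1.0 + math.erf(beta * z / math.sqrt(2.0)))

def hinge_loss(z):
    # Standard convex hinge loss used by the SVM.
    return max(0.0, 1.0 - z)

def symmetry_spread(loss, margins):
    # Spread of loss(z) + loss(-z) over the grid; zero means the sum is constant.
    sums = [loss(z) + loss(-z) for z in margins]
    return max(sums) - min(sums)

margins = [i / 10.0 for i in range(-30, 31)]
for name, loss in [("sigmoid", sigmoid_loss), ("ramp", ramp_loss),
                   ("probit", probit_loss), ("hinge", hinge_loss)]:
    print(name, symmetry_spread(loss, margins))
```

Under these assumed definitions the sigmoid, ramp and probit losses give a spread of zero up to rounding, while the hinge loss does not, which matches the abstract's claim that the standard convex losses fail the condition.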
The Support Vector Machine and Mixed Integer Linear Programming: Ramp Loss SVM with L1-Norm Regularization
The support vector machine (SVM) is a flexible classification method that accommodates a kernel trick to learn nonlinear decision rules. The traditional formulation as an optimization problem is a quadratic program. In efforts to reduce computational complexity, some have proposed using an L1-norm regularization to create a linear program (LP). In other efforts aimed at increasing the robustness to outliers, investigators have proposed using the ramp loss, which results in what may be expressed as a quadratic integer programming problem (QIP). In this paper, we consider combining these ideas for ramp loss SVM with L1-norm regularization. The result is four formulations for SVM that each may be expressed as a mixed integer linear program (MILP). We observe that ramp loss SVM with L1-norm regularization provides robustness to outliers with the linear kernel. We investigate the time required to find good solutions to the various formulations using a branch-and-bound solver.
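
To illustrate how the two ideas combine, here is a sketch of one possible big-M MILP for a linear-kernel ramp loss SVM with L1-norm regularization. It is a generic textbook-style construction and not necessarily any of the four formulations in the paper; the symbols C (trade-off parameter), M (a sufficiently large constant), u_j (bounds on |w_j|), xi_i (hinge slack) and z_i (outlier indicator) are all assumptions introduced for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,u,\,\xi,\,z}\quad & \sum_{j=1}^{d} u_j \;+\; C \sum_{i=1}^{n} \left( \xi_i + 2 z_i \right) \\
\text{s.t.}\quad & y_i \left( w^\top x_i + b \right) \;\ge\; 1 - \xi_i - M z_i, && i = 1,\dots,n, \\
& -u_j \;\le\; w_j \;\le\; u_j, && j = 1,\dots,d, \\
& 0 \;\le\; \xi_i \;\le\; 2, \quad z_i \in \{0, 1\}, && i = 1,\dots,n.
\end{aligned}
```

Setting z_i = 1 switches off the margin constraint for observation i at a fixed cost of 2C, so a point's contribution is capped at 2 as in the ramp loss. The constraints on u_j linearize the L1 norm, which leaves an objective and constraints that are linear apart from the binary variables.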