Noise Tolerance under Risk Minimization
In this paper we explore noise tolerant learning of classifiers. We formulate
the problem as follows. We assume that there is an ideal
training set which is noise-free. The actual training set given to the learning
algorithm is obtained from this ideal data set by corrupting the class label of
each example. The probability that the class label of an example is corrupted
is a function of the feature vector of the example. This would account for most
kinds of noisy data one encounters in practice. We say that a learning method
is noise tolerant if the classifiers learnt with the ideal noise-free data and
with noisy data, both have the same classification accuracy on the noise-free
data. In this paper we analyze the noise tolerance properties of risk
minimization (under different loss functions), which is a generic method for
learning classifiers. We show that risk minimization under the 0-1 loss function
has impressive noise tolerance properties, that risk minimization under squared
error loss is tolerant only to uniform noise, and that risk minimization under other loss functions is
not noise tolerant. We conclude the paper with some discussion on implications
of these theoretical results.
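The noise model described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: labels are taken to be binary in {-1, +1}, and the names `corrupt_labels`, `uniform_noise` and `boundary_noise`, as well as the specific flip rates, are hypothetical and do not come from the paper.

```python
import random

def corrupt_labels(features, labels, flip_prob, seed=0):
    """Return a noisy copy of `labels` (assumed to be in {-1, +1}).

    Each label is flipped with probability flip_prob(x), where x is the
    feature vector of that example, so the noise rate may vary with x.
    """
    rng = random.Random(seed)
    noisy = []
    for x, y in zip(features, labels):
        eta = flip_prob(x)
        noisy.append(-y if rng.random() < eta else y)
    return noisy

def uniform_noise(rate):
    # Uniform noise: the same flip probability for every example.
    return lambda x: rate

def boundary_noise(x):
    # Hypothetical non-uniform noise: labels are flipped more often
    # when the first feature is close to zero.
    return 0.4 if abs(x[0]) < 0.5 else 0.1

# Toy "ideal" (noise-free) data set and its corrupted version.
features = [[-2.0], [-0.2], [0.3], [1.5]]
labels = [-1, -1, 1, 1]
noisy_labels = corrupt_labels(features, labels, boundary_noise, seed=1)
```

In these terms, a learning method is noise tolerant if training on `noisy_labels` gives a classifier with the same accuracy on the noise-free `labels` as training on `labels` directly.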
Making Risk Minimization Tolerant to Label Noise
In many applications, the training data, from which one needs to learn a
classifier, is corrupted with label noise. Many standard algorithms such as SVM
perform poorly in the presence of label noise. In this paper we investigate the
robustness of risk minimization to label noise. We prove a sufficient condition
on a loss function for the risk minimization under that loss to be tolerant to
uniform label noise. We show that the 0-1 loss, sigmoid loss, ramp loss and
probit loss satisfy this condition though none of the standard convex loss
functions satisfy it. We also prove that, by choosing a sufficiently large
value of a parameter in the loss function, the sigmoid loss, ramp loss and
probit loss can be made tolerant to non-uniform label noise also if we can
assume the classes to be separable under noise-free data distribution. Through
extensive empirical studies, we show that risk minimization under the 0-1
loss, the sigmoid loss and the ramp loss has much better robustness to label
noise when compared to the SVM algorithm.
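The abstract does not state the sufficient condition itself. As a hedged illustration, the sketch below checks one symmetry property commonly associated with tolerance to uniform label noise: loss(z) + loss(-z) is the same constant for every margin z = y * f(x). Treating this as the condition is an assumption here, and the function names, the parameter `beta` and the chosen margins are hypothetical.

```python
import math

def zero_one_loss(z):
    # z = y * f(x) is the margin; a negative margin is a misclassification.
    # (z == 0 is a tie and is not evaluated below.)
    return 1.0 if z < 0 else 0.0

def sigmoid_loss(z, beta=1.0):
    # Smooth, bounded, non-convex surrogate; beta controls the steepness.
    return 1.0 / (1.0 + math.exp(beta * z))

def ramp_loss(z):
    # Hinge loss clipped so that it is bounded between 0 and 2.
    return max(0.0, 1.0 - z) - max(0.0, -1.0 - z)

def hinge_loss(z):
    # Standard convex loss used by the SVM.
    return max(0.0, 1.0 - z)

def sum_is_constant(loss, margins, tol=1e-9):
    """Check whether loss(z) + loss(-z) takes the same value for all margins."""
    sums = [loss(z) + loss(-z) for z in margins]
    return max(sums) - min(sums) <= tol

margins = [-3.0, -1.5, -0.5, 0.25, 1.0, 2.0, 5.0]
for name, loss in [("0-1", zero_one_loss), ("sigmoid", sigmoid_loss),
                   ("ramp", ramp_loss), ("hinge", hinge_loss)]:
    print(name, sum_is_constant(loss, margins))
```

On these margins the bounded losses pass the check and the hinge loss does not, which matches the abstract's split between the 0-1, sigmoid and ramp losses on one side and standard convex losses on the other.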
Robust Loss Functions under Label Noise for Deep Neural Networks
In many applications of classifier learning, training data suffers from label
noise. Deep networks are learned using huge training data where the problem of
noisy labels is particularly relevant. The current techniques proposed for
learning deep networks under label noise focus on modifying the network
architecture and on algorithms for estimating true labels from noisy labels. An
alternate approach would be to look for loss functions that are inherently
noise-tolerant. For binary classification there exist theoretical results on
loss functions that are robust to label noise. In this paper, we provide some
sufficient conditions on a loss function so that risk minimization under that
loss function would be inherently tolerant to label noise for multiclass
classification problems. These results generalize the existing results on
noise-tolerant loss functions for binary classification. We study some of the
widely used loss functions in deep networks and show that the loss function
based on mean absolute value of error is inherently robust to label noise. Thus
standard back propagation is enough to learn the true classifier even under
label noise. Through experiments, we illustrate the robustness of risk
minimization with such loss functions for learning neural networks. Comment: Appeared in AAAI 201
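One way to see why a loss based on mean absolute error differs from cross-entropy is to add up the loss over every possible label for a fixed network output. The sketch below assumes the network ends in a softmax layer and that labels are one-hot encoded; both are assumptions for illustration, as are all function names. Under those assumptions, the absolute-error loss summed over all k labels is always 2(k - 1), while the cross-entropy sum changes with the output.

```python
import math

def softmax(logits):
    # Numerically stable softmax: outputs are positive and sum to 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mae_loss(probs, label):
    # L1 distance between the one-hot vector for `label` and the output.
    return sum(abs((1.0 if i == label else 0.0) - p)
               for i, p in enumerate(probs))

def cce_loss(probs, label):
    # Categorical cross-entropy for a single example.
    return -math.log(probs[label])

def loss_summed_over_labels(loss, probs):
    # Total loss if the same output were scored against every class in turn.
    return sum(loss(probs, j) for j in range(len(probs)))

probs_a = softmax([2.0, 0.5, -1.0])
probs_b = softmax([0.1, 0.1, 3.0])

mae_a = loss_summed_over_labels(mae_loss, probs_a)
mae_b = loss_summed_over_labels(mae_loss, probs_b)
cce_a = loss_summed_over_labels(cce_loss, probs_a)
cce_b = loss_summed_over_labels(cce_loss, probs_b)
```

With k = 3 classes, `mae_a` and `mae_b` both equal 4 up to rounding, whereas `cce_a` and `cce_b` differ. A constant sum of this kind is the multiclass analogue of the symmetry property used for binary losses.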
An Efficient Approach for Computing Optimal Low-Rank Regularized Inverse Matrices
Standard regularization methods that are used to compute solutions to
ill-posed inverse problems require knowledge of the forward model. In many
real-life applications, the forward model is not known, but training data is
readily available. In this paper, we develop a new framework that uses training
data, as a substitute for knowledge of the forward model, to compute an optimal
low-rank regularized inverse matrix directly, allowing for very fast
computation of a regularized solution. We consider a statistical framework
based on Bayes and empirical Bayes risk minimization to analyze theoretical
properties of the problem. We propose an efficient rank update approach for
computing an optimal low-rank regularized inverse matrix for various error
measures. Numerical experiments demonstrate the benefits and potential
applications of our approach to problems in signal and image processing. Comment: 24 pages, 11 figures
Analysis-of-marginal-Tail-Means (ATM): a robust method for discrete black-box optimization
We present a new method, called Analysis-of-marginal-Tail-Means (ATM), for
effective robust optimization of discrete black-box problems. ATM has important
applications to many real-world engineering problems (e.g., manufacturing
optimization, product design, molecular engineering), where the objective to
optimize is black-box and expensive, and the design space is inherently
discrete. One weakness of existing methods is that they are not robust: these
methods perform well under certain assumptions, but yield poor results when
such assumptions (which are difficult to verify in black-box problems) are
violated. ATM addresses this via the use of marginal tail means for
optimization, which combines both rank-based and model-based methods. The
trade-off between rank- and model-based optimization is tuned by first
identifying important main effects and interactions, then finding a good
compromise which best exploits additive structure. By adaptively tuning this
trade-off from data, ATM provides improved robust optimization over existing
methods, particularly in problems with (i) a large number of factors, (ii)
unordered factors, or (iii) experimental noise. We demonstrate the
effectiveness of ATM in simulations and in two real-world engineering problems:
the first on robust parameter design of a circular piston, and the second on
product family design of a thermistor network.