Robust Learning from Bites
Many robust statistical procedures have two drawbacks. Firstly, they are computationally intensive, so they can hardly be applied to massive data sets. Secondly, robust confidence intervals for the estimated parameters, or robust predictions from the fitted models, are often unavailable. Here, we propose a general method to overcome these problems of robust estimation in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed over several processors if available, and can reduce the computation time substantially. It additionally offers distribution-free confidence intervals for the median of the predictions. The method is illustrated in two situations: robust estimation in linear regression and kernel logistic regression from statistical machine learning.
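A minimal sketch of this divide-and-combine idea, assuming Huber regression as the robust base estimator fitted on each random chunk ("bite") and the classical order-statistic interval for a median; the bite count, estimator, and function names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.stats import binom
from sklearn.linear_model import HuberRegressor

def bites_median_ci(X, y, x_new, n_bites=9, alpha=0.05, seed=0):
    """Fit a robust regressor on each random chunk ('bite') of the data and
    return the median of the per-bite predictions at x_new together with a
    distribution-free (order-statistic) confidence interval for that median."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    preds = np.sort([
        HuberRegressor().fit(X[c], y[c]).predict(x_new.reshape(1, -1))[0]
        for c in np.array_split(idx, n_bites)
    ])
    # Largest k with P(Bin(n_bites, 1/2) <= k-1) <= alpha/2 yields the interval
    # [preds[k-1], preds[n_bites-k]] with coverage of at least 1 - alpha.
    # (n_bites must be large enough for such a k to exist, e.g. >= 6 at alpha = 0.05.)
    k = max(j for j in range(1, n_bites // 2 + 1)
            if binom.cdf(j - 1, n_bites, 0.5) <= alpha / 2)
    return np.median(preds), (preds[k - 1], preds[n_bites - k])
```

Each bite is fitted independently, so the loop parallelizes trivially across processors, which matches the scalability claim above.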
On a strategy to develop robust and simple tariffs from motor vehicle insurance data
The goals of this paper are twofold: we describe common features of data sets from motor vehicle insurance companies, and we investigate a general strategy that exploits the knowledge of such features. The results of the strategy provide a basis for developing insurance tariffs. The strategy is applied to a data set from motor vehicle insurance companies. We use a nonparametric approach based on a combination of kernel logistic regression and support vector regression.
Keywords: Classification, Data Mining, Insurance tariffs, Kernel logistic regression, Machine learning, Regression, Robustness, Simplicity, Support Vector Machine, Support Vector Regression
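One way the two models mentioned in the abstract could be combined is the common frequency-times-severity decomposition; the sketch below assumes that rule and approximates kernel logistic regression by an RBF feature map plus a linear logistic model. All names and the combination rule are illustrative assumptions, not the paper's actual strategy.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

def fit_tariff_models(X_freq, y_claim, X_sev, y_size):
    """y_claim in {0, 1}: at least one claim; y_size: claim size, claims only."""
    # Kernel logistic regression, approximated by an RBF feature map followed
    # by linear logistic regression, models the claim probability.
    klr = make_pipeline(
        Nystroem(kernel="rbf", n_components=200, random_state=0),
        LogisticRegression(max_iter=1000),
    ).fit(X_freq, y_claim)
    # Epsilon-insensitive support vector regression models the claim size.
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X_sev, y_size)
    return klr, svr

def expected_claim_cost(klr, svr, X_new):
    # Assumed combination rule: expected cost = P(claim) * E[size | claim].
    return klr.predict_proba(X_new)[:, 1] * svr.predict(X_new)
```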
Regression depth and support vector machine
The regression depth method (RDM) proposed by Rousseeuw and Hubert [RH99] plays an important role in the area of robust regression for a continuous response variable. Christmann and Rousseeuw [CR01] showed that RDM is also useful for the case of binary regression. Vapnik's convex risk minimization principle [Vap98] has a dominating role in statistical machine learning theory. Important special cases are the support vector machine (SVM), ε-support vector regression and kernel logistic regression. In this paper connections between these methods from different disciplines are investigated for the case of pattern recognition. Some results concerning the robustness of the SVM and other kernel based methods are given.
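For orientation, these special cases all solve the same regularized convex risk minimization problem over a reproducing kernel Hilbert space H (the notation below is ours, not the paper's):

\[
  \hat f_{\lambda} = \operatorname*{arg\,min}_{f \in H}
  \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \lVert f \rVert_{H}^{2},
\]

where the loss $L$ is the hinge loss $\max\{0, 1 - y\,f(x)\}$ for the SVM, the $\varepsilon$-insensitive loss $\max\{0, |y - f(x)| - \varepsilon\}$ for $\varepsilon$-support vector regression, and the logistic loss $\ln\bigl(1 + e^{-y\,f(x)}\bigr)$ for kernel logistic regression.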
Qualitative Robustness of Support Vector Machines
Support vector machines have attracted much attention in theoretical and in applied statistics. Main topics of recent interest are consistency, learning rates and robustness. In this article, it is shown that support vector machines are qualitatively robust. Since support vector machines can be represented by a functional on the set of all probability measures, qualitative robustness is proven by showing that this functional is continuous with respect to the topology generated by weak convergence of probability measures. Combined with the existence and uniqueness of support vector machines, our results show that support vector machines are the solutions of a well-posed mathematical problem in Hadamard's sense.
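The functional referred to above can be written explicitly; in notation of our own (hedged, not taken from the article), the support vector machine assigns to every distribution P the decision function

\[
  S(P) = \operatorname*{arg\,min}_{f \in H}
  \mathbb{E}_{(X,Y)\sim P}\bigl[L\bigl(Y, f(X)\bigr)\bigr] + \lambda \lVert f \rVert_{H}^{2},
\]

so that continuity of $P \mapsto S(P)$ with respect to weak convergence, together with existence and uniqueness of the minimizer, yields well-posedness in Hadamard's sense.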
Estimating conditional quantiles with the help of the pinball loss
The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, only little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are minimax optimal under some standard regularity assumptions on the conditional quantile.
Comment: Published at http://dx.doi.org/10.3150/10-BEJ267 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
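For reference, the pinball loss at quantile level $\tau \in (0,1)$ is

\[
  L_{\tau}(y, t) =
  \begin{cases}
    \tau\,(y - t), & y \ge t,\\
    (1 - \tau)\,(t - y), & y < t,
  \end{cases}
\]

and the conditional $\tau$-quantile of $Y$ given $X = x$ minimizes $t \mapsto \mathbb{E}\bigl[L_{\tau}(Y, t) \mid X = x\bigr]$, which is why approximate pinball risk minimizers can be compared to the conditional quantile in the first place.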
On robustness properties of convex risk minimization methods for pattern recognition
The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds on the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given.
Keywords: AdaBoost loss function, influence function, kernel logistic regression, robustness, sensitivity curve, statistical learning, support vector machine, total variation
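The influence function used in this line of work is Hampel's: for a statistical functional $T$, a distribution $P$ and a contamination point $z$,

\[
  \mathrm{IF}(z; T, P) = \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)P + \varepsilon\,\delta_{z}\bigr) - T(P)}{\varepsilon},
\]

where $\delta_{z}$ is the Dirac measure at $z$; a bounded influence function means that an infinitesimal contamination at any single point can change the classifier only by a bounded amount.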
Consistency and robustness of kernel based regression
We investigate properties of kernel based regression (KBR) methods which are inspired by the convex risk minimization method of support vector machines. We first describe the relation between the loss function used by the KBR method and the tail of the response variable Y. We then establish a consistency result for KBR and give assumptions for the existence of the influence function. In particular, our results allow one to choose the loss function and the kernel so as to obtain computationally tractable and consistent KBR methods with bounded influence functions. Furthermore, bounds for the sensitivity curve, a finite-sample version of the influence function, are developed, and some numerical experiments are discussed.
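A small sketch of the sensitivity curve mentioned above, using kernel ridge regression as a stand-in for a generic KBR estimator (the estimator choice and function names are assumptions for illustration, not the paper's setup):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def sensitivity_curve(X, y, z_x, z_y, x_eval, alpha=1.0, gamma=0.5):
    """SC_n(z) = (n + 1) * (T_{n+1}(data plus z) - T_n(data)), evaluated at x_eval.
    Kernel ridge regression stands in for a generic kernel based regression method."""
    n = len(y)
    fit = lambda A, b: KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma).fit(A, b)
    base = fit(X, y)                                # estimate on the original sample
    pert = fit(np.vstack([X, z_x.reshape(1, -1)]),  # add one observation z = (z_x, z_y)
               np.append(y, z_y))
    return (n + 1) * (pert.predict(x_eval) - base.predict(x_eval))
```

Evaluating this quantity over a grid of contamination points z shows how strongly a single added observation can move the fitted function, mirroring the bounded-influence discussion above.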