Robustness and Regularization of Support Vector Machines
We consider regularized support vector machines (SVMs) and show that they are
precisely equivalent to a new robust optimization formulation. We show that
this equivalence of robust optimization and regularization has implications for
both algorithms and analysis. In terms of algorithms, the equivalence suggests
more general SVM-like algorithms for classification that explicitly build in
protection against noise, and at the same time control overfitting. On the analysis
front, the equivalence of robustness and regularization provides a robust
optimization interpretation for the success of regularized SVMs. We use
this new robustness interpretation of SVMs to give a new proof of consistency
of (kernelized) SVMs, thus establishing robustness as the reason regularized
SVMs generalize well.
Concentration inequalities of the cross-validation estimate for stable predictors
In this article, we derive concentration inequalities for the
cross-validation estimate of the generalization error for stable predictors in
the context of risk assessment. The notion of stability was first
introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and
\cite{KUNIY02} to characterize classes of predictors with infinite VC dimension.
In particular, this covers $k$-nearest neighbor rules, Bayesian algorithms
(\cite{KEA95}), boosting, etc. General loss functions and classes of predictors are
considered. We use the formalism introduced by \cite{DUD03} to cover a large
variety of cross-validation procedures including leave-one-out
cross-validation, $k$-fold cross-validation, hold-out cross-validation (or
split sample), and leave-$p$-out cross-validation.
In particular, we give a simple rule on how to choose the cross-validation procedure,
depending on the stability of the class of predictors. In the special case of
uniform stability, an interesting consequence is that the number of elements in
the test set is not required to grow to infinity for the consistency of the
cross-validation procedure. In this special case, the particular interest of
leave-one-out cross-validation is emphasized.
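
The procedures named in this abstract differ only in how the sample is split into training and test indices. A minimal Python sketch of that idea follows; the function names (`k_fold_splits`, `leave_p_out_splits`, `hold_out_split`, `cv_estimate`) are hypothetical and chosen for illustration, not taken from the paper, and the estimator simply averages the per-split mean test loss.

```python
from itertools import combinations


def k_fold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    # Fold sizes differ by at most one when k does not divide n.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_p_out_splits(n, p):
    """Yield (train, test) for every test set of size p (p=1 is leave-one-out)."""
    for test in combinations(range(n), p):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, list(test)


def hold_out_split(n, n_test):
    """Return one (train, test) split that holds out the last n_test indices."""
    return list(range(n - n_test)), list(range(n - n_test, n))


def cv_estimate(splits, fit, loss, data):
    """Average, over splits, of the mean test loss of a model fit on the training part."""
    split_means = []
    for train, test in splits:
        model = fit([data[i] for i in train])
        split_means.append(sum(loss(model, data[i]) for i in test) / len(test))
    return sum(split_means) / len(split_means)
```

For example, with `data = [1.0, 2.0, 3.0, 4.0]`, a predictor that returns the training mean, and squared-error loss, `cv_estimate([hold_out_split(4, 2)], fit, loss, data)` evaluates the mean of the first two points on the last two.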
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting,
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade off between the two, depending on the size of the available
data set.
Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective
Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.