Robustness and Regularization of Support Vector Machines

Author: Caramanis Constantine
Mannor Shie
Xu Huan
Publication date: 11/11/2008

We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
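As a rough illustration of the kind of equivalence this abstract describes, the two problems can be written side by side. This is a sketch only: the notation ($w$, $b$, $c$, $\delta_i$, the uncertainty set $\mathcal{T}$, and the dual norm $\|\cdot\|_*$) is assumed here and does not appear in the listing, and the non-separability condition is an assumption under which the two objectives coincide.

```latex
% Regularized soft-margin SVM (hinge loss plus a norm penalty on w):
\min_{w,b} \; c\,\|w\| + \sum_{i=1}^{m} \max\bigl[\,1 - y_i(\langle w, x_i\rangle + b),\, 0\,\bigr]

% Robust counterpart: worst case over sample perturbations with a shared budget c:
\min_{w,b} \; \max_{(\delta_1,\dots,\delta_m)\in\mathcal{T}} \;
  \sum_{i=1}^{m} \max\bigl[\,1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\,\bigr],
\qquad
\mathcal{T} = \Bigl\{(\delta_1,\dots,\delta_m) : \sum_{i=1}^{m} \|\delta_i\|_* \le c \Bigr\}

% Why the robust objective never exceeds the regularized one:
% |\langle w, \delta_i\rangle| \le \|w\|\,\|\delta_i\|_*, so the total hinge loss grows by at most c\|w\|.
% If some sample has strictly positive hinge loss (non-separable data), spending the
% whole budget on that sample attains the bound, and the two objectives coincide.
```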

arXiv.org e-Print Archive

CiteSeerX

Concentration inequalities of the cross-validation estimate for stable predictors

Author: Cornec Matthieu
Publication date: 01/01/2010

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for stable predictors in the context of risk assessment. The notion of stability was first introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and \cite{KUNIY02} to characterize classes of predictors with infinite VC dimension. In particular, this covers $k$-nearest neighbors rules, Bayesian algorithms (\cite{KEA95}), boosting, etc. General loss functions and classes of predictors are considered. We use the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we give a simple rule on how to choose the cross-validation procedure, depending on the stability of the class of predictors. In the special case of uniform stability, an interesting consequence is that the number of elements in the test set is not required to grow to infinity for the consistency of the cross-validation procedure. In this special case, the particular interest of leave-one-out cross-validation is emphasized.
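The four cross-validation procedures named in this abstract differ only in how the sample indices are split into training and test sets. The following is a minimal sketch of those splits in plain Python. The function names and the choice of contiguous, unshuffled folds are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def k_fold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation on n samples."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    indices = list(range(n))
    # Fold sizes differ by at most one when k does not divide n.
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def leave_one_out_splits(n):
    """Leave-one-out is k-fold with k equal to the sample size."""
    return k_fold_splits(n, n)


def hold_out_split(n, n_test):
    """Single split: the last n_test indices form the test set."""
    if not 0 < n_test < n:
        raise ValueError("n_test must satisfy 0 < n_test < n")
    indices = list(range(n))
    return indices[:n - n_test], indices[n - n_test:]


def leave_v_out_splits(n, v):
    """Yield every split whose test set is a size-v subset of the n samples."""
    if not 0 < v < n:
        raise ValueError("v must satisfy 0 < v < n")
    for test in combinations(range(n), v):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, list(test)
```

Note that leave-$\upsilon$-out enumerates every size-$\upsilon$ test set, so the number of splits grows as a binomial coefficient, whereas $k$-fold always produces exactly $k$ splits.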

arXiv.org e-Print Archive

CiteSeerX

Regularizing Portfolio Optimization

Author: Acerbi C
Acerbi C Nordio C Sirtori C
Bengio Y
Bertsekas D P
Bordes A
Bottou L
Bouchaud J-Ph
Burda Z
Chopra V K
DeMiguel V
Elton E J
Embrechts P
Frahm G
Frahm G Memmel Ch
Gulyas N Kondor I
Imre Kondor
Jobson J D
Jorion P
Kempf A
Kondor I Varga-Haszonits I
Macrae R
Markowitz H
Morgan J P Reuters Riskmetrics
Perez-Cruz F
Potters M
Rockafellar R T
Schölkopf B
Susanne Still
Tibshirani R
Vanderbei R J
Vapnik V
Varga-Haszonits I
Publication venue: 'IOP Publishing'
Publication date: 09/11/2009

The optimization of large portfolios displays an inherent instability to estimation error. This poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample, but are, in effect, very far from optimal with respect to the average risk. In this paper, we approach the problem from the point of view of statistical learning theory. The occurrence of the instability is intimately related to over-fitting, which can be avoided using known regularization methods. We show how regularized portfolio optimization with the expected shortfall as a risk measure is related to support vector regression. The budget constraint dictates a modification. We present the resulting optimization problem and discuss the solution. The L2 norm of the weight vector is used as a regularizer, which corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial because it improves the stability of the solution. The approach we provide here allows for the simultaneous treatment of optimization and diversification in one framework that enables the investor to trade off between the two, depending on the size of the available data set.

arXiv.org e-Print Archive

Crossref

ELTE Digital Institutional Repository (EDIT)

Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

Author: Aliferis Constantin F.
Statnikov Alexander
Tsamardinos Ioannis
Publication venue: Libertas Academica
Publication date: 01/01/2006

Sound data analysis is critical to the success of modern molecular medicine research that involves the collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and the lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

Directory of Open Access Journals

PubMed Central