We investigate the optimality for model selection of the so-called slope
heuristics, V-fold cross-validation and V-fold penalization in a
heteroscedastic regression context with random design. We consider a new class
of linear models that we call strongly localized bases and that generalize
histograms, piecewise polynomials and compactly supported wavelets. We derive
sharp oracle inequalities that prove the asymptotic optimality of the slope
heuristics (when the optimal penalty shape is known) and V-fold
penalization. Furthermore, V-fold cross-validation appears to be suboptimal for
a fixed value of V, since it asymptotically recovers the oracle learned from a
sample whose size is a fraction 1 - V^{-1} of the original amount of data.
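To see where this factor comes from, note the standard fold-size computation: each of the V training sets omits exactly one fold of about n/V observations out of n, so
\[
  n_{\mathrm{train}} \;=\; n - \frac{n}{V} \;=\; \bigl(1 - V^{-1}\bigr)\, n ,
\]
and the V-fold criterion therefore estimates the risk of estimators trained on (1 - V^{-1}) n observations rather than on the full sample.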
Our results are based on genuine concentration inequalities for the true and empirical excess
risks, which are of independent interest. Our experiments illustrate the good
behavior of the slope heuristics for the selection of linear wavelet models.
Moreover, V-fold cross-validation and V-fold penalization exhibit
comparable efficiency.
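As a concrete illustration of the V-fold procedure at stake, here is a minimal Python/NumPy sketch that selects the number of bins of a histogram (piecewise-constant) estimator, the simplest instance of a strongly localized basis, by V-fold cross-validation; the data-generating process, the grid of candidate bin counts, the choice V = 5 and the helper names are illustrative assumptions rather than material from the paper.

# Minimal sketch (not the paper's code): V-fold cross-validation for choosing
# the number of bins of a histogram regression estimator on [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Heteroscedastic regression with random (uniform) design: assumed toy model."""
    x = rng.uniform(0.0, 1.0, n)
    f = np.sin(2 * np.pi * x)              # assumed regression function
    sigma = 0.2 + 0.8 * x                  # noise level varies with x (heteroscedastic)
    return x, f + sigma * rng.standard_normal(n)

def fit_histogram(x, y, D):
    """Least-squares fit on the histogram model with D regular bins: bin means."""
    bins = np.minimum((x * D).astype(int), D - 1)
    counts = np.bincount(bins, minlength=D)
    sums = np.bincount(bins, weights=y, minlength=D)
    return np.divide(sums, counts, out=np.zeros(D), where=counts > 0)

def predict(means, x):
    D = means.size
    bins = np.minimum((x * D).astype(int), D - 1)
    return means[bins]

def vfold_cv_risk(x, y, D, V=5):
    """Average held-out squared error of the D-bin histogram over the V folds."""
    folds = np.array_split(rng.permutation(x.size), V)
    risks = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(x.size), held_out)
        means = fit_histogram(x[train], y[train], D)
        risks.append(np.mean((y[held_out] - predict(means, x[held_out])) ** 2))
    return np.mean(risks)

x, y = simulate(n=500)
candidates = range(1, 41)                  # models indexed by their bin count
cv = {D: vfold_cv_risk(x, y, D, V=5) for D in candidates}
D_hat = min(cv, key=cv.get)
print(f"V-fold CV selects D = {D_hat} bins (CV risk {cv[D_hat]:.4f})")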