
    Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii

    arXiv:0708.0083. Comment: Published at http://dx.doi.org/10.1214/009053606000001037 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Minimal penalty for Goldenshluger-Lepski method

    This paper is concerned with adaptive nonparametric estimation using the Goldenshluger-Lepski selection method. This estimator selection method is based on pairwise comparisons between estimators with respect to some loss function. The method also involves a penalty term that typically needs to be large enough for the method to work (in the sense that one can prove an oracle-type inequality for the selected estimator). In the case of density estimation with kernel estimators and a quadratic loss, we show that the procedure fails if the penalty term is chosen smaller than a critical value: the minimal penalty. More precisely, we show that the quadratic risk of the selected estimator explodes when the penalty is below this critical value, while it stays under control when the penalty is above it. This kind of phase transition phenomenon for penalty calibration has already been observed and proved for penalized model selection methods in various contexts, but appears here for the first time for the Goldenshluger-Lepski pairwise comparison method. Some simulations illustrate the theoretical results and give hints on how to use the theory to calibrate the method in practice.

    Estimator selection: a new method with applications to kernel density estimation

    Estimator selection has become a crucial issue in nonparametric estimation. Two widely used methods are penalized empirical risk minimization (such as penalized log-likelihood estimation) and pairwise comparison (such as Lepski's method). Our aim in this paper is twofold. First, we explain some general ideas about the calibration issue of estimator selection methods. We review some known results, putting the emphasis on the concept of minimal penalty, which is helpful for designing data-driven selection criteria. Second, we present a new method for bandwidth selection within the framework of kernel density estimation, which is in some sense intermediate between the two main methods mentioned above. We provide some theoretical results which lead to a fully data-driven selection strategy.

    Rates of convergence for robust geometric inference

    Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) has been introduced by Chazal et al. (2011) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rates of convergence of the DTEM depend directly on the regularity at zero of a particular quantile function which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.
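
    As a rough illustration of the DTEM mentioned in this abstract, the sketch below computes the distance to the empirical measure at a query point as the root mean of the squared distances to its k nearest sample points. It is not taken from the paper: the function name `empirical_dtm`, the mass parameter name `m`, and the simplification k = ceil(m * n) (which ignores the fractional weight on the last neighbour when m * n is not an integer) are all assumptions made for this example.

```python
import math

def empirical_dtm(points, query, m):
    """Sketch of the distance to the empirical measure (DTEM) at `query`.

    points: list of sample points (each a sequence of floats)
    query:  the point at which the distance is evaluated
    m:      mass parameter in (0, 1]; hypothetical name for this example

    Assumption: k = ceil(m * n) nearest neighbours are averaged with equal
    weights, which matches the usual definition when m * n is an integer.
    """
    n = len(points)
    k = max(1, min(n, math.ceil(m * n)))
    # Squared Euclidean distances from the query to every sample point.
    sq_dists = sorted(
        sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, query)) for p in points
    )
    # Root mean of the k smallest squared distances.
    return math.sqrt(sum(sq_dists[:k]) / k)

# Three points near the origin plus one far outlier.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
dtm_value = empirical_dtm(sample, (0.0, 0.0), 0.5)
```

    With m = 0.5 only the two nearest points enter the average, so the outlier at (10, 10) has no effect on the value at the origin. This is the robustness property that motivates using the DTM in place of the distance to the support.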

    A new V-fold type procedure based on robust tests

    We define a general V-fold cross-validation type method based on robust tests, which is an extension of the hold-out defined by Birgé [7, Section 9]. We give some theoretical results showing that, under some weak assumptions on the considered statistical procedures, our selected estimator satisfies an oracle-type inequality. We also introduce a fast algorithm that implements our method. Moreover, we show in our simulations that this V-fold procedure generally performs well for estimating a density for different sample sizes, and can handle well-known problems such as binwidth selection for histograms or bandwidth selection for kernels. We finally provide a comparison with other classical V-fold methods and study empirically the influence of the value of V on the risk.

    Moment inequalities for functions of independent random variables

    A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions [Boucheron, Lugosi and Massart Ann. Probab. 31 (2003) 1583-1614], and is based on a generalized tensorization inequality due to Latala and Oleszkiewicz [Lecture Notes in Math. 1745 (2000) 147-168]. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes and moment inequalities for Rademacher chaos and U-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand's exponential inequality for Rademacher chaos of order 2 to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of Boolean polynomials which include, as special cases, subgraph counting problems in random graphs. Comment: Published at http://dx.doi.org/10.1214/009117904000000856 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation

    Kernel density estimation is a well-known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used, bandwidth selection remains a challenging issue in terms of balancing algorithmic performance and statistical relevance. The purpose of this paper is to compare a recently developed bandwidth selection method for kernel density estimation to those in common use (at least those which are implemented in the R-package). This new method is called Penalized Comparison to Overfitting (PCO). It was proposed by some of the authors of this paper in a previous work devoted to its statistical relevance from a purely theoretical perspective. It is compared here to other usual bandwidth selection methods for univariate and multivariate kernel density estimation on the basis of intensive simulation studies. In particular, cross-validation and plug-in criteria are numerically investigated and compared to PCO. The take-home message is that PCO can outperform the classical methods without additional algorithmic cost.

    An l1-Oracle Inequality for the Lasso

    The Lasso has attracted the attention of many authors in recent years. While many efforts have been made to prove that the Lasso behaves like a variable selection procedure at the price of strong (though unavoidable) assumptions on the geometric structure of these variables, much less attention has been paid to the analysis of the performance of the Lasso as a regularization algorithm. Our first purpose here is to provide a conceptually very simple result in this direction. We shall prove that, provided that the regularization parameter is properly chosen, the Lasso works almost as well as the deterministic Lasso. This result does not require any assumption at all, neither on the structure of the variables nor on the regression function. Our second purpose is to introduce a new estimator particularly adapted to deal with countably infinite dictionaries. This estimator is constructed as an l0-penalized estimator among a sequence of Lasso estimators associated to a dyadic sequence of growing truncated dictionaries. The selection procedure automatically chooses the best level of truncation of the dictionary so as to make the best tradeoff between approximation, l1-regularization and sparsity. From a theoretical point of view, we shall provide an oracle inequality satisfied by this selected Lasso estimator. The oracle inequalities established for the Lasso and the selected Lasso estimators shall enable us to derive rates of convergence on a wide class of functions, showing that these estimators perform at least as well as greedy algorithms. Besides, we shall prove that the rates of convergence achieved by the selected Lasso estimator are optimal in the orthonormal case by bounding from below the minimax risk on some Besov bodies. Finally, some theoretical results about the performance of the Lasso for uncountably infinite dictionaries will be studied in the specific framework of neural networks. All the oracle inequalities presented in this paper are obtained via the application of a single general theorem of model selection among a collection of nonlinear models, which is a direct consequence of the Gaussian concentration inequality. The key idea that enables us to apply this general theorem is to see l1-regularization as a model selection procedure among l1-balls.