Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii
Published at http://dx.doi.org/10.1214/009053606000001037 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics.
Minimal penalty for Goldenshluger-Lepski method
This paper is concerned with adaptive nonparametric estimation using the
Goldenshluger-Lepski selection method. This estimator selection method is based
on pairwise comparisons between estimators with respect to some loss function.
The method also involves a penalty term that typically needs to be large enough
for the method to work, in the sense that an oracle-type inequality can be
proved for the selected estimator. In the case of density estimation
with kernel estimators and a quadratic loss, we show that the procedure fails
if the penalty term is chosen below some critical value: the minimal penalty.
More precisely, we show that the quadratic risk of the
selected estimator explodes when the penalty is below this critical value, while
it stays under control when the penalty is above this critical value. This kind
of phase transition phenomenon for penalty calibration has already been
observed and proved for penalized model selection methods in various contexts
but appears here for the first time for the Goldenshluger-Lepski pairwise
comparison method. Some simulations illustrate the theoretical results and
provide hints on how to use the theory to calibrate the method in practice.
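To make the selection rule concrete, below is a minimal sketch (ours, not the authors' code) of Goldenshluger-Lepski bandwidth selection for univariate kernel density estimation with Gaussian kernels and quadratic loss. The penalty constant lam plays the role of the calibration constant whose critical value is studied above; the bandwidth grid, the penalty shape and the factor 2 in the criterion are illustrative assumptions.

    import numpy as np

    def kde(x_grid, data, h):
        # Gaussian kernel density estimate evaluated on x_grid
        u = (x_grid[:, None] - data[None, :]) / h
        return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    def gl_select(data, bandwidths, lam=1.0, grid_size=512):
        # Goldenshluger-Lepski selection: argmin_h  A(h) + 2 * pen(h)
        x = np.linspace(data.min() - 1, data.max() + 1, grid_size)
        dx = x[1] - x[0]
        n = len(data)
        f = {h: kde(x, data, h) for h in bandwidths}
        # pen(h) = lam * ||K||_2^2 / (n h); for the Gaussian kernel ||K||_2^2 = 1 / (2 sqrt(pi))
        pen = {h: lam / (2 * np.sqrt(np.pi) * n * h) for h in bandwidths}
        crit = {}
        for h in bandwidths:
            A = 0.0
            for hp in bandwidths:
                # for Gaussian kernels K_h * K_hp is Gaussian with bandwidth sqrt(h^2 + hp^2),
                # so the auxiliary estimator f_{h,hp} is simply a KDE with that bandwidth
                f_aux = kde(x, data, np.sqrt(h**2 + hp**2))
                A = max(A, np.sum((f_aux - f[hp])**2) * dx - pen[hp])
            crit[h] = A + 2 * pen[h]
        return min(crit, key=crit.get)

    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)
    print("selected bandwidth:", gl_select(sample, np.geomspace(0.02, 1.0, 15)))

Choosing lam below the critical value corresponds to the under-penalized regime described above, where the selected bandwidth is expected to collapse and the risk to explode.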
Estimator selection: a new method with applications to kernel density estimation
Estimator selection has become a crucial issue in nonparametric estimation.
Two widely used methods are penalized empirical risk minimization (such as
penalized log-likelihood estimation) and pairwise comparison (such as Lepski's
method). Our aim in this paper is twofold. First, we explain some general ideas
about the calibration issue of estimator selection methods. We review some
known results, putting the emphasis on the concept of minimal penalty, which is
helpful to design data-driven selection criteria. Second, we present a new
method for bandwidth selection within the framework of kernel density
estimation which is in some sense intermediate between the two main methods
mentioned above. We provide some theoretical results which lead to a fully
data-driven selection strategy.
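As an illustration of the minimal-penalty idea reviewed here, the following sketch (ours, using the standard slope-heuristic recipe rather than the paper's new method) runs a penalized criterion for a range of constants, locates the largest jump in the selected complexity, and returns twice the jump location as the data-driven constant.

    import numpy as np

    def select(complexities, empirical_risks, c):
        # penalized criterion: empirical risk + c * complexity (penalty shape assumed known)
        return int(np.argmin(empirical_risks + c * complexities))

    def calibrate_constant(complexities, empirical_risks, c_grid):
        # selected complexity as a function of the penalty constant c
        selected = np.array([complexities[select(complexities, empirical_risks, c)]
                             for c in c_grid])
        jumps = selected[:-1] - selected[1:]      # complexity drops as c increases
        c_min = c_grid[int(np.argmax(jumps))]     # largest jump locates the minimal constant
        return 2.0 * c_min                        # slope heuristic: optimal constant = 2 * minimal one

In practice the jump is usually detected with a threshold on its size rather than a plain argmax, and the returned constant is plugged back into the penalized selection rule.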
Rates of convergence for robust geometric inference
Distances to compact sets are widely used in the field of Topological Data
Analysis for inferring geometric and topological features from point clouds. In
this context, the distance to a probability measure (DTM) has been introduced
by Chazal et al. (2011) as a robust alternative to the distance to a compact set.
In practice, the DTM can be estimated by its empirical counterpart, that is, the
distance to the empirical measure (DTEM). In this paper we give a tight control
of the deviation of the DTEM. Our analysis relies on a local analysis of
empirical processes. In particular, we show that the rate of convergence of
the DTEM directly depends on the regularity at zero of a particular quantile
function which contains some local information about the geometry of the
support. This quantile function is the relevant quantity to describe precisely
how difficult a geometric inference problem is. Several numerical experiments
illustrate the convergence of the DTEM and also confirm that our bounds are
tight.
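For concreteness, here is a minimal sketch of the empirical quantity studied above, assuming the usual nearest-neighbour form of the DTEM: with mass parameter m = k/n, its square at a query point is the average squared distance to the k nearest sample points (the function name dtem and the toy data are ours).

    import numpy as np
    from scipy.spatial import cKDTree

    def dtem(points, queries, m):
        # empirical distance to a measure with mass parameter m in (0, 1]:
        # average the squared distances to the k = ceil(m * n) nearest sample points
        n = len(points)
        k = max(1, int(np.ceil(m * n)))
        dists, _ = cKDTree(points).query(queries, k=k)
        dists = np.asarray(dists).reshape(-1, k)
        return np.sqrt((dists**2).mean(axis=1))

    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(1000, 2))            # sample points in the plane
    queries = np.array([[0.0, 0.0], [3.0, 3.0]])  # locations at which the DTEM is evaluated
    print(dtem(cloud, queries, m=0.05))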
A new V-fold type procedure based on robust tests
We define a general V-fold cross-validation type method based on robust
tests, which is an extension of the hold-out defined by Birgé [7, Section
9]. We give some theoretical results showing that, under some weak assumptions
on the considered statistical procedures, our selected estimator satisfies an
oracle-type inequality. We also introduce a fast algorithm that implements our
method. Moreover, we show in our simulations that this V-fold procedure generally
performs well for estimating a density for different sample sizes, and can handle
well-known problems, such as binwidth selection for histograms or bandwidth
selection for kernels. We finally provide a comparison with other classical
V-fold methods and study empirically the influence of the value of V on the
risk.
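For reference, the sketch below shows the classical V-fold selection that such methods are compared against, applied to bandwidth selection for kernel density estimation with a held-out L2 contrast; the procedure defined in the paper replaces this contrast with pairwise robust tests in the spirit of Birgé's hold-out, which is not reproduced here.

    import numpy as np

    def kde(x, data, h):
        # Gaussian kernel density estimate evaluated at the points x
        u = (np.asarray(x)[:, None] - data[None, :]) / h
        return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    def vfold_bandwidth(data, bandwidths, V=5, grid_size=512, seed=0):
        # select h minimizing the V-fold estimate of the L2 risk (up to a constant)
        folds = np.array_split(np.random.default_rng(seed).permutation(len(data)), V)
        x = np.linspace(data.min() - 1, data.max() + 1, grid_size)
        dx = x[1] - x[0]
        scores = {}
        for h in bandwidths:
            s = 0.0
            for idx in folds:
                test, train = data[idx], np.delete(data, idx)
                f = kde(x, train, h)
                # ||f_h||^2 - 2 * mean_test f_h  estimates  ||f_h - f||^2 - ||f||^2
                s += np.sum(f**2) * dx - 2 * np.mean(kde(test, train, h))
            scores[h] = s / V
        return min(scores, key=scores.get)

    sample = np.random.default_rng(1).normal(size=400)
    print("V-fold bandwidth:", vfold_bandwidth(sample, np.geomspace(0.05, 1.0, 15)))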
Moment inequalities for functions of independent random variables
A general method for obtaining moment inequalities for functions of
independent random variables is presented. It is a generalization of the
entropy method which has been used to derive concentration inequalities for
such functions [Boucheron, Lugosi and Massart Ann. Probab. 31 (2003)
1583-1614], and is based on a generalized tensorization inequality due to
Latala and Oleszkiewicz [Lecture Notes in Math. 1745 (2000) 147-168]. The new
inequalities prove to be a versatile tool in a wide range of applications. We
illustrate the power of the method by showing how it can be used to
effortlessly re-derive classical inequalities including Rosenthal and
Kahane-Khinchine-type inequalities for sums of independent random variables,
moment inequalities for suprema of empirical processes and moment inequalities
for Rademacher chaos and U-statistics. Some of these corollaries are apparently
new. In particular, we generalize Talagrand's exponential inequality for
Rademacher chaos of order 2 to any order. We also discuss applications for
other complex functions of independent random variables, such as suprema of
Boolean polynomials which include, as special cases, subgraph counting problems
in random graphs.
Published at http://dx.doi.org/10.1214/009117904000000856 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics.
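As a point of reference, one classical corollary re-derived by this moment method is Rosenthal's inequality for sums of independent centered random variables; a standard formulation, with constants left unspecified, reads

    \[
      \mathbb{E}\Bigl|\sum_{i=1}^{n} X_i\Bigr|^{q}
      \;\le\; C_q \Biggl( \sum_{i=1}^{n} \mathbb{E}|X_i|^{q}
      + \Bigl( \sum_{i=1}^{n} \mathbb{E} X_i^{2} \Bigr)^{q/2} \Biggr),
      \qquad q \ge 2,
    \]

where the X_i are independent with mean zero and C_q depends only on q.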
Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation
Kernel density estimation is a well-known method involving a smoothing
parameter (the bandwidth) that needs to be tuned by the user. Although this
method has been widely used, bandwidth selection remains a challenging issue
in terms of balancing algorithmic performance and statistical relevance. The
purpose of this paper is to compare a recently developed bandwidth selection
method for kernel density estimation to those which are commonly used by now
(at least those implemented in R packages). This new method is
called Penalized Comparison to Overfitting (PCO). It has been proposed by some
of the authors of this paper in a previous work devoted to its statistical
relevance from a purely theoretical perspective. It is compared here to other
usual bandwidth selection methods for univariate and also multivariate kernel
density estimation on the basis of intensive simulation studies. In particular,
cross-validation and plug-in criteria are numerically investigated and compared
to PCO. The take-home message is that PCO can outperform the classical methods
without additional algorithmic cost.
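A minimal sketch of the PCO criterion for univariate Gaussian kernel density estimation follows, under simplifying assumptions of ours: the penalty is taken as lam * ||K_h||^2 / n with the default lam = 1, whereas the penalty analysed in the original work may include further correction terms.

    import numpy as np

    def kde(x, data, h):
        # Gaussian kernel density estimate evaluated at the points x
        u = (np.asarray(x)[:, None] - data[None, :]) / h
        return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    def pco_bandwidth(data, bandwidths, lam=1.0, grid_size=512):
        # Penalized Comparison to Overfitting: compare every estimator to the most
        # overfitting one (smallest bandwidth) and penalize by lam * ||K_h||_2^2 / n
        n = len(data)
        h_min = min(bandwidths)
        x = np.linspace(data.min() - 1, data.max() + 1, grid_size)
        dx = x[1] - x[0]
        f_min = kde(x, data, h_min)
        crit = {}
        for h in bandwidths:
            f_h = kde(x, data, h)
            pen = lam / (2 * np.sqrt(np.pi) * n * h)   # ||K_h||_2^2 = 1 / (2 sqrt(pi) h) for Gaussian K
            crit[h] = np.sum((f_h - f_min)**2) * dx + pen
        return min(crit, key=crit.get)

    sample = np.random.default_rng(0).normal(size=500)
    print("PCO bandwidth:", pco_bandwidth(sample, np.geomspace(0.02, 1.0, 20)))

The single loop over bandwidths, with no pairwise comparisons and no refitting on data splits, is what keeps the algorithmic cost of this sketch low.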
An l1-Oracle Inequality for the Lasso
The Lasso has attracted the attention of many authors in recent years. While
many efforts have been made to prove that the Lasso behaves like a variable
selection procedure at the price of strong (though unavoidable) assumptions on
the geometric structure of these variables, much less attention has been paid
to the analysis of the performance of the Lasso as a regularization algorithm.
Our first purpose here is to provide a conceptually very simple result in this
direction. We shall prove that, provided that the regularization parameter is
properly chosen, the Lasso works almost as well as the deterministic Lasso.
This result does not require any assumption at all, neither on the structure of
the variables nor on the regression function. Our second purpose is to
introduce a new estimator particularly adapted to deal with infinite countable
dictionaries. This estimator is constructed as an l0-penalized estimator among
a sequence of Lasso estimators associated with a dyadic sequence of growing
truncated dictionaries. The selection procedure automatically chooses the best
level of truncation of the dictionary so as to make the best tradeoff between
approximation, l1-regularization and sparsity. From a theoretical point of
view, we shall provide an oracle inequality satisfied by this selected Lasso
estimator. The oracle inequalities established for the Lasso and the selected
Lasso estimators shall enable us to derive rates of convergence on a wide class
of functions, showing that these estimators perform at least as well as greedy
algorithms. Besides, we shall prove that the rates of convergence achieved by
the selected Lasso estimator are optimal in the orthonormal case by bounding
from below the minimax risk on some Besov bodies. Finally, some theoretical
results about the performance of the Lasso for infinite uncountable
dictionaries will be studied in the specific framework of neural networks. All
the oracle inequalities presented in this paper are obtained via the
application of a single general theorem of model selection among a collection
of nonlinear models which is a direct consequence of the Gaussian concentration
inequality. The key idea that enables us to apply this general theorem is to
see l1-regularization as a model selection procedure among l1-balls.
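For concreteness, here is a minimal sketch (ours) of the Lasso over a finite dictionary, solved by proximal gradient descent (ISTA); the regularization level of order sqrt(log p / n) mirrors the order of magnitude of a properly chosen parameter, but the constants here are purely illustrative.

    import numpy as np

    def soft_threshold(z, t):
        # proximal operator of t * ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_ista(D, y, reg, n_iter=2000):
        # minimize (1 / 2n) * ||y - D w||^2 + reg * ||w||_1 by proximal gradient
        n, p = D.shape
        step = n / np.linalg.norm(D, 2)**2        # 1 / Lipschitz constant of the smooth part
        w = np.zeros(p)
        for _ in range(n_iter):
            grad = D.T @ (D @ w - y) / n
            w = soft_threshold(w - step * grad, step * reg)
        return w

    rng = np.random.default_rng(0)
    n, p = 200, 50
    D = rng.normal(size=(n, p))                   # dictionary: p candidate columns / functions
    w_true = np.zeros(p); w_true[:3] = [2.0, -1.5, 1.0]
    y = D @ w_true + rng.normal(scale=0.5, size=n)
    w_hat = lasso_ista(D, y, reg=np.sqrt(np.log(p) / n))
    print("nonzero coefficients:", np.flatnonzero(np.abs(w_hat) > 1e-3))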
- …