Estimator selection: a new method with applications to kernel density estimation
Estimator selection has become a crucial issue in nonparametric estimation.
Two widely used approaches are penalized empirical risk minimization (such as
penalized log-likelihood estimation) and pairwise comparison (such as Lepski's
method). Our aim in this paper is twofold. First, we discuss the calibration of
estimator selection methods, reviewing known results with emphasis on the
concept of minimal penalty, which is helpful for designing data-driven
selection criteria. Second, we present a new method for bandwidth selection in
kernel density estimation that is, in some sense, intermediate between the two
main approaches mentioned above. We provide theoretical results that lead to a
fully data-driven selection strategy.
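As background for the penalized-risk side of the comparison, here is a minimal Python sketch of bandwidth selection for a Gaussian-kernel density estimate using the classical least-squares cross-validation criterion; this is an illustration of penalized empirical risk minimization in general, not the paper's new method, and all function names are chosen for this sketch.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def select_bandwidth_lscv(data, bandwidths):
    """Least-squares cross-validation: minimize an unbiased risk
    estimate of the integrated squared error over a bandwidth grid."""
    n = len(data)
    scores = []
    for h in bandwidths:
        u = (data[:, None] - data[None, :]) / h
        K = gaussian_kernel(u)
        # Closed form for the integral of fhat^2: the convolution of
        # two Gaussian kernels at scale h is a Gaussian at scale h*sqrt(2)
        K2 = gaussian_kernel(u / np.sqrt(2)) / np.sqrt(2)
        int_f2 = K2.sum() / (n**2 * h)
        # Leave-one-out term: drop the diagonal (i == j) contributions
        loo = (K.sum() - n * gaussian_kernel(0.0)) / (n * (n - 1) * h)
        scores.append(int_f2 - 2.0 * loo)
    return bandwidths[int(np.argmin(scores))]

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(0.05, 1.0, 40)
h_star = select_bandwidth_lscv(data, grid)
```

The selected `h_star` trades off the estimated bias (captured by the integrated-square term) against the variance (captured by the leave-one-out term).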
Posterior concentration rates for empirical Bayes procedures, with applications to Dirichlet Process mixtures
In this paper we provide general conditions on the model and the prior under
which posterior concentration rates can be derived for data-dependent priors
(i.e., empirical Bayes approaches). We aim at conditions close to those of the
seminal paper by Ghosal and van der Vaart (2007a). We then apply the general
theorem to two settings: density estimation using Dirichlet process mixtures
of Gaussian random variables with a base measure depending on empirical
quantities, and estimation of the intensity of a counting process under the
Aalen model. A simulation study for inhomogeneous Poisson processes also
illustrates our results. In the former case we also derive results on the
estimation of the mixing density and on the deconvolution problem. In the
latter, we provide a general theorem on posterior concentration rates for
counting processes with Aalen multiplicative intensity and priors not
depending on the data. Comment: with supplementary material
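Inhomogeneous Poisson processes of the kind used in such simulation studies can be generated by thinning (Lewis–Shedler): simulate a homogeneous process at an upper-bound rate and accept each point with probability proportional to the local intensity. A minimal sketch, where the intensity function is an arbitrary example rather than the paper's experimental setup:

```python
import numpy as np

def thin_poisson(rate_fn, rate_max, t_max, rng):
    """Simulate an inhomogeneous Poisson process on [0, t_max] by
    thinning, given an upper bound rate_max on the intensity rate_fn."""
    t, events = 0.0, []
    while True:
        # Candidate point from a homogeneous process at rate rate_max
        t += rng.exponential(1.0 / rate_max)
        if t > t_max:
            break
        # Accept with probability lambda(t) / rate_max
        if rng.uniform() < rate_fn(t) / rate_max:
            events.append(t)
    return np.array(events)

rng = np.random.default_rng(2)
intensity = lambda t: 5.0 * (1.0 + np.sin(t))  # example intensity, bounded by 10
events = thin_poisson(intensity, 10.0, 100.0, rng)
```

The expected number of events equals the integral of the intensity over the window, about 500 here, so the sample gives a realistic input for intensity-estimation experiments.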
Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation
Kernel density estimation is a well-known method involving a smoothing
parameter (the bandwidth) that needs to be tuned by the user. Although the
method has been widely used, bandwidth selection remains a challenging issue
in terms of balancing algorithmic performance and statistical relevance. The
purpose of this paper is to compare a recently developed bandwidth selection
method for kernel density estimation with those commonly used today (at least
those implemented in R packages). This new method, called Penalized Comparison
to Overfitting (PCO), was proposed by some of the authors of this paper in
previous work devoted to its statistical relevance from a purely theoretical
perspective. It is compared here with the other usual bandwidth selection
methods for univariate and multivariate kernel density estimation on the basis
of intensive simulation studies. In particular, cross-validation and plug-in
criteria are numerically investigated and compared with PCO. The take-home
message is that PCO can outperform the classical methods at no additional
algorithmic cost.
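The comparison-to-overfitting idea can be sketched schematically: measure each estimator's distance to the most overfitting one (the smallest bandwidth on the grid) and add a penalty. The penalty below is the simple variance proxy ||K_h||^2 / n for a Gaussian kernel; the paper's calibrated penalty may differ, so this is an illustration of the principle, not the published method.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde_on_grid(grid, data, h):
    # Gaussian kernel density estimate evaluated on a fixed grid
    u = (grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def pco_bandwidth(data, bandwidths, grid):
    """Schematic comparison-to-overfitting selection: distance to the
    most overfitting estimator plus a variance-type penalty."""
    n = len(data)
    dx = grid[1] - grid[0]
    f_over = kde_on_grid(grid, data, min(bandwidths))  # overfitting reference
    crits = []
    for h in bandwidths:
        f_h = kde_on_grid(grid, data, h)
        dist2 = ((f_h - f_over)**2).sum() * dx        # ||f_h - f_hmin||^2
        pen = 1.0 / (2.0 * np.sqrt(np.pi) * n * h)    # ||K_h||^2 / n (Gaussian)
        crits.append(dist2 + pen)
    return bandwidths[int(np.argmin(crits))]

rng = np.random.default_rng(1)
data = rng.normal(size=400)
hs = np.linspace(0.05, 1.2, 30)
grid = np.linspace(-4.0, 4.0, 200)
h_pco = pco_bandwidth(data, hs, grid)
```

Note that each candidate requires only one extra distance computation against a fixed reference estimator, which is why this family of criteria adds little algorithmic cost over plain grid evaluation.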