About the non-asymptotic behaviour of Bayes estimators
This paper investigates the {\em nonasymptotic} properties of Bayes
procedures for estimating an unknown distribution from i.i.d.\
observations. We assume that the prior is supported by a model $(\scr{S},h)$
(where $h$ denotes the Hellinger distance) with suitable metric properties
involving the number of small balls that are needed to cover larger ones. We
also require that the prior puts enough probability on small balls.
We consider two different situations. The simplest case is the one of a
parametric model containing the target density, for which we show that the
posterior concentrates around the true distribution at the usual parametric
rate $1/\sqrt{n}$. In
the general situation, we relax the parametric assumption and take into account
a possible misspecification of the model. Provided that the Kullback-Leibler
information between the true distribution and $\scr{S}$ is finite, we establish
risk bounds for the Bayes estimators.
Comment: Extended version of a talk given in June 2013 at the BNP9 Conference in Amsterdam. 17 pages.
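For reference (this definition is not spelled out in the abstract as extracted here), the Hellinger distance $h$ between two densities $f$ and $g$ with respect to a dominating measure $\mu$ is:

```latex
h(f,g) \;=\; \left( \frac{1}{2} \int \big( \sqrt{f} - \sqrt{g}\, \big)^{2} \, d\mu \right)^{1/2}
```

It takes values in $[0,1]$, which is one reason it is a convenient loss for robust, non-asymptotic risk bounds.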
The Brouwer Lecture 2005: Statistical estimation with model selection
The purpose of this paper is to explain the interest and importance of
(approximate) models and model selection in Statistics. Starting from the very
elementary example of histograms we present a general notion of finite
dimensional model for statistical estimation and we explain what type of risk
bounds can be expected from the use of one such model. We then describe the
performance of suitable model selection procedures applied to a family of such
models. We illustrate our point of view with two main examples: the choice of a
partition for designing a histogram from an n-sample and the problem of
variable selection in the context of Gaussian regression.
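As an illustration of the histogram example, here is a minimal sketch (our illustration, not the lecture's actual procedure) of selecting the number of bins of a regular histogram on $[0,1]$ by minimizing a penalized least-squares criterion, with a penalty proportional to the model dimension over $n$; the constant `c = 2.0` is an assumed, illustrative choice:

```python
import numpy as np

def histogram_select(x, max_bins=50, c=2.0):
    """Select the number of bins D of a regular histogram on [0, 1] by
    minimizing the penalized least-squares criterion
        crit(D) = -D * sum_k p_k^2 + c * D / n,
    where p_k is the fraction of observations falling in bin k.  The
    penalty c * D / n is proportional to the model dimension over n;
    c = 2.0 is an illustrative constant, not one from the paper.
    """
    n = len(x)
    best_D, best_crit = 1, np.inf
    for D in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=D, range=(0.0, 1.0))
        p = counts / n
        crit = -D * np.sum(p ** 2) + c * D / n
        if crit < best_crit:
            best_D, best_crit = D, crit
    return best_D

def histogram_density(x, D):
    """Piecewise-constant density estimate with D equal bins on [0, 1]."""
    counts, edges = np.histogram(x, bins=D, range=(0.0, 1.0))
    return D * counts / len(x), edges
```

For data concentrated on a small subinterval the criterion selects a fine partition, while for near-uniform data the penalty drives the choice toward very few bins.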
Rho-estimators revisited: General theory and applications
Following Baraud, Birg\'e and Sart (2017), we pursue our attempt to design a
robust universal estimator of the joint distribution $P$ of independent (but not
necessarily i.i.d.) observations for a Hellinger-type loss. Given such
observations with an unknown joint distribution $P$ and a dominated
model $\scr{Q}$ for $P$, we build an estimator $\widehat{P}$
based on the observations and measure its risk by a
Hellinger-type distance. When $P$ does belong to the model, this risk
is bounded by some quantity which relies on the local complexity of the model
in a vicinity of $P$. In most situations this bound corresponds to the
minimax risk over the model (up to a possible logarithmic factor). When $P$
does not belong to the model, its risk involves an additional bias
term proportional to the distance between $P$ and $\scr{Q}$,
whatever the true distribution $P$. From this point of view, this new
version of $\rho$-estimators improves upon the previous one described in
Baraud, Birg\'e and Sart (2017), which required that $P$ be absolutely
continuous with respect to some known reference measure. Several further
improvements have been made compared to the former construction. In
particular, it provides a very general treatment of the regression framework
with random design as well as a computationally tractable procedure for
aggregating estimators. We also give some conditions for the Maximum Likelihood
Estimator to be a $\rho$-estimator. Finally, we consider the situation where
the statistician has many different models at hand, and we build a penalized
version of the $\rho$-estimator for model selection and adaptation purposes. In
the regression setting, this penalized estimator allows one to estimate not only
the regression function but also the distribution of the errors.
Comment: 73 pages.
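Schematically, and in illustrative notation of ours (with $C$ a universal constant, $\scr{Q}$ the model, $h$ the Hellinger-type distance and $D_n(\scr{Q})$ a local-complexity term), the risk bounds described above take the form:

```latex
\mathbb{E}\!\left[ h^{2}\big(P, \widehat{P}\big) \right]
  \;\le\; C \left[ \inf_{Q \in \scr{Q}} h^{2}(P, Q) \;+\; D_n(\scr{Q}) \right]
```

The first term is the bias, which vanishes when $P$ belongs to the model; the second is the complexity term that matches the minimax risk in most situations.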
Rates of convergence of rho-estimators for sets of densities satisfying shape constraints
The purpose of this paper is to pursue our study of rho-estimators built from
i.i.d. observations that we defined in Baraud et al. (2014). For a
\rho-estimator based on some model S (which means that the estimator belongs to
S) and a true distribution of the observations that also belongs to S, the risk
(with squared Hellinger loss) is bounded by a quantity which can be viewed as a
dimension function of the model and is often related to the "metric dimension"
of this model, as defined in Birg\'e (2006). This is a minimax point of view
and it is well-known that it is pessimistic. Typically, the bound is accurate
for most points in the model but may be very pessimistic when the true
distribution belongs to some specific part of it. This is the situation that we
want to investigate here. For some models, like the set of decreasing densities
on [0,1], there exist specific points in the model that we shall call
"extremal" and for which the risk is substantially smaller than the typical
risk. Moreover, the risk at a non-extremal point of the model can be bounded by
the sum of the risk bound at a well-chosen extremal point plus the square of
its distance to this point. This implies that if the true density is close
enough to an extremal point, the risk at this point may be smaller than the
minimax risk on the model and this actually remains true even if the true
density does not belong to the model. The result is based on some refined
bounds on the suprema of empirical processes established in Baraud (2016).
Comment: 24 pages.
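In illustrative notation of ours (not the paper's), writing $\bar{s}$ for a well-chosen extremal point, $R_n(\bar{s})$ for its risk bound, $h$ for the Hellinger distance and $C$ for a constant, the transfer property described above reads:

```latex
\mathbb{E}\!\left[ h^{2}(s, \hat{s}) \right]
  \;\le\; C \left[ R_n(\bar{s}) \;+\; h^{2}(s, \bar{s}) \right]
```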
Robust Bayes-Like Estimation: Rho-Bayes estimation
We consider the problem of estimating the joint distribution $P$ of
independent random variables within the Bayes paradigm from a non-asymptotic
point of view. Assuming that $P$ admits some density $s$ with respect to a
given reference measure, we consider a density model $\scr{S}$ for $s$ that
we endow with a prior distribution $\pi$ (with support $\scr{S}$) and we
build a robust alternative to the classical Bayes posterior distribution which
possesses similar concentration properties around $s$ whenever it belongs to
the model $\scr{S}$. Furthermore, in density estimation, the Hellinger
distance between the classical and the robust posterior distributions tends to
0, as the number of observations tends to infinity, under suitable assumptions
on the model and the prior, provided that the model contains the
true density $s$. However, unlike what happens with the classical Bayes
posterior distribution, we show that the concentration properties of this new
posterior distribution are still preserved in the case of a misspecification of
the model, that is, when $s$ does not belong to $\scr{S}$ but is close
enough to it with respect to the Hellinger distance.
Comment: 68 pages.
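For comparison, the classical Bayes posterior to which this robust alternative is compared is, schematically, for a dominated model with densities $t \in \scr{S}$ and prior $\pi$:

```latex
\pi\big(dt \mid X_1,\dots,X_n\big) \;=\;
  \frac{\prod_{i=1}^{n} t(X_i)\,\pi(dt)}
       {\int_{\scr{S}} \prod_{i=1}^{n} u(X_i)\,\pi(du)}
```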
Estimating composite functions by model selection
We consider the problem of estimating a function $s$ for large values of the
dimension of its domain by looking for some best approximation of $s$ by composite
functions of the form $g \circ u$. Our solution is based on model selection and
leads to a very general approach to solve this problem with respect to many
different types of functions $g$, $u$ and statistical frameworks. In particular,
we handle the problems of approximating $s$ by additive functions, single and
multiple index models, neural networks, mixtures of Gaussian densities (when
$s$ is a density), among other examples. We also investigate the situation where
$s = g \circ u$ for functions $g$ and $u$ belonging to possibly anisotropic
smoothness classes. In this case, our approach leads to a completely adaptive
estimator with respect to the regularity of $s$.
Comment: 37 pages.
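The special cases mentioned above can all be written as composite functions $g \circ u$; for instance (standard illustrative forms):

```latex
\text{additive:}\quad s(x) = \sum_{j=1}^{d} g_j(x_j), \qquad
\text{single index:}\quad s(x) = g\big(\langle \theta, x \rangle\big), \qquad
\text{multiple index:}\quad s(x) = g\big(\theta_1^{\top}x, \dots, \theta_m^{\top}x\big)
```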
Model selection for density estimation with L2-loss
We consider here estimation of an unknown probability density s belonging to
L2(mu) where mu is a probability measure. We have at hand n i.i.d. observations
with density s and use the squared L2-norm as our loss function. The purpose of
this paper is to provide an abstract but completely general method for
estimating s by model selection, allowing us to handle arbitrary families of
finite-dimensional (possibly non-linear) models and any density s belonging to
L2(mu). We shall, in particular, consider the cases of unbounded densities and
bounded densities with unknown bound and investigate how the L-infinity-norm of
s may influence the risk. We shall also provide applications to adaptive
estimation and aggregation of preliminary estimators. Although of a purely
theoretical nature, our method leads to results that cannot presently be
reached by more concrete methods.
Comment: 37 pages. Minor changes.
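Schematically, and in illustrative notation of ours, such model selection results take the form of an oracle inequality over a family $\{S_m\}_{m \in \mathcal{M}}$ of models, with a penalty $\mathrm{pen}(m)$ typically of order $\dim(S_m)/n$:

```latex
\mathbb{E}\!\left[ \| s - \hat{s}_{\hat{m}} \|^{2} \right]
  \;\le\; C \inf_{m \in \mathcal{M}}
  \left[ \inf_{t \in S_m} \| s - t \|^{2} \;+\; \mathrm{pen}(m) \right]
```

The selected model thus trades off the approximation error of $S_m$ against its dimension, without knowing in advance which model is best.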
A new V-fold type procedure based on robust tests
We define a general V-fold cross-validation type method based on robust
tests, which is an extension of the hold-out defined by Birg{\'e} [7, Section
9]. We give some theoretical results showing that, under some weak assumptions
on the considered statistical procedures, our selected estimator satisfies an
oracle-type inequality. We also introduce a fast algorithm that implements our
method. Moreover, we show in our simulations that this V-fold procedure
generally performs well for estimating a density at different sample sizes, and
can handle well-known problems, such as binwidth selection for histograms or
bandwidth selection for kernels. We finally provide a comparison with other
classical V-fold methods and study empirically the influence of the value of V
on the risk.
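As a minimal sketch of the V-fold idea for density estimation (here with a plain least-squares cross-validation criterion for histogram binwidth, an assumed illustration rather than the robust-test criterion of the paper):

```python
import numpy as np

def vfold_risk(x, D, V=5, seed=0):
    """V-fold cross-validation estimate (up to the constant ||s||^2) of the
    squared L2 risk of a D-bin regular histogram on [0, 1]: each fold is held
    out, the histogram is fit on the remaining folds, and the least-squares
    contrast  ||s_hat||^2 - 2 * mean(s_hat(held-out points))  is averaged
    over the V folds.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)
    edges = np.linspace(0.0, 1.0, D + 1)
    cv = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        counts, _ = np.histogram(x[train], bins=edges)
        heights = D * counts / len(train)      # density value on each bin
        norm_sq = np.sum(heights ** 2) / D     # ||s_hat||^2 (bin width 1/D)
        # Bin index of each held-out point, clipped to the valid range.
        k = np.clip(np.searchsorted(edges, x[fold], side="right") - 1, 0, D - 1)
        cv += norm_sq - 2.0 * np.mean(heights[k])
    return cv / V

def select_bins(x, candidates=range(1, 31), V=5, seed=0):
    """Pick the candidate bin count with the smallest cross-validated risk."""
    return min(candidates, key=lambda D: vfold_risk(x, D, V=V, seed=seed))
```

Replacing the contrast inside the loop by a robust-test comparison of estimators is, roughly, where the procedure of this paper departs from classical V-fold criteria.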