About the non-asymptotic behaviour of Bayes estimators
This paper investigates the {\em nonasymptotic} properties of Bayes
procedures for estimating an unknown distribution from i.i.d.\
observations. We assume that the prior is supported by a model $(\scr{S},h)$
(where $h$ denotes the Hellinger distance) with suitable metric properties
involving the number of small balls that are needed to cover larger ones. We
also require that the prior puts enough probability on small balls.
We consider two different situations. The simplest case is the one of a
parametric model containing the target density, for which we show that the
posterior concentrates around the true distribution at the usual parametric
rate $1/\sqrt{n}$. In
the general situation, we relax the parametric assumption and take into account
a possible misspecification of the model. Provided that the Kullback-Leibler
information between the true distribution and $\scr{S}$ is finite, we establish
risk bounds for the Bayes estimators.
Comment: Extended version of a talk given in June 2013 at the BNP9 Conference in Amsterdam. 17 pages.
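For reference (this definition is not spelled out in the abstract as extracted here), the Hellinger distance $h$ between two densities $f$ and $g$ with respect to a dominating measure $\mu$ is:

```latex
h(f,g) \;=\; \left( \frac{1}{2} \int \big( \sqrt{f} - \sqrt{g}\, \big)^{2} \, d\mu \right)^{1/2}
```

It takes values in $[0,1]$, which is one reason it is a convenient loss for robust, non-asymptotic risk bounds.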
The Brouwer Lecture 2005: Statistical estimation with model selection
The purpose of this paper is to explain the interest and importance of
(approximate) models and model selection in Statistics. Starting from the very
elementary example of histograms we present a general notion of finite
dimensional model for statistical estimation and we explain what type of risk
bounds can be expected from the use of one such model. We then describe the
performance of suitable model selection procedures applied to a family of such
models. We illustrate our point of view with two main examples: the choice of a
partition for designing a histogram from an n-sample and the problem of
variable selection in the context of Gaussian regression.
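As an illustration of the histogram example, here is a minimal sketch (our illustration, not the lecture's actual procedure) of selecting the number of bins of a regular histogram on $[0,1]$ by minimizing a penalized least-squares criterion, with a penalty proportional to the model dimension over $n$; the constant `c = 2.0` is an assumed, illustrative choice:

```python
import numpy as np

def histogram_select(x, max_bins=50, c=2.0):
    """Select the number of bins D of a regular histogram on [0, 1] by
    minimizing the penalized least-squares criterion
        crit(D) = -D * sum_k p_k^2 + c * D / n,
    where p_k is the fraction of observations falling in bin k.  The
    penalty c * D / n is proportional to the model dimension over n;
    c = 2.0 is an illustrative constant, not one from the paper.
    """
    n = len(x)
    best_D, best_crit = 1, np.inf
    for D in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=D, range=(0.0, 1.0))
        p = counts / n
        crit = -D * np.sum(p ** 2) + c * D / n
        if crit < best_crit:
            best_D, best_crit = D, crit
    return best_D

def histogram_density(x, D):
    """Piecewise-constant density estimate with D equal bins on [0, 1]."""
    counts, edges = np.histogram(x, bins=D, range=(0.0, 1.0))
    return D * counts / len(x), edges
```

For data concentrated on a small subinterval the criterion selects a fine partition, while for near-uniform data the penalty drives the choice toward very few bins.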
Rho-estimators revisited: General theory and applications
Following Baraud, Birg\'e and Sart (2017), we pursue our attempt to design a
robust universal estimator of the joint distribution $P$ of independent (but not
necessarily i.i.d.) observations for a Hellinger-type loss. Given such
observations with an unknown joint distribution $P$ and a dominated
model $\scr{Q}$ for $P$, we build an estimator $\widehat{P}$
based on the observations and measure its risk by a
Hellinger-type distance. When $P$ does belong to the model, this risk
is bounded by some quantity which relies on the local complexity of the model
in a vicinity of $P$. In most situations this bound corresponds to the
minimax risk over the model (up to a possible logarithmic factor). When $P$
does not belong to the model, its risk involves an additional bias
term proportional to the distance between $P$ and $\scr{Q}$,
whatever the true distribution $P$. From this point of view, this new
version of $\rho$-estimators improves upon the previous one described in
Baraud, Birg\'e and Sart (2017), which required that $P$ be absolutely
continuous with respect to some known reference measure. Several further
improvements have been made compared to the former construction. In
particular, it provides a very general treatment of the regression framework
with random design as well as a computationally tractable procedure for
aggregating estimators. We also give some conditions for the Maximum Likelihood
Estimator to be a $\rho$-estimator. Finally, we consider the situation where
the statistician has many different models at hand, and we build a penalized
version of the $\rho$-estimator for model selection and adaptation purposes. In
the regression setting, this penalized estimator allows one to estimate not only
the regression function but also the distribution of the errors.
Comment: 73 pages.
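Schematically, and in illustrative notation of ours (with $C$ a universal constant, $\scr{Q}$ the model, $h$ the Hellinger-type distance and $D_n(\scr{Q})$ a local-complexity term), the risk bounds described above take the form:

```latex
\mathbb{E}\!\left[ h^{2}\big(P, \widehat{P}\big) \right]
  \;\le\; C \left[ \inf_{Q \in \scr{Q}} h^{2}(P, Q) \;+\; D_n(\scr{Q}) \right]
```

The first term is the bias, which vanishes when $P$ belongs to the model; the second is the complexity term that matches the minimax risk in most situations.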
Rates of convergence of rho-estimators for sets of densities satisfying shape constraints
The purpose of this paper is to pursue our study of rho-estimators built from
i.i.d. observations that we defined in Baraud et al. (2014). For a
\rho-estimator based on some model S (which means that the estimator belongs to
S) and a true distribution of the observations that also belongs to S, the risk
(with squared Hellinger loss) is bounded by a quantity which can be viewed as a
dimension function of the model and is often related to the "metric dimension"
of this model, as defined in Birg\'e (2006). This is a minimax point of view
and it is well-known that it is pessimistic. Typically, the bound is accurate
for most points in the model but may be very pessimistic when the true
distribution belongs to some specific part of it. This is the situation that we
want to investigate here. For some models, like the set of decreasing densities
on [0,1], there exist specific points in the model that we shall call
"extremal" and for which the risk is substantially smaller than the typical
risk. Moreover, the risk at a non-extremal point of the model can be bounded by
the sum of the risk bound at a well-chosen extremal point plus the square of
its distance to this point. This implies that if the true density is close
enough to an extremal point, the risk at this point may be smaller than the
minimax risk on the model and this actually remains true even if the true
density does not belong to the model. The result is based on some refined
bounds on the suprema of empirical processes established in Baraud (2016).
Comment: 24 pages.
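In illustrative notation of ours (not the paper's), writing $\bar{s}$ for a well-chosen extremal point, $R_n(\bar{s})$ for its risk bound, $h$ for the Hellinger distance and $C$ for a constant, the transfer property described above reads:

```latex
\mathbb{E}\!\left[ h^{2}(s, \hat{s}) \right]
  \;\le\; C \left[ R_n(\bar{s}) \;+\; h^{2}(s, \bar{s}) \right]
```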
Robust Bayes-Like Estimation: Rho-Bayes estimation
We consider the problem of estimating the joint distribution $P$ of
independent random variables within the Bayes paradigm from a non-asymptotic
point of view. Assuming that $P$ admits some density $s$ with respect to a
given reference measure, we consider a density model $\scr{S}$ for $s$ that
we endow with a prior distribution $\pi$ (with support $\scr{S}$) and we
build a robust alternative to the classical Bayes posterior distribution which
possesses similar concentration properties around $s$ whenever it belongs to
the model $\scr{S}$. Furthermore, in density estimation, the Hellinger
distance between the classical and the robust posterior distributions tends to
0, as the number of observations tends to infinity, under suitable assumptions
on the model and the prior, provided that the model contains the
true density $s$. However, unlike what happens with the classical Bayes
posterior distribution, we show that the concentration properties of this new
posterior distribution are still preserved in the case of a misspecification of
the model, that is, when $s$ does not belong to $\scr{S}$ but is close
enough to it with respect to the Hellinger distance.
Comment: 68 pages.
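For comparison, the classical Bayes posterior to which this robust alternative is compared is, schematically, for a dominated model with densities $t \in \scr{S}$ and prior $\pi$:

```latex
\pi\big(dt \mid X_1,\dots,X_n\big) \;=\;
  \frac{\prod_{i=1}^{n} t(X_i)\,\pi(dt)}
       {\int_{\scr{S}} \prod_{i=1}^{n} u(X_i)\,\pi(du)}
```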
Estimating composite functions by model selection
We consider the problem of estimating a function $s$ for large values of the
dimension of its domain by looking for some best approximation of $s$ by composite
functions of the form $g \circ u$. Our solution is based on model selection and
leads to a very general approach to solve this problem with respect to many
different types of functions $g$, $u$ and statistical frameworks. In particular,
we handle the problems of approximating $s$ by additive functions, single and
multiple index models, neural networks, mixtures of Gaussian densities (when
$s$ is a density), among other examples. We also investigate the situation where
$s = g \circ u$ for functions $g$ and $u$ belonging to possibly anisotropic
smoothness classes. In this case, our approach leads to a completely adaptive
estimator with respect to the regularity of $s$.
Comment: 37 pages.
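The special cases mentioned above can all be written as composite functions $g \circ u$; for instance (standard illustrative forms):

```latex
\text{additive:}\quad s(x) = \sum_{j=1}^{d} g_j(x_j), \qquad
\text{single index:}\quad s(x) = g\big(\langle \theta, x \rangle\big), \qquad
\text{multiple index:}\quad s(x) = g\big(\theta_1^{\top}x, \dots, \theta_m^{\top}x\big)
```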
Model selection for density estimation with L2-loss
We consider here estimation of an unknown probability density s belonging to
L2(mu) where mu is a probability measure. We have at hand n i.i.d. observations
with density s and use the squared L2-norm as our loss function. The purpose of
this paper is to provide an abstract but completely general method for
estimating s by model selection, allowing us to handle arbitrary families of
finite-dimensional (possibly non-linear) models and any density s belonging to
L2(mu). We shall, in particular, consider the cases of unbounded densities and
bounded densities with unknown bound and investigate how the L-infinity-norm of
s may influence the risk. We shall also provide applications to adaptive
estimation and aggregation of preliminary estimators. Although of a purely
theoretical nature, our method leads to results that cannot presently be
reached by more concrete methods.
Comment: 37 pages. Minor changes.
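Schematically, and in illustrative notation of ours, such model selection results take the form of an oracle inequality over a family $\{S_m\}_{m \in \mathcal{M}}$ of models, with a penalty $\mathrm{pen}(m)$ typically of order $\dim(S_m)/n$:

```latex
\mathbb{E}\!\left[ \| s - \hat{s}_{\hat{m}} \|^{2} \right]
  \;\le\; C \inf_{m \in \mathcal{M}}
  \left[ \inf_{t \in S_m} \| s - t \|^{2} \;+\; \mathrm{pen}(m) \right]
```

The selected model thus trades off the approximation error of $S_m$ against its dimension, without knowing in advance which model is best.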
A new V-fold type procedure based on robust tests
We define a general V-fold cross-validation type method based on robust
tests, which is an extension of the hold-out defined by Birg{\'e} [7, Section
9]. We give some theoretical results showing that, under some weak assumptions
on the considered statistical procedures, our selected estimator satisfies an
oracle-type inequality. We also introduce a fast algorithm that implements our
method. Moreover, we show in our simulations that this V-fold procedure
generally performs well for estimating a density at different sample sizes, and
can handle well-known problems, such as binwidth selection for histograms or
bandwidth selection for kernels. We finally provide a comparison with other
classical V-fold methods and study empirically the influence of the value of V
on the risk.
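As a minimal sketch of the V-fold idea for density estimation (here with a plain least-squares cross-validation criterion for histogram binwidth, an assumed illustration rather than the robust-test criterion of the paper):

```python
import numpy as np

def vfold_risk(x, D, V=5, seed=0):
    """V-fold cross-validation estimate (up to the constant ||s||^2) of the
    squared L2 risk of a D-bin regular histogram on [0, 1]: each fold is held
    out, the histogram is fit on the remaining folds, and the least-squares
    contrast  ||s_hat||^2 - 2 * mean(s_hat(held-out points))  is averaged
    over the V folds.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)
    edges = np.linspace(0.0, 1.0, D + 1)
    cv = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        counts, _ = np.histogram(x[train], bins=edges)
        heights = D * counts / len(train)      # density value on each bin
        norm_sq = np.sum(heights ** 2) / D     # ||s_hat||^2 (bin width 1/D)
        # Bin index of each held-out point, clipped to the valid range.
        k = np.clip(np.searchsorted(edges, x[fold], side="right") - 1, 0, D - 1)
        cv += norm_sq - 2.0 * np.mean(heights[k])
    return cv / V

def select_bins(x, candidates=range(1, 31), V=5, seed=0):
    """Pick the candidate bin count with the smallest cross-validated risk."""
    return min(candidates, key=lambda D: vfold_risk(x, D, V=V, seed=seed))
```

Replacing the contrast inside the loop by a robust-test comparison of estimators is, roughly, where the procedure of this paper departs from classical V-fold criteria.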