
    About the non-asymptotic behaviour of Bayes estimators

    This paper investigates the nonasymptotic properties of Bayes procedures for estimating an unknown distribution from n i.i.d. observations. We assume that the prior is supported by a model (S, h) (where h denotes the Hellinger distance) with suitable metric properties involving the number of small balls needed to cover larger ones. We also require that the prior puts enough probability on small balls. We consider two different situations. The simplest case is that of a parametric model containing the target density, for which we show that the posterior concentrates around the true distribution at rate 1/√n. In the general situation, we relax the parametric assumption and take into account a possible misspecification of the model. Provided that the Kullback-Leibler information between the true distribution and S is finite, we establish risk bounds for the Bayes estimators. (Extended version of a talk given in June 2013 at the BNP9 Conference in Amsterdam; 17 pages.)
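    The Hellinger distance h that measures concentration in this abstract can be sketched, for discrete distributions, as follows (an illustrative helper, not code from the paper):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions:
    h(P, Q) = (1/sqrt(2)) * || sqrt(p) - sqrt(q) ||_2, which lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Identical distributions are at distance 0; disjoint supports at distance 1.
print(hellinger([0.25] * 4, [0.25] * 4))  # 0.0
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0
```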

    The Brouwer Lecture 2005: Statistical estimation with model selection

    The purpose of this paper is to explain the interest and importance of (approximate) models and model selection in statistics. Starting from the very elementary example of histograms, we present a general notion of finite-dimensional model for statistical estimation and explain what type of risk bounds can be expected from the use of such a model. We then give the performance of suitable model selection procedures over a family of such models. We illustrate our point of view with two main examples: the choice of a partition for designing a histogram from an n-sample, and the problem of variable selection in the context of Gaussian regression.
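    Choosing a histogram partition by a penalized criterion can be sketched as below; the equal-width bins on [0, 1] and the penalty c·D/n are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def select_histogram_bins(sample, max_bins=50, c=2.0):
    """Pick the number of equal-width bins on [0, 1] minimizing a penalized
    empirical L2 contrast (the penalty c * D / n is an illustrative choice)."""
    x = np.asarray(sample, dtype=float)
    n = len(x)
    best_D, best_crit = 1, np.inf
    for D in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=D, range=(0.0, 1.0))
        p_hat = counts / n
        # -D * sum(p_hat^2) is the empirical L2 contrast of the D-bin histogram.
        crit = -D * np.sum(p_hat ** 2) + c * D / n
        if crit < best_crit:
            best_D, best_crit = D, crit
    return best_D

# A perfectly flat sample is best fit with a single bin.
x = (np.arange(500) + 0.5) / 500
print(select_histogram_bins(x))  # 1
```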

    Rho-estimators revisited: General theory and applications

    Following Baraud, Birgé and Sart (2017), we pursue our attempt to design a robust universal estimator of the joint distribution of n independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution P and a dominated model Q for P, we build an estimator P̂ based on Q and measure its risk by a Hellinger-type distance. When P does belong to the model, this risk is bounded by a quantity which relies on the local complexity of the model in a vicinity of P. In most situations this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When P does not belong to the model, the risk involves an additional bias term proportional to the distance between P and Q, whatever the true distribution P. From this point of view, this new version of ρ-estimators improves upon the previous one described in Baraud, Birgé and Sart (2017), which required that P be absolutely continuous with respect to some known reference measure. Further improvements have been brought in as compared to the former construction. In particular, the new version provides a very general treatment of the regression framework with random design, as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the maximum likelihood estimator to be a ρ-estimator. Finally, we consider the situation where the statistician has many different models at their disposal, and we build a penalized version of the ρ-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator allows one to estimate not only the regression function but also the distribution of the errors. (73 pages.)

    Rates of convergence of rho-estimators for sets of densities satisfying shape constraints

    The purpose of this paper is to pursue our study of ρ-estimators built from i.i.d. observations, which we defined in Baraud et al. (2014). For a ρ-estimator based on some model S (which means that the estimator belongs to S) and a true distribution of the observations that also belongs to S, the risk (with squared Hellinger loss) is bounded by a quantity which can be viewed as a dimension function of the model and is often related to the "metric dimension" of this model, as defined in Birgé (2006). This is a minimax point of view, and it is well known to be pessimistic: typically, the bound is accurate for most points in the model but may be very pessimistic when the true distribution belongs to some specific part of it. This is the situation that we investigate here. For some models, like the set of decreasing densities on [0,1], there exist specific points in the model that we shall call "extremal", for which the risk is substantially smaller than the typical risk. Moreover, the risk at a non-extremal point of the model can be bounded by the sum of the risk bound at a well-chosen extremal point plus the square of its distance to this point. This implies that if the true density is close enough to an extremal point, the risk at this point may be smaller than the minimax risk on the model, and this actually remains true even if the true density does not belong to the model. The result is based on some refined bounds on the suprema of empirical processes established in Baraud (2016). (24 pages.)
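    The set of decreasing densities mentioned in this abstract is classically handled by pooling adjacent violators; the sketch below projects a vector (e.g. histogram heights) onto the nonincreasing cone in least squares. It is a standard tool, not the ρ-estimator itself:

```python
def pava_decreasing(y):
    """Least-squares projection of y onto nonincreasing sequences via the
    pool-adjacent-violators algorithm. Returns a list of the same length."""
    blocks = []  # each block is [mean, weight]
    for v in y:
        blocks.append([float(v), 1])
        # Merge blocks while the nonincreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for m, w in blocks:
        out.extend([m] * w)
    return out

# Heights that fail to decrease are pooled into flat pieces.
print(pava_decreasing([3, 1, 2, 0]))  # [3.0, 1.5, 1.5, 0.0]
```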

    Robust Bayes-Like Estimation: Rho-Bayes estimation

    We consider the problem of estimating the joint distribution P of n independent random variables within the Bayes paradigm from a nonasymptotic point of view. Assuming that P admits some density s with respect to a given reference measure, we consider a density model S for s that we endow with a prior distribution π (with support S), and we build a robust alternative to the classical Bayes posterior distribution which possesses similar concentration properties around s whenever s belongs to the model S. Furthermore, in density estimation, the Hellinger distance between the classical and the robust posterior distributions tends to 0 as the number of observations tends to infinity, under suitable assumptions on the model and the prior, provided that the model S contains the true density s. However, unlike what happens with the classical Bayes posterior distribution, we show that the concentration properties of this new posterior distribution are preserved even under misspecification of the model, that is, when s does not belong to S but is close enough to it with respect to the Hellinger distance. (68 pages.)

    Estimating composite functions by model selection

    We consider the problem of estimating a function s on [-1,1]^k for large values of k by looking for some best approximation of s by composite functions of the form g∘u. Our solution is based on model selection and leads to a very general approach to this problem for many different types of functions g, u and statistical frameworks. In particular, we handle the problems of approximating s by additive functions, single and multiple index models, neural networks, and mixtures of Gaussian densities (when s is a density), among other examples. We also investigate the situation where s = g∘u for functions g and u belonging to possibly anisotropic smoothness classes. In this case, our approach leads to an estimator that is completely adaptive with respect to the regularity of s. (37 pages.)

    Model selection for density estimation with L2-loss

    We consider here estimation of an unknown probability density s belonging to L2(μ), where μ is a probability measure. We have at hand n i.i.d. observations with density s and use the squared L2-norm as our loss function. The purpose of this paper is to provide an abstract but completely general method for estimating s by model selection, allowing us to handle arbitrary families of finite-dimensional (possibly non-linear) models and any s ∈ L2(μ). We shall, in particular, consider the cases of unbounded densities and of bounded densities with unknown L∞-norm, and investigate how the L∞-norm of s may influence the risk. We shall also provide applications to adaptive estimation and to aggregation of preliminary estimators. Although of a purely theoretical nature, our method leads to results that cannot presently be reached by more concrete ones. (32 pages.)
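    A minimal sketch of an estimator on one finite-dimensional model, of the kind such a selection procedure chooses among, projects the density on a basis and estimates each coefficient by an empirical mean. The cosine basis on [0, 1] is an illustrative choice, not the paper's:

```python
import math

def l2_projection_estimate(sample, dim):
    """Density estimate on [0, 1] by projection on the linear span of
    {1, sqrt(2) cos(j*pi*x), 0 < j < dim}: each coefficient
    beta_j = E[phi_j(X)] is estimated by its empirical mean."""
    n = len(sample)

    def phi(j, x):
        return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(j * math.pi * x)

    beta = [sum(phi(j, x) for x in sample) / n for j in range(dim)]
    return lambda x: sum(b * phi(j, x) for j, b in enumerate(beta))

# The constant coefficient is always 1, so with dim=1 the estimate is uniform.
s_hat = l2_projection_estimate([0.1, 0.5, 0.9], dim=1)
print(s_hat(0.3))  # 1.0
```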


    A new V-fold type procedure based on robust tests

    We define a general V-fold cross-validation type method based on robust tests, which extends the hold-out procedure defined by Birgé [7, Section 9]. We give theoretical results showing that, under some weak assumptions on the statistical procedures considered, our selected estimator satisfies an oracle-type inequality. We also introduce a fast algorithm that implements our method. Moreover, our simulations show that this V-fold procedure generally performs well for estimating a density at different sample sizes and can handle well-known problems such as binwidth selection for histograms or bandwidth selection for kernels. We finally provide a comparison with other classical V-fold methods and study empirically the influence of the value of V on the risk.
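    For comparison, a classical V-fold binwidth selection for histograms can be sketched as below (plain L2 cross-validation, not the robust-test procedure introduced in the paper; the candidate grid and V are illustrative):

```python
import numpy as np

def vfold_select_bins(sample, candidate_bins, V=5, seed=0):
    """Pick a histogram bin count on [0, 1] by V-fold cross-validation of
    the L2 contrast ||s_hat||^2 - 2 * mean(s_hat(test))."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(sample, dtype=float))
    folds = np.array_split(x, V)
    best_D, best_score = None, np.inf
    for D in candidate_bins:
        edges = np.linspace(0.0, 1.0, D + 1)
        score = 0.0
        for v in range(V):
            test = folds[v]
            train = np.concatenate([folds[u] for u in range(V) if u != v])
            # Histogram density heights fitted on the training part only.
            heights = np.histogram(train, bins=edges)[0] * D / len(train)
            idx = np.clip(np.searchsorted(edges, test, side="right") - 1, 0, D - 1)
            score += (np.sum(heights ** 2) / D - 2.0 * np.mean(heights[idx])) / V
        if score < best_score:
            best_D, best_score = D, score
    return best_D

x = np.random.default_rng(1).uniform(size=400)
print(vfold_select_bins(x, candidate_bins=[1, 2, 4, 8, 16]))  # typically small for flat data
```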