
    Convergence rates for Bayesian density estimation of infinite-dimensional exponential families

    We study the rate of convergence of posterior distributions in density estimation problems for log-densities in periodic Sobolev classes characterized by a smoothness parameter p. The posterior expected density provides a nonparametric estimation procedure attaining the optimal minimax rate of convergence under Hellinger loss if the posterior distribution achieves the optimal rate over certain uniformity classes. A prior on the density class of interest is induced by a prior on the coefficients of the trigonometric series expansion of the log-density. We show that when p is known, the posterior distribution of a Gaussian prior achieves the optimal rate provided the prior variances die off sufficiently rapidly. For a mixture of normal distributions, the mixing weights on the dimension of the exponential family are assumed to be bounded below by an exponentially decreasing sequence. To avoid the use of infinite bases, we develop priors that cut off the series at a sample-size-dependent truncation point. When the degree of smoothness is unknown, a finite mixture of normal priors indexed by the smoothness parameter, which is also assigned a prior, produces the best rate. A rate-adaptive estimator is derived. Comment: Published at http://dx.doi.org/10.1214/009053606000000911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
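    The construction above, a prior on the log-density induced by independent Gaussian coefficients on a trigonometric basis with rapidly decaying variances, can be sketched in a few lines. The code below draws a random density from such a truncated prior; the truncation level, grid, and the variance decay rate tied to the smoothness parameter p are illustrative choices, not the paper's exact specification.

```python
import numpy as np

# Illustrative sketch (not the paper's exact specification): a truncated
# trigonometric-series prior on a log-density on [0, 1). Gaussian coefficients
# get variances that die off polynomially in the frequency, at a rate tied to
# an assumed smoothness parameter p.

def sample_density_from_prior(n_terms=20, p=2.0, grid_size=512, rng=None):
    rng = np.random.default_rng(rng)
    x = np.linspace(0.0, 1.0, grid_size, endpoint=False)
    ks = np.arange(1, n_terms + 1)
    sd = ks ** (-(p + 0.5))                      # rapidly decaying prior std devs
    a = rng.normal(0.0, sd)                      # cosine coefficients
    b = rng.normal(0.0, sd)                      # sine coefficients
    log_f = (a[:, None] * np.cos(2 * np.pi * ks[:, None] * x)
             + b[:, None] * np.sin(2 * np.pi * ks[:, None] * x)).sum(axis=0)
    log_f -= np.log(np.exp(log_f).mean())        # normalize; grid spacing is 1/grid_size
    return x, np.exp(log_f)

x, f = sample_density_from_prior(rng=0)
print(f.min() > 0, f.mean())                     # positive density, integral ~ 1
```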

    Log-Regularly Varying Scale Mixture of Normals for Robust Regression

    Linear regression with the classical normality assumption for the error distribution may lead to undesirable posterior inference of the regression coefficients in the presence of outliers. This paper considers, as the error distribution, a finite mixture of two components with thin and heavy tails, a form routinely employed in applied statistics. For the heavy-tailed component, we introduce a novel class of distributions whose densities are log-regularly varying and have heavier tails than the Cauchy distribution, yet are expressed as a scale mixture of normal distributions and thus enable efficient posterior inference by a Gibbs sampler. We prove robustness to outliers of the posterior distributions under the proposed models with a minimal set of assumptions, which justifies the use of shrinkage priors with unbounded densities for the coefficient vector in the presence of outliers. An extensive simulation comparison with existing methods shows the improved performance of our model in point and interval estimation, as well as its computational efficiency. Further, we confirm the posterior robustness of our method in an empirical study with shrinkage priors for the regression coefficients. Comment: 62 pages.
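    To illustrate the scale-mixture-of-normals representation and the resulting Gibbs sampler, the sketch below uses the familiar Student-t error model (latent Gamma precisions) rather than the paper's log-regularly varying class; the data-augmentation pattern is the same. The flat prior on the coefficients and the fixed error variance are simplifications made only for brevity.

```python
import numpy as np

# Minimal sketch: Gibbs sampling for linear regression whose errors are a scale
# mixture of normals. Here the mixture is Student-t (latent Gamma precisions),
# not the paper's log-regularly varying class; sigma^2 is fixed at 1 and the
# prior on beta is flat, purely to keep the sketch short.

def gibbs_t_regression(X, y, nu=3.0, n_iter=2000, rng=None):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    beta = np.zeros(d)
    lam = np.ones(n)                       # latent precisions, one per observation
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # beta | lambda: weighted least-squares posterior
        XtL = X.T * lam                    # (d, n)
        cov = np.linalg.inv(XtL @ X)
        mean = cov @ (XtL @ y)
        beta = rng.multivariate_normal(mean, cov)
        # lambda_i | beta: conjugate Gamma update; large residuals get small precision
        r = y - X @ beta
        lam = rng.gamma((nu + 1.0) / 2.0, 2.0 / (nu + r**2))
        draws[t] = beta
    return draws

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=100)
y[:5] += 15.0                              # a few gross outliers
print(gibbs_t_regression(X, y, rng=2)[1000:].mean(axis=0))
```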

    Conditions for Posterior Contraction in the Sparse Normal Means Problem

    The first Bayesian results for the sparse normal means problem were proven for spike-and-slab priors. However, these priors are less convenient from a computational point of view. In the meantime, a large number of continuous shrinkage priors have been proposed. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the class of priors considered by Ghosh and Chakrabarti (2015), which includes the horseshoe and the normal-exponential-gamma priors, as well as for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, thus extending the list of shrinkage priors known to lead to posterior contraction at the minimax estimation rate.
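    As a concrete instance of a shrinkage prior written as a scale mixture of normals, the sketch below samples from the horseshoe prior: conditionally normal coordinates with half-Cauchy local scales. The global scale and sample size are arbitrary illustration choices, not values prescribed by the paper.

```python
import numpy as np

# Sketch of the scale-mixture-of-normals representation, using the horseshoe
# prior as the example: theta_i | lambda_i ~ N(0, tau^2 * lambda_i^2) with
# lambda_i half-Cauchy(0, 1). The prior on the local variance lambda_i^2 is
# what the contraction conditions in the abstract constrain.

def sample_horseshoe_prior(n, tau=1.0, rng=None):
    rng = np.random.default_rng(rng)
    lam = np.abs(rng.standard_cauchy(n))   # local scales, half-Cauchy(0, 1)
    return rng.normal(0.0, tau * lam)      # conditionally normal given the scales

theta = sample_horseshoe_prior(100_000, tau=0.1, rng=0)
# Heavy tails (to leave large signals unshrunk) plus substantial mass near zero
# (to shrink noise): the two features the contraction conditions formalize.
print(np.mean(np.abs(theta) < 0.01), np.mean(np.abs(theta) > 10))
```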

    Kullback Leibler property of kernel mixture priors in Bayesian density estimation

    Positivity of the prior probability of Kullback-Leibler neighborhoods around the true density, commonly known as the Kullback-Leibler property, plays a fundamental role in posterior consistency. A popular prior for Bayesian estimation is given by a Dirichlet mixture, where the kernels are chosen depending on the sample space and the class of densities to be estimated. The Kullback-Leibler property of the Dirichlet mixture prior has been shown for some special kernels, such as the normal density or Bernstein polynomials, under appropriate conditions. In this paper, we obtain easily verifiable sufficient conditions under which a prior obtained by mixing a general kernel possesses the Kullback-Leibler property. We study a wide variety of kernels used in practice, including the normal, t, histogram, gamma, and Weibull densities, and show that the Kullback-Leibler property holds if some easily verifiable conditions are satisfied at the true density. This gives a catalog of conditions required for the Kullback-Leibler property, which can be readily used in applications. Comment: Published at http://dx.doi.org/10.1214/07-EJS130 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
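    For concreteness, the sketch below draws a random density from one such kernel mixture prior, here a stick-breaking (truncated Dirichlet process) mixture of normal kernels. The truncation level, base measure, bandwidth, and concentration parameter are assumptions made only for this illustration, not conditions from the paper.

```python
import numpy as np

# Illustrative sketch of a kernel mixture prior: a random density drawn from a
# truncated Dirichlet process mixture of normal kernels via stick-breaking.
# All numeric settings here are arbitrary choices for the example.

def sample_dp_normal_mixture(alpha=1.0, n_atoms=50, bandwidth=0.5, rng=None):
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=n_atoms)                    # stick-breaking proportions
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1]))) # mixture weights
    mu = rng.normal(0.0, 3.0, size=n_atoms)                   # kernel locations from the base measure
    def density(x):
        x = np.atleast_1d(x)[:, None]
        k = np.exp(-0.5 * ((x - mu) / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
        return k @ w
    return density

f = sample_dp_normal_mixture(rng=0)
xs = np.linspace(-10, 10, 2001)
print(f(xs).sum() * (xs[1] - xs[0]))   # ~1, up to truncation and grid error
```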

    On choosing mixture components via non-local priors

    Choosing the number of mixture components remains an elusive challenge. Model selection criteria can be either overly liberal or conservative, and may return poorly separated components of limited practical use. We formalize non-local priors (NLPs) for mixtures and show how they lead to well-separated components with non-negligible weight, interpretable as distinct subpopulations. We also propose an estimator for posterior model probabilities under local and non-local priors, showing that Bayes factors are ratios of posterior to prior empty-cluster probabilities. The estimator is widely applicable and helps set thresholds to drop unoccupied components in overfitted mixtures. We suggest default prior parameters based on multi-modality for Normal/T mixtures and minimal informativeness for categorical outcomes. We theoretically characterise the NLP-induced sparsity and derive tractable expressions and algorithms. We fully develop Normal, Binomial and product Binomial mixtures, but the theory, computation and principles hold more generally. We observed a serious lack of sensitivity of the Bayesian information criterion (BIC), insufficient parsimony of the AIC and a local prior, and a mixed behavior of the singular BIC. We also considered overfitted mixtures; their performance was competitive but depended on tuning parameters. Under our default prior elicitation, NLPs offered a good compromise between sparsity and power to detect meaningfully separated components.
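    The estimator described above (Bayes factors read as ratios of posterior to prior empty-cluster probabilities) can be sketched as a Monte Carlo computation over cluster-allocation draws. The function names and the placeholder allocation draws below are hypothetical; a real application would plug in MCMC output from the fitted overfitted mixture and the corresponding prior on the mixture weights.

```python
import numpy as np

# Hedged sketch: estimate the probability that at least one component is empty,
# once under the posterior (from allocation draws) and once under the prior
# (simulated from a symmetric Dirichlet on the weights). Their ratio mirrors the
# empty-cluster reading of the Bayes factor described in the abstract.

def empty_cluster_probability(allocations, k):
    # allocations: (n_draws, n_obs) integer labels in {0, ..., k-1}
    counts = np.stack([(allocations == j).sum(axis=1) for j in range(k)], axis=1)
    return np.mean((counts == 0).any(axis=1))

def prior_empty_cluster_probability(n_obs, k, alpha=1.0, n_draws=5000, rng=None):
    rng = np.random.default_rng(rng)
    w = rng.dirichlet(np.full(k, alpha), size=n_draws)         # prior mixture weights
    allocations = np.stack([rng.choice(k, size=n_obs, p=wi) for wi in w])
    return empty_cluster_probability(allocations, k)

# Placeholder posterior draws (k = 3, component 2 never used); large ratios
# suggest dropping an unoccupied component.
rng = np.random.default_rng(0)
posterior_allocations = rng.integers(0, 2, size=(5000, 200))
print(empty_cluster_probability(posterior_allocations, 3)
      / prior_empty_cluster_probability(200, 3, rng=1))
```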