Convergence rates for Bayesian density estimation of infinite-dimensional exponential families
We study the rate of convergence of posterior distributions in density
estimation problems for log-densities in periodic Sobolev classes characterized
by a smoothness parameter p. The posterior expected density provides a
nonparametric estimation procedure attaining the optimal minimax rate of
convergence under Hellinger loss if the posterior distribution achieves the
optimal rate over certain uniformity classes. A prior on the density class of
interest is induced by a prior on the coefficients of the trigonometric series
expansion of the log-density. We show that when p is known, the posterior
distribution of a Gaussian prior achieves the optimal rate provided the prior
variances die off sufficiently rapidly. For a mixture of normal distributions,
the mixing weights on the dimension of the exponential family are assumed to be
bounded below by an exponentially decreasing sequence. To avoid the use of
infinite bases, we develop priors that cut off the series at a
sample-size-dependent truncation point. When the degree of smoothness is
unknown, a finite mixture of normal priors indexed by the smoothness parameter,
which is also assigned a prior, produces the best rate. A rate-adaptive
estimator is derived.
Comment: Published at http://dx.doi.org/10.1214/009053606000000911 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
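As a rough illustration of the construction described in this abstract, the sketch below draws a random density from a truncated Gaussian series prior on a periodic log-density: independent Gaussian coefficients whose variances die off with frequency, a sample-size-dependent truncation point, and exponentiation plus normalization. The decay exponent, the truncation rule, and the grid are illustrative assumptions, not the paper's choices.

```python
# A minimal sketch, under assumed decay and truncation choices, of sampling
# from a Gaussian series prior on a periodic log-density.
import numpy as np

def sample_density(n, p=2.0, seed=None):
    rng = np.random.default_rng(seed)
    k_n = int(np.ceil(n ** (1.0 / (2.0 * p + 1.0))))   # assumed sample-size-dependent truncation
    x = np.linspace(0.0, 1.0, 512, endpoint=False)      # uniform grid on one period [0, 1)
    log_f = np.zeros_like(x)
    for j in range(1, k_n + 1):
        sd = j ** (-(p + 0.5))                           # prior standard deviations die off with j
        a, b = rng.normal(0.0, sd, size=2)               # Gaussian prior on the series coefficients
        log_f += a * np.cos(2.0 * np.pi * j * x) + b * np.sin(2.0 * np.pi * j * x)
    f = np.exp(log_f)
    f /= f.mean()                                        # uniform grid on [0, 1), so mean ~ integral
    return x, f

x, f = sample_density(n=500, seed=0)
print(f"truncation level for n=500: {int(np.ceil(500 ** 0.2))}, integral ~ {f.mean():.3f}")
```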
Log-Regularly Varying Scale Mixture of Normals for Robust Regression
Linear regression with the classical normality assumption on the error
distribution may lead to undesirable posterior inference for the regression
coefficients in the presence of outliers. This paper considers, as the error
distribution, a finite mixture of two components with thin and heavy tails,
a model routinely employed in applied statistics. For the heavy-tailed
component, we introduce a novel class of distributions whose densities are
log-regularly varying and have heavier tails than those of the Cauchy
distribution, yet are expressed as scale mixtures of normal distributions and
therefore enable efficient posterior inference by a Gibbs sampler. We prove the
robustness to outliers of the posterior distributions under the proposed models
with a minimal set of assumptions, which justifies the use of shrinkage priors
with unbounded densities for the coefficient vector in the presence of
outliers. An extensive simulation comparison with existing methods shows the
improved performance of our model in point and interval estimation, as well as
its computational efficiency. Further, we confirm the posterior robustness of
our method in an empirical study with shrinkage priors for the regression
coefficients.
Comment: 62 pages
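To make the error model concrete, here is a small sketch of simulating from a two-component error distribution: a thin-tailed normal mixed with a heavy-tailed scale mixture of normals. The heavy component below is a Student-t written in its inverse-gamma scale-mixture form, used only as a stand-in for the paper's log-regularly varying class; the mixing weight and degrees of freedom are illustrative assumptions.

```python
# A minimal sketch of a thin/heavy two-component error model, with the heavy
# component represented as a scale mixture of normals (Student-t stand-in).
import numpy as np

def sample_errors(n, s=0.1, sigma=1.0, nu=1.0, seed=None):
    rng = np.random.default_rng(seed)
    heavy = rng.random(n) < s                          # indicator of the heavy-tailed component
    u = np.ones(n)                                     # per-observation variance multipliers
    u[heavy] = 1.0 / rng.gamma(nu / 2.0, 2.0 / nu, size=heavy.sum())  # inverse-gamma mixing
    return rng.normal(0.0, sigma * np.sqrt(u))         # eps_i ~ N(0, sigma^2 * u_i)

eps = sample_errors(10_000, seed=1)
trimmed = np.sort(eps)[250:-250]                       # central 95% of the draws
print(f"sd of central 95%: {np.std(trimmed):.2f}, overall sd: {np.std(eps):.2f}")
```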
Conditions for Posterior Contraction in the Sparse Normal Means Problem
The first Bayesian results for the sparse normal means problem were proven
for spike-and-slab priors. However, these priors are less convenient from a
computational point of view. In the meantime, a large number of continuous
shrinkage priors have been proposed. Many of these shrinkage priors can be
written as a scale mixture of normals, which makes them particularly easy to
implement. We propose general conditions on the prior on the local variance in
scale mixtures of normals, such that posterior contraction at the minimax rate
is assured. The conditions require tails at least as heavy as Laplace, but not
too heavy, and a large amount of mass around zero relative to the tails, more
so as the sparsity increases. These conditions give some general guidelines for
choosing a shrinkage prior for estimation under a nearly black sparsity
assumption. We verify these conditions for the class of priors considered by
Ghosh and Chakrabarti (2015), which includes the horseshoe and the
normal-exponential gamma priors, and for the horseshoe+, the inverse-Gaussian
prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend
the number of shrinkage priors which are known to lead to posterior contraction
at the minimax estimation rate.
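As an illustration of the scale-mixture-of-normals representation that these conditions act on, the sketch below samples from the horseshoe prior named in the abstract: a normal with a half-Cauchy local scale, which concentrates a large amount of mass near zero while keeping Cauchy-like tails. The global scale value is an illustrative assumption.

```python
# A minimal sketch of a shrinkage prior written as a scale mixture of normals,
# using the horseshoe (half-Cauchy local scales) as the example; tau is assumed.
import numpy as np

def sample_horseshoe_prior(n, tau=0.1, seed=None):
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(n))     # local scales: half-Cauchy(0, 1)
    return rng.normal(0.0, tau * lam)        # theta_i | lam_i ~ N(0, tau^2 * lam_i^2)

theta = sample_horseshoe_prior(100_000, seed=0)
print(f"fraction with |theta| < 0.01: {(np.abs(theta) < 0.01).mean():.2f}, "
      f"fraction with |theta| > 10: {(np.abs(theta) > 10).mean():.4f}")
```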
Kullback Leibler property of kernel mixture priors in Bayesian density estimation
Positivity of the prior probability of Kullback-Leibler neighborhood around
the true density, commonly known as the Kullback-Leibler property, plays a
fundamental role in posterior consistency. A popular prior for Bayesian
estimation is given by a Dirichlet mixture, where the kernels are chosen
depending on the sample space and the class of densities to be estimated. The
Kullback-Leibler property of the Dirichlet mixture prior has been shown for
some special kernels like the normal density or Bernstein polynomial, under
appropriate conditions. In this paper, we obtain easily verifiable sufficient
conditions, under which a prior obtained by mixing a general kernel possesses
the Kullback-Leibler property. We study a wide variety of kernels used in
practice, including the normal, histogram, gamma, and Weibull densities,
among others, and show that the Kullback-Leibler property holds if some easily verifiable
conditions are satisfied at the true density. This gives a catalog of
conditions required for the Kullback-Leibler property, which can be readily
used in applications.
Comment: Published at http://dx.doi.org/10.1214/07-EJS130 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
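The quantity underlying the Kullback-Leibler property is the KL divergence between the true density and densities generated by the kernel mixture. Purely as an illustration, and not the paper's argument, the sketch below checks numerically that a normal-kernel mixture whose mixing weights track the true density drives this divergence toward zero as the kernel bandwidth shrinks; the true density, bandwidths, and grids are all assumptions.

```python
# A minimal sketch, under assumed choices of true density, bandwidths and grid,
# of the KL divergence between a true density f0 and a normal-kernel mixture.
import numpy as np

def norm_pdf(x, loc, scale):
    return np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
f0 = norm_pdf(x, 1.0, 1.5)                      # assumed "true" density

mu = np.linspace(-8.0, 8.0, 400)                # kernel locations
w = norm_pdf(mu, 1.0, 1.5)
w /= w.sum()                                    # mixing weights tracking f0

for h in (1.0, 0.5, 0.1):                       # shrinking kernel bandwidth
    fmix = (w[:, None] * norm_pdf(x[None, :], mu[:, None], h)).sum(axis=0)
    kl = np.sum(f0 * np.log(f0 / fmix)) * dx    # numerical KL(f0 || mixture)
    print(f"bandwidth h={h:>4}: KL(f0 || mixture) = {kl:.4f}")
```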
On choosing mixture components via non-local priors
Choosing the number of mixture components remains an elusive challenge. Model
selection criteria can be either overly liberal or conservative and return
poorly-separated components of limited practical use. We formalize non-local
priors (NLPs) for mixtures and show how they lead to well-separated components
with non-negligible weight, interpretable as distinct subpopulations. We also
propose an estimator for posterior model probabilities under local and
non-local priors, showing that Bayes factors are ratios of posterior to prior
empty-cluster probabilities. The estimator is widely applicable and helps set
thresholds to drop unoccupied components in overfitted mixtures. We suggest
default prior parameters based on multi-modality for Normal/T mixtures and
minimal informativeness for categorical outcomes. We characterise the
NLP-induced sparsity theoretically and derive tractable expressions and
algorithms. We fully develop Normal, Binomial and product Binomial mixtures,
but the theory,
computation and principles hold more generally. We observed a serious lack of
sensitivity of the Bayesian information criterion (BIC), insufficient parsimony
of the AIC and a local prior, and a mixed behavior of the singular BIC. We also
considered overfitted mixtures; their performance was competitive but depended
on tuning parameters. Under our default prior elicitation, NLPs offered a good
compromise between sparsity and power to detect meaningfully separated
components.
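One piece of this abstract that lends itself to a quick sketch is the prior empty-cluster probability that the proposed estimator relates to Bayes factors. The Monte Carlo illustration below estimates, under a symmetric Dirichlet prior on the weights (an assumed prior, not necessarily the paper's), the chance that a k-component mixture leaves some component unoccupied by n observations; n, k, and the concentration parameter are illustrative, and this is not the paper's full estimator.

```python
# A minimal sketch of a Monte Carlo estimate of the prior probability that a
# k-component mixture leaves at least one component empty for n observations.
import numpy as np

def prior_empty_cluster_prob(n, k, alpha=1.0, n_sims=20_000, seed=None):
    rng = np.random.default_rng(seed)
    empty = 0
    for _ in range(n_sims):
        w = rng.dirichlet(np.full(k, alpha))    # mixture weights from a symmetric Dirichlet prior
        counts = rng.multinomial(n, w)          # allocation of n observations to components
        empty += np.any(counts == 0)
    return empty / n_sims

for k in (2, 3, 4):
    p = prior_empty_cluster_prob(n=100, k=k, seed=0)
    print(f"k={k}: prior P(some component empty) ~ {p:.3f}")
```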
