Optimal Bayesian estimation in stochastic block models
With the advent of structured data in the form of social networks, genetic
circuits and protein interaction networks, statistical analysis of networks has
gained popularity in recent years. The stochastic block model is a classical
cluster-exhibiting random graph model for networks. There is a
substantial amount of literature devoted to proposing strategies for estimating
and inferring parameters of the model, both from classical and Bayesian
viewpoints. Unlike its classical counterpart, however, the Bayesian setting
suffers from a dearth of theoretical results on the accuracy of estimation. In
this article, we undertake a theoretical investigation of the posterior
distribution of the parameters in a stochastic block model. In particular, we
show that one obtains optimal rates of posterior convergence with routinely
used multinomial-Dirichlet priors on cluster indicators and uniform priors on
the probabilities of the random edge indicators. En route, we develop geometric
embedding techniques to exploit the lower dimensional structure of the
parameter space, which may be of independent interest.
Comment: 23 pages
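The model and priors just described can be sketched generatively. The function below is an illustrative reading of the abstract, not the authors' code: the name `sample_sbm`, the Dirichlet concentration `alpha`, and the symmetrization step are all our choices.

```python
import numpy as np

def sample_sbm(n, K, alpha=1.0, rng=None):
    """Draw one undirected network from a stochastic block model with the
    priors named in the abstract: Dirichlet weights over cluster labels and
    Uniform(0, 1) block edge probabilities (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    w = rng.dirichlet(alpha * np.ones(K))   # cluster weights
    z = rng.choice(K, size=n, p=w)          # cluster indicators (multinomial)
    Q = rng.uniform(size=(K, K))            # block edge probabilities
    Q = np.triu(Q) + np.triu(Q, 1).T        # symmetrize
    P = Q[z][:, z]                          # n x n Bernoulli edge probabilities
    A = np.triu(rng.binomial(1, P), 1)      # upper triangle, no self-loops
    return A + A.T, z, Q
```

Posterior inference would then place a multinomial-Dirichlet prior on `z` and uniform priors on the entries of `Q`, as in the article.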
Posterior contraction in Gaussian process regression using Wasserstein approximations
We study posterior rates of contraction in Gaussian process regression with
unbounded covariate domain. Our argument relies on developing a Gaussian
approximation to the posterior of the leading coefficients of a
Karhunen--Lo\'{e}ve expansion of the Gaussian process. The salient feature of
our result is deriving such an approximation in the Wasserstein distance
and relating the speed of the approximation to the posterior contraction rate
using a coupling argument. Specific illustrations are provided for the Gaussian
or squared-exponential covariance kernel.
Comment: previous version modified to focus on the rate of posterior convergence
Comment on Article by Dawid and Musio
Discussion of "Bayesian Model Selection Based on Proper Scoring Rules" by
Dawid and Musio [arXiv:1409.5291].
Comment: Published at http://dx.doi.org/10.1214/15-BA942A in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)
Nonasymptotic Laplace approximation under model misspecification
We present non-asymptotic two-sided bounds to the log-marginal likelihood in
Bayesian inference. The classical Laplace approximation is recovered as the
leading term. Our derivation permits model misspecification and allows the
parameter dimension to grow with the sample size. We do not make any
assumptions about the asymptotic shape of the posterior, and instead require
certain regularity conditions on the likelihood ratio and that the posterior
be sufficiently concentrated.
Comment: 23 pages. Fixed minor technical glitches in the proof of Theorem 2 in the updated version
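For intuition, the leading Laplace term can be checked in the simplest conjugate Gaussian setting, where it happens to be exact because the log-joint is quadratic in the parameter. The model, function names, and prior variance below are an illustrative sanity check, not taken from the paper:

```python
import numpy as np

def log_marginal_laplace(y, prior_var=1.0):
    """Laplace approximation to log m(y) for y_i ~ N(theta, 1) with a
    N(0, prior_var) prior: log p(y|th) + log pi(th) + (d/2) log(2*pi)
    - (1/2) log|H| evaluated at the posterior mode th (here d = 1)."""
    n = len(y)
    theta_hat = y.sum() / (n + 1.0 / prior_var)   # posterior mode
    log_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta_hat) ** 2)
    log_prior = (-0.5 * np.log(2 * np.pi * prior_var)
                 - 0.5 * theta_hat ** 2 / prior_var)
    H = n + 1.0 / prior_var                       # negative Hessian of log-joint
    return log_lik + log_prior + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

def log_marginal_exact(y, prior_var=1.0):
    """Exact log m(y): marginally y ~ N(0, I + prior_var * 1 1^T)."""
    n = len(y)
    Sigma = np.eye(n) + prior_var * np.ones((n, n))
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(Sigma, y))

y = np.array([0.3, -1.2, 0.7, 1.5])
# quadratic log-joint in theta, so the Laplace term matches the exact value
print(log_marginal_laplace(y), log_marginal_exact(y))
```

In non-conjugate or misspecified models the two quantities differ, which is exactly the gap the paper's two-sided bounds control.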
Signal Adaptive Variable Selector for the Horseshoe Prior
In this article, we propose a simple method to perform variable selection as
a post model-fitting exercise using continuous shrinkage priors such as the
popular horseshoe prior. The proposed Signal Adaptive Variable Selector (SAVS)
approach post-processes a point estimate such as the posterior mean to group
the variables into signals and nulls. The approach is completely automated and
does not require specification of any tuning parameters. We carried out a
comprehensive simulation study to compare the performance of the proposed SAVS
approach to frequentist penalization procedures and Bayesian model selection
procedures. SAVS was found to be highly competitive across all the settings
considered, and was particularly robust to correlated designs. We
also applied SAVS to a genomic dataset with more than 20,000 covariates to
illustrate its scalability.
Comment: 21 pages (including appendix and references), 11 figures, 10 tables
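The post-processing step can be sketched as a column-norm-weighted soft-thresholding of the posterior mean, assuming the coefficient-specific penalty mu_j = 1 / |beta_hat_j|^2 reported for SAVS (the function name and toy data below are illustrative, not the authors' code):

```python
import numpy as np

def savs(beta_hat, X):
    """Signal Adaptive Variable Selector: soft-threshold a posterior point
    estimate, with a larger penalty for coefficients closer to zero
    (assumed penalty mu_j = 1 / |beta_hat_j|^2; illustrative sketch).
    Note: assumes no entry of beta_hat is exactly zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    col_norms_sq = np.sum(np.asarray(X, dtype=float) ** 2, axis=0)  # ||X_j||^2
    mu = 1.0 / np.abs(beta_hat) ** 2       # coefficient-specific penalty
    shrunk = np.abs(beta_hat) * col_norms_sq - mu
    return np.sign(beta_hat) * np.maximum(shrunk, 0.0) / col_norms_sq

# toy usage: two strong signals, three near-zero coefficients
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_hat = np.array([2.0, -1.5, 0.01, 0.02, -0.01])
print(savs(beta_hat, X))  # the three near-zero entries are set exactly to zero
```

Because the penalty is built from the point estimate itself, no tuning parameter needs to be specified, matching the automation claim in the abstract.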
Optimal Gaussian approximations to the posterior for log-linear models with Diaconis-Ylvisaker priors
In contingency table analysis, sparse data is frequently encountered for even
modest numbers of variables, resulting in non-existence of maximum likelihood
estimates. A common solution is to obtain regularized estimates of the
parameters of a log-linear model. Bayesian methods provide a coherent approach
to regularization, but are often computationally intensive. Conjugate priors
ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the
parameters of log-linear models do not give rise to closed form credible
regions, complicating posterior inference. Here we derive the optimal Gaussian
approximation to the posterior for log-linear models with Diaconis-Ylvisaker
priors, and provide convergence rate and finite-sample bounds for the
Kullback-Leibler divergence between the exact posterior and the optimal
Gaussian approximation. We demonstrate empirically in simulations and a real
data application that the approximation is highly accurate, even in relatively
small samples. The proposed approximation provides a computationally scalable
and principled approach to regularized estimation and approximate Bayesian
inference for log-linear models.
α-Variational Inference with Statistical Guarantees
We propose a family of variational approximations to Bayesian posterior
distributions, called α-VB, with provable statistical guarantees. The
standard variational approximation is a special case of α-VB with
α = 1. When α ∈ (0, 1), novel variational inequalities
are developed for linking the Bayes risk under the variational approximation to
the objective function in the variational optimization problem, implying that
maximizing the evidence lower bound in variational inference has the effect of
minimizing the Bayes risk within the variational density family. Operating in a
frequentist setup, the variational inequalities imply that point estimates
constructed from the α-VB procedure converge at an optimal rate to the
true parameter in a wide range of problems. We illustrate our general theory
with a number of examples, including the mean-field variational approximation
to (low)-high-dimensional Bayesian linear regression with spike and slab
priors, mixture of Gaussian models, latent Dirichlet allocation, and (mixture
of) Gaussian variational approximation in regular parametric models.
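For reference, and assuming the fractional-likelihood convention used in the α-posterior literature (the abstract above does not spell the objective out, so this form is an assumption), the α-VB criterion can be written as

\mathcal{L}_\alpha(q) = \mathbb{E}_q\left[\alpha\, \ell_n(\theta)\right] - D_{\mathrm{KL}}\left(q \,\Vert\, \pi\right), \qquad \widehat{q}_\alpha = \operatorname{arg\,max}_{q \in \Gamma} \mathcal{L}_\alpha(q),

where \ell_n is the log-likelihood, \pi the prior, and \Gamma the variational family; taking \alpha = 1 recovers the standard evidence lower bound.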
Probabilistic community detection with unknown number of communities
A fundamental problem in network analysis is clustering the nodes into groups
which share a similar connectivity pattern. Existing algorithms for community
detection assume the knowledge of the number of clusters or estimate it a
priori using various selection criteria and subsequently estimate the community
structure. Ignoring the uncertainty in the first stage may lead to erroneous
clustering, particularly when the community structure is vague. We instead
propose a coherent probabilistic framework for simultaneous estimation of the
number of communities and the community structure, adapting recently developed
Bayesian nonparametric techniques to network models. An efficient Markov chain
Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform
reversible jump MCMC on the number of clusters. The methodology is shown to
outperform recently developed community detection algorithms in a variety of
synthetic data examples and in benchmark real datasets. Using an appropriate
metric on the space of all configurations, we develop non-asymptotic Bayes risk
bounds even when the number of clusters is unknown. En route, we develop
concentration properties of non-linear functions of Bernoulli random variables,
which may be of independent interest.
Compressed Covariance Estimation With Automated Dimension Learning
We propose a method for estimating a covariance matrix that can be
represented as a sum of a low-rank matrix and a diagonal matrix. The proposed
method compresses high-dimensional data, computes the sample covariance in the
compressed space, and lifts it back to the ambient space via a decompression
operation. A salient feature of our approach relative to existing literature on
combining sparsity and low-rank structures in covariance matrix estimation is
that we do not require the low-rank component to be sparse. We also develop a
principled framework for estimating the compressed dimension using Stein's
Unbiased Risk Estimation (SURE) theory. Simulation results demonstrate
the efficacy and scalability of the proposed approach.
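A minimal sketch of the compress-then-lift pipeline, assuming a Gaussian random compression matrix and a pseudoinverse decompression; the paper's actual decompression operator and its SURE-based choice of the compressed dimension k are more refined, and all names here are illustrative:

```python
import numpy as np

def compressed_covariance(X, k, rng=None):
    """Compress n x p data to n x k, form the k x k sample covariance,
    and lift it back to p x p via a pseudoinverse (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Phi = rng.standard_normal((k, p)) / np.sqrt(k)  # random compression matrix
    Y = X @ Phi.T                                   # compressed data
    S_c = np.cov(Y, rowvar=False)                   # covariance in compressed space
    Phi_pinv = np.linalg.pinv(Phi)                  # p x k decompression operator
    return Phi_pinv @ S_c @ Phi_pinv.T              # lifted p x p estimate

# toy usage: data whose covariance is rank-2 plus a small diagonal
rng = np.random.default_rng(0)
L = rng.standard_normal((30, 2))
X = rng.standard_normal((500, 2)) @ L.T + 0.1 * rng.standard_normal((500, 30))
S_hat = compressed_covariance(X, k=10, rng=0)
```

The point of the sketch is the structure of the estimator: all O(p^2) work happens only in the final lift, while the covariance itself is computed in the k-dimensional compressed space.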
On the self-interaction of dark energy in a ghost-condensate model
In a ghost-condensate model of dark energy the combined dynamics of the
scalar field and gravitation is shown to impose non-trivial restriction on the
self-interaction of the scalar field. Using this restriction we show that the
choice of a zero self-interaction leads to a situation too restrictive for the
general evolution of the universe. This restriction, obtained in the form of a
quadratic equation of the scalar potential, is demonstrated to admit real
solutions. Also, in the appropriate limit it reproduces the potential in the
phantom cosmology.
Comment: 4 pages, LaTeX