Posterior contraction in Gaussian process regression using Wasserstein approximations
We study posterior rates of contraction in Gaussian process regression with
an unbounded covariate domain. Our argument relies on developing a Gaussian
approximation to the posterior of the leading coefficients of a
Karhunen–Loève expansion of the Gaussian process. The salient feature of
our result is deriving such an approximation in the Wasserstein distance
and relating the speed of the approximation to the posterior contraction rate
using a coupling argument. Specific illustrations are provided for the Gaussian
or squared-exponential covariance kernel.
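A minimal numerical sketch of the central object, a truncated Karhunen–Loève expansion of a Gaussian process with squared-exponential kernel, with eigenpairs approximated by the Nyström method on a grid (all function names and settings here are illustrative, not the paper's construction):

```python
import numpy as np

def se_kernel(x, y, ell=0.5):
    """Squared-exponential covariance k(s, t) = exp(-(s - t)^2 / (2 ell^2))."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * ell ** 2))

def truncated_kl_sample(grid, n_terms, rng):
    """Draw one GP path from a truncated Karhunen-Loeve expansion.

    Eigenpairs of the kernel operator are approximated on a uniform grid
    (Nystrom method); the path is sum_j sqrt(lam_j) Z_j phi_j with iid
    standard normal coefficients Z_j.
    """
    n = len(grid)
    w = (grid[-1] - grid[0]) / (n - 1)        # quadrature weight
    lam, vecs = np.linalg.eigh(w * se_kernel(grid, grid))
    order = np.argsort(lam)[::-1]             # largest eigenvalues first
    lam, vecs = lam[order], vecs[:, order]
    phi = vecs / np.sqrt(w)                   # approximately L2-normalized eigenfunctions
    z = rng.standard_normal(n_terms)
    return phi[:, :n_terms] @ (np.sqrt(np.clip(lam[:n_terms], 0.0, None)) * z)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)
path = truncated_kl_sample(grid, n_terms=20, rng=rng)
```

Because the SE kernel's eigenvalues decay very fast, a few dozen terms already capture the path almost exactly; the paper's contraction argument quantifies how this truncation error interacts with the posterior.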
Nonasymptotic Laplace approximation under model misspecification
We present non-asymptotic two-sided bounds to the log-marginal likelihood in
Bayesian inference. The classical Laplace approximation is recovered as the
leading term. Our derivation permits model misspecification and allows the
parameter dimension to grow with the sample size. We do not make any
assumptions about the asymptotic shape of the posterior, and instead require
certain regularity conditions on the likelihood ratio and that the posterior
be sufficiently concentrated.
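To make the leading term concrete, here is a hedged one-dimensional illustration: the Laplace approximation to the log-marginal likelihood of a toy Bernoulli model with a standard normal prior, checked against brute-force quadrature (the model and all numbers are hypothetical, not from the paper):

```python
import numpy as np

# Toy model: y_i ~ Bernoulli(sigmoid(theta)) with a N(0, 1) prior on theta.
y = np.array([1, 1, 0, 1, 1, 0, 1, 1])

def log_joint(theta):
    """log p(y, theta) = Bernoulli log-likelihood + standard normal log-prior."""
    p = 1.0 / (1.0 + np.exp(-theta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    return loglik - 0.5 * theta ** 2 - 0.5 * np.log(2.0 * np.pi)

# Posterior mode via Newton's method with finite-difference derivatives.
theta, h = 0.0, 1e-5
for _ in range(50):
    g = (log_joint(theta + h) - log_joint(theta - h)) / (2.0 * h)
    H = (log_joint(theta + h) - 2.0 * log_joint(theta) + log_joint(theta - h)) / h ** 2
    theta -= g / H

# Laplace approximation: log p(y) ~ log p(y, theta_hat) + (1/2) log(2 pi / -H).
laplace = log_joint(theta) + 0.5 * np.log(2.0 * np.pi / -H)

# Brute-force check: trapezoidal quadrature of the marginal likelihood.
grid = np.linspace(-10.0, 10.0, 20001)
vals = np.exp(np.array([log_joint(t) for t in grid]))
exact = np.log(np.sum((vals[:-1] + vals[1:]) / 2.0) * (grid[1] - grid[0]))
```

Even at sample size eight the two numbers agree to a few hundredths in the log; the paper's contribution is a non-asymptotic two-sided bound on exactly this gap, allowing misspecification and growing dimension.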
Adaptive Bayesian Estimation of Conditional Densities
We consider a non-parametric Bayesian model for conditional densities. The
model is a finite mixture of normal distributions with covariate dependent
multinomial logit mixing probabilities. A prior for the number of mixture
components is specified on positive integers. The marginal distribution of
covariates is not modeled. We study asymptotic frequentist behavior of the
posterior in this model. Specifically, we show that when the true conditional
density has a certain smoothness level, then the posterior contraction rate
around the truth is equal up to a log factor to the frequentist minimax rate of
estimation. An extension to the case when the covariate space is unbounded is
also established. As our result holds without a priori knowledge of the
smoothness level of the true density, the established posterior contraction
rates are adaptive. Moreover, we show that the rate is not affected by
inclusion of irrelevant covariates in the model. In Monte Carlo simulations, a
version of the model compares favorably to a cross-validated kernel conditional
density estimator.
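The mixing structure is easy to sketch directly: a small illustrative implementation of a conditional density that is a finite mixture of normals with multinomial-logit, covariate-dependent weights (the two-component parameters below are hypothetical):

```python
import numpy as np

def mixing_weights(x, alpha, beta):
    """Multinomial-logit mixing probabilities: softmax over alpha_k + beta_k * x."""
    logits = alpha + beta * x
    logits = logits - logits.max()          # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum()

def conditional_density(y, x, alpha, beta, mu, sigma):
    """f(y | x) = sum_k w_k(x) * N(y; mu_k, sigma_k^2)."""
    w = mixing_weights(x, alpha, beta)
    norm = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return float(w @ norm)

# Hypothetical 2-component example: component 2 dominates for large x.
alpha = np.array([0.0, -1.0])
beta = np.array([-2.0, 2.0])
mu = np.array([-1.0, 1.0])
sigma = np.array([0.5, 0.5])
f = conditional_density(0.0, 1.0, alpha, beta, mu, sigma)
```

Because the weights, not just the component means, move with the covariate, the family can track conditional densities whose shape changes across the covariate space, which is what the adaptive contraction result exploits.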
Optimal Bayesian estimation in stochastic block models
With the advent of structured data in the form of social networks, genetic
circuits and protein interaction networks, statistical analysis of networks has
gained popularity over recent years. The stochastic block model constitutes a
classical cluster-exhibiting random graph model for networks. There is a
substantial amount of literature devoted to proposing strategies for estimating
and inferring parameters of the model, both from classical and Bayesian
viewpoints. In contrast to the classical setting, however, there is a dearth of
theoretical results on the accuracy of estimation in the Bayesian setting. In
this article, we undertake a theoretical investigation of the posterior
distribution of the parameters in a stochastic block model. In particular, we
show that one obtains optimal rates of posterior convergence with routinely
used multinomial-Dirichlet priors on cluster indicators and uniform priors on
the probabilities of the random edge indicators. En route, we develop geometric
embedding techniques to exploit the lower dimensional structure of the
parameter space, which may be of independent interest.
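The priors in question are simple to illustrate: with uniform Beta(1, 1) priors on the block edge probabilities and the cluster labels held fixed, conjugacy gives closed-form Beta posteriors for each block. A hedged sketch (the simulation settings are invented for illustration):

```python
import numpy as np

def sample_sbm(z, theta, rng):
    """Sample a symmetric adjacency matrix: edge (i, j) ~ Bernoulli(theta[z_i, z_j])."""
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = int(rng.random() < theta[z[i], z[j]])
    return A

def beta_posterior(A, z, k, a=1.0, b=1.0):
    """Conjugate update: a Beta(a, b) prior on each block probability gives a
    Beta(a + edges, b + non-edges) posterior; return the posterior means."""
    means = np.zeros((k, k))
    for r in range(k):
        for s in range(k):
            mask_r, mask_s = z == r, z == s
            block = A[np.ix_(mask_r, mask_s)]
            if r == s:
                m = mask_r.sum()
                pairs, edges = m * (m - 1) / 2, block.sum() / 2
            else:
                pairs, edges = mask_r.sum() * mask_s.sum(), block.sum()
            means[r, s] = (a + edges) / (a + b + pairs)
    return means

rng = np.random.default_rng(1)
z = np.repeat([0, 1], 50)                            # two communities of 50 nodes
theta = np.array([[0.5, 0.05], [0.05, 0.5]])         # dense within, sparse across
A = sample_sbm(z, theta, rng)
post_mean = beta_posterior(A, z, k=2)
```

The paper's harder question is the joint posterior over labels and probabilities; the conjugate step above is only the easy conditional piece.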
Variable Selection Using Shrinkage Priors
Variable selection has received widespread attention over the last decade as
we routinely encounter high-throughput datasets in complex biological and
environmental research. Most Bayesian variable selection methods are restricted
to mixture priors having separate components for characterizing the signal and
the noise. However, such priors encounter computational issues in high
dimensions. This has motivated continuous shrinkage priors, which resemble the
two-component priors but facilitate computation and interpretability. While such
priors are widely used for estimating high-dimensional sparse vectors,
selecting a subset of variables remains a daunting task. In this article, we
propose a general approach for variable selection with shrinkage priors. The
presence of very few tuning parameters makes our method attractive in
comparison to ad hoc thresholding approaches. The applicability of the approach
is not limited to continuous shrinkage priors, but can be used along with any
shrinkage prior. Theoretical properties for near-collinear design matrices are
investigated and the method is shown to have good performance in a wide range
of synthetic data examples.
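As a hedged illustration of selecting variables from a shrinkage-prior fit, one simple strategy in this spirit, not necessarily the authors' exact procedure, splits the posterior mean magnitudes into a signal cluster and a noise cluster with one-dimensional 2-means:

```python
import numpy as np

def two_means_select(post_mean, n_iter=100):
    """Split |posterior means| into 'signal' and 'noise' clusters with 1-D
    2-means; variables landing in the larger-magnitude cluster are selected.
    A thresholding heuristic for illustration, not the paper's exact method."""
    m = np.abs(post_mean)
    c = np.array([m.min(), m.max()])                  # initial cluster centers
    for _ in range(n_iter):
        assign = np.abs(m[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = m[assign == k].mean()
    signal = int(np.argmax(c))
    return assign == signal

# Hypothetical posterior means: three clear signals among ten coefficients.
post_mean = np.array([2.1, -1.8, 2.5, 0.03, -0.02, 0.05, 0.01, -0.04, 0.02, 0.0])
selected = two_means_select(post_mean)
```

The appeal over a hand-picked cutoff is that the split point adapts to the fitted magnitudes, so the only tuning is the clustering rule itself.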
Sparse additive Gaussian process with soft interactions
Additive nonparametric regression models provide an attractive tool for
variable selection in high dimensions when the relationship between the
response and predictors is complex. They offer greater flexibility compared to
parametric non-linear regression models and better interpretability and
scalability than fully non-parametric regression models. However, achieving
sparsity simultaneously in the number of nonparametric components as well as in
the variables within each nonparametric component poses a stiff computational
challenge. In this article, we develop a novel Bayesian additive regression
model using a combination of hard and soft shrinkages to separately control the
number of additive components and the variables within each component. An
efficient algorithm is developed to select the important variables and
estimate the interaction network. Excellent performance is obtained in
simulated and real data examples.
Probabilistic community detection with unknown number of communities
A fundamental problem in network analysis is clustering the nodes into groups
which share a similar connectivity pattern. Existing algorithms for community
detection assume knowledge of the number of clusters or estimate it a
priori using various selection criteria and subsequently estimate the community
structure. Ignoring the uncertainty in the first stage may lead to erroneous
clustering, particularly when the community structure is vague. We instead
propose a coherent probabilistic framework for simultaneous estimation of the
number of communities and the community structure, adapting recently developed
Bayesian nonparametric techniques to network models. An efficient Markov chain
Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform
reversible jump MCMC on the number of clusters. The methodology is shown to
outperform recently developed community detection algorithms in a variety of
synthetic data examples and in benchmark real datasets. Using an appropriate
metric on the space of all configurations, we develop non-asymptotic Bayes risk
bounds even when the number of clusters is unknown. En route, we develop
concentration properties of non-linear functions of Bernoulli random variables,
which may be of independent interest.
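The collapsed updates that avoid reversible jump can be illustrated with a fixed set of candidate communities: a sketch of one Gibbs step that integrates out the Beta block probabilities (a simplified stand-in for the paper's sampler; the graph and all settings are invented):

```python
import math
import random

def log_beta(a, b):
    """log of the Beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def collapsed_log_marginal(z, A, k, a=1.0, b=1.0):
    """log p(A | z) with the Beta(a, b) block probabilities integrated out
    (Beta-Bernoulli conjugacy, one term per unordered block pair)."""
    n, total = len(z), 0.0
    for r in range(k):
        for s in range(r, k):
            target = {r, s}
            pairs = edges = 0
            for i in range(n):
                for j in range(i + 1, n):
                    if {z[i], z[j]} == target:
                        pairs += 1
                        edges += A[i][j]
            total += log_beta(a + edges, b + pairs - edges) - log_beta(a, b)
    return total

def resample_label(i, z, A, k, rng, alpha=1.0):
    """One collapsed Gibbs update for node i: score each candidate community
    by (Dirichlet-multinomial prior weight) x (integrated likelihood)."""
    logp = []
    for r in range(k):
        z[i] = r
        n_r = sum(1 for j in range(len(z)) if z[j] == r and j != i)
        logp.append(math.log(n_r + alpha / k) + collapsed_log_marginal(z, A, k))
    m = max(logp)
    z[i] = rng.choices(range(k), weights=[math.exp(v - m) for v in logp])[0]
    return z

# Two dense 4-node communities with no edges across; node 0 starts mislabeled.
n = 8
A = [[1 if i != j and (i < 4) == (j < 4) else 0 for j in range(n)] for i in range(n)]
z = [1, 0, 0, 0, 1, 1, 1, 1]
z = resample_label(0, z, A, k=2, rng=random.Random(0))
```

Because the block probabilities are integrated out analytically, the label update never needs to change the dimension of the parameter, which is the mechanism that sidesteps reversible-jump moves.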
Compressed Covariance Estimation With Automated Dimension Learning
We propose a method for estimating a covariance matrix that can be
represented as a sum of a low-rank matrix and a diagonal matrix. The proposed
method compresses high-dimensional data, computes the sample covariance in the
compressed space, and lifts it back to the ambient space via a decompression
operation. A salient feature of our approach relative to existing literature on
combining sparsity and low-rank structures in covariance matrix estimation is
that we do not require the low-rank component to be sparse. A principled
framework for estimating the compressed dimension using Stein's Unbiased Risk
Estimation theory is developed. Simulation experiments demonstrate
the efficacy and scalability of our proposed approach.
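The compress-then-decompress pipeline can be sketched in a few lines, assuming a Gaussian random projection and a pseudo-inverse lift (a generic illustration; the paper's estimator and its SURE-based choice of the compressed dimension are more refined):

```python
import numpy as np

def compressed_covariance(X, k, rng):
    """Compress the data to k dimensions with a random projection, form the
    sample covariance there, and lift it back to the ambient space via the
    pseudo-inverse of the projection."""
    n, p = X.shape
    Phi = rng.standard_normal((p, k)) / np.sqrt(k)   # compression matrix
    S_c = np.cov(X @ Phi, rowvar=False)              # covariance of compressed data
    pinv = np.linalg.pinv(Phi)                       # k x p pseudo-inverse
    return pinv.T @ S_c @ pinv                       # lift back to p x p

# Hypothetical data: rank-3 factor structure plus diagonal noise.
rng = np.random.default_rng(0)
n, p, rank = 500, 30, 3
L = rng.standard_normal((p, rank))
X = rng.standard_normal((n, rank)) @ L.T + 0.5 * rng.standard_normal((n, p))
Sigma_hat = compressed_covariance(X, k=10, rng=rng)
```

When k equals the ambient dimension the lift inverts the projection exactly and recovers the usual sample covariance; the interesting regime is k well below p, where the compression acts as regularization toward the low-rank component.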
Bayesian Graph Selection Consistency Under Model Misspecification
Gaussian graphical models are a popular tool to learn the dependence
structure in the form of a graph among variables of interest. Bayesian methods
have gained in popularity in the last two decades due to their ability to
simultaneously learn the covariance and the graph and characterize uncertainty
in the selection. For scalability of the Markov chain Monte Carlo algorithms,
decomposability is commonly imposed on the graph space. A wide variety of
graphical conjugate priors are proposed jointly on the covariance matrix and
the graph with improved algorithms to search along the space of decomposable
graphs, rendering the methods extremely popular in the context of multivariate
dependence modeling. An open problem in Bayesian decomposable structure
learning is whether the posterior distribution is able to select a meaningful
decomposable graph that is "close" in an appropriate sense to the true
non-decomposable graph when the dimension of the variables increases with the
sample size. In this article, we explore specific conditions on the true
precision matrix and the graph which result in an affirmative answer to this
question using a commonly used hyper-inverse Wishart prior on the covariance
matrix and a suitable complexity prior on the graph space, both in the
well-specified and misspecified settings. In the absence of structural sparsity
assumptions, our strong selection consistency holds in a high-dimensional
setting where the dimension is allowed to grow with the sample size. We show
that when the true graph is non-decomposable, the posterior distribution on the
graph concentrates on a set of graphs that are minimal triangulations of the
true graph.
Bayesian Clustering of Shapes of Curves
Unsupervised clustering of curves according to their shapes is an important
problem with broad scientific applications. The existing model-based clustering
techniques either rely on simple probability models (e.g., Gaussian) that are
not generally valid for shape analysis or assume the number of clusters. We
develop an efficient Bayesian method to cluster curve data using an elastic
shape metric that is based on joint registration and comparison of shapes of
curves. The elastic-inner product matrix obtained from the data is modeled
using a Wishart distribution whose parameters are assigned carefully chosen
prior distributions to allow for automatic inference on the number of clusters.
The posterior is sampled through an efficient Markov chain Monte Carlo procedure
based on the Chinese restaurant process to infer (1) the posterior distribution
on the number of clusters, and (2) the clustering configuration of shapes. This
method is demonstrated on a variety of synthetic data and real data examples on
protein structure analysis, cell shape analysis in microscopy images, and
clustering of shapes from the MPEG-7 database.
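The Chinese restaurant process underlying the sampler makes the number of clusters random rather than fixed in advance. A minimal prior draw (illustration of the CRP only, not the paper's posterior sampler):

```python
import random
from collections import Counter

def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from the Chinese restaurant process:
    each item joins an existing cluster with probability proportional to the
    cluster's size, or opens a new one with probability proportional to alpha."""
    labels = [0]
    for _ in range(1, n):
        counts = Counter(labels)
        clusters = sorted(counts)
        weights = [counts[c] for c in clusters] + [alpha]
        # A new cluster gets the next unused label, len(clusters).
        labels.append(rng.choices(clusters + [len(clusters)], weights=weights)[0])
    return labels

labels = crp_partition(100, alpha=1.0, rng=random.Random(42))
n_clusters = len(set(labels))
```

The rich-get-richer weights keep the expected number of clusters growing only logarithmically in n, so the prior favors parsimonious clusterings while still allowing any number of clusters a posteriori.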