61 research outputs found

    Posterior contraction in Gaussian process regression using Wasserstein approximations

    We study posterior rates of contraction in Gaussian process regression with an unbounded covariate domain. Our argument relies on developing a Gaussian approximation to the posterior of the leading coefficients of a Karhunen–Loève expansion of the Gaussian process. The salient feature of our result is deriving such an approximation in the $L^2$ Wasserstein distance and relating the speed of the approximation to the posterior contraction rate using a coupling argument. Specific illustrations are provided for the Gaussian or squared-exponential covariance kernel. Comment: previous version modified to focus on the rate of posterior convergence.
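
The metric in play has a convenient closed form for Gaussians. Below is a minimal illustrative sketch (not from the paper) of the $L^2$ Wasserstein distance between two multivariate normals, the distance in which the posterior of the leading Karhunen–Loève coefficients is approximated; the example means and covariances are made up.

```python
# W_2 distance between N(m1, S1) and N(m2, S2):
# W_2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1, S1, m2, S2):
    """Closed-form 2-Wasserstein distance between two Gaussians."""
    root_S1 = sqrtm(S1)
    cross = sqrtm(root_S1 @ S2 @ root_S1)
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross))
    return np.sqrt(max(w2_sq, 0.0))

# Example: a hypothetical posterior over three leading KL coefficients
# versus a Gaussian approximation to it.
m1, S1 = np.zeros(3), np.eye(3)
m2, S2 = 0.1 * np.ones(3), 1.2 * np.eye(3)
print(gaussian_w2(m1, S1, m2, S2))
```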

    Nonasymptotic Laplace approximation under model misspecification

    We present non-asymptotic two-sided bounds on the log-marginal likelihood in Bayesian inference. The classical Laplace approximation is recovered as the leading term. Our derivation permits model misspecification and allows the parameter dimension to grow with the sample size. We do not make any assumptions about the asymptotic shape of the posterior; instead, we require certain regularity conditions on the likelihood ratio and that the posterior be sufficiently concentrated. Comment: 23 pages. Fixed minor technical glitches in the proof of Theorem 2 in the updated version.
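
As a concrete reference point, here is a hedged sketch of the classical Laplace approximation that the bounds recover as the leading term, run on a toy conjugate model where the approximation happens to be exact; the model and the forward-difference Hessian are illustrative choices, not the paper's setup.

```python
# Laplace approximation to the log-marginal likelihood:
# log p(y) ~= log p(y, theta_hat) + (d/2) log(2*pi) - (1/2) log|H|,
# where theta_hat is the posterior mode and H the Hessian of the
# negative log joint at the mode.
import numpy as np
from scipy.optimize import minimize

def laplace_log_marginal(neg_log_joint, theta0, dim, eps=1e-5):
    res = minimize(neg_log_joint, theta0, method="BFGS")
    mode = res.x
    # numerical Hessian of the negative log joint at the mode
    H = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(dim):
            e_i, e_j = np.eye(dim)[i] * eps, np.eye(dim)[j] * eps
            H[i, j] = (neg_log_joint(mode + e_i + e_j) - neg_log_joint(mode + e_i)
                       - neg_log_joint(mode + e_j) + neg_log_joint(mode)) / eps ** 2
    _, logdet = np.linalg.slogdet(H)
    return -res.fun + 0.5 * dim * np.log(2 * np.pi) - 0.5 * logdet

# Toy example: y_i ~ N(theta, 1) with theta ~ N(0, 1); here Laplace is exact.
y = np.array([0.3, -0.1, 0.7])
nlj = lambda t: (0.5 * np.sum((y - t[0]) ** 2) + 1.5 * np.log(2 * np.pi)
                 + 0.5 * t[0] ** 2 + 0.5 * np.log(2 * np.pi))
print(laplace_log_marginal(nlj, np.zeros(1), 1))
```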

    Adaptive Bayesian Estimation of Conditional Densities

    We consider a non-parametric Bayesian model for conditional densities. The model is a finite mixture of normal distributions with covariate-dependent multinomial logit mixing probabilities. A prior for the number of mixture components is specified on the positive integers. The marginal distribution of the covariates is not modeled. We study the asymptotic frequentist behavior of the posterior in this model. Specifically, we show that when the true conditional density has a certain smoothness level, the posterior contraction rate around the truth equals, up to a log factor, the frequentist minimax rate of estimation. An extension to the case of an unbounded covariate space is also established. As our result holds without a priori knowledge of the smoothness level of the true density, the established posterior contraction rates are adaptive. Moreover, we show that the rate is not affected by the inclusion of irrelevant covariates in the model. In Monte Carlo simulations, a version of the model compares favorably to a cross-validated kernel conditional density estimator. Comment: 32 pages, 2 figures.
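
For concreteness, here is a minimal sketch of the model class (the parameterization is assumed for illustration, not taken from the paper): evaluating a K-component mixture of normals whose mixing weights follow a multinomial logit in the covariates.

```python
import numpy as np

def conditional_density(y, x, alpha, beta, mu, sigma):
    """p(y | x) for a K-component covariate-dependent mixture.

    alpha: (K,) logit intercepts, beta: (K, d) logit slopes,
    mu: (K,) component means, sigma: (K,) component sds.
    """
    logits = alpha + beta @ x
    w = np.exp(logits - logits.max())
    w /= w.sum()                       # multinomial-logit mixing weights
    dens = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(w @ dens)

# Example: a two-component mixture whose weights shift with the covariate.
x = np.array([0.5])
print(conditional_density(0.0, x,
                          alpha=np.array([0.0, 0.0]),
                          beta=np.array([[2.0], [-2.0]]),
                          mu=np.array([-1.0, 1.0]),
                          sigma=np.array([0.5, 0.5])))
```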

    Optimal Bayesian estimation in stochastic block models

    With the advent of structured data in the form of social networks, genetic circuits and protein interaction networks, statistical analysis of networks has gained popularity over recent years. The stochastic block model constitutes a classical cluster-exhibiting random graph model for networks. A substantial amount of literature is devoted to proposing strategies for estimating and inferring parameters of the model, from both classical and Bayesian viewpoints. Unlike in the classical setting, however, there is a dearth of theoretical results on the accuracy of estimation in the Bayesian setting. In this article, we undertake a theoretical investigation of the posterior distribution of the parameters in a stochastic block model. In particular, we show that one obtains optimal rates of posterior convergence with routinely used multinomial-Dirichlet priors on cluster indicators and uniform priors on the probabilities of the random edge indicators. En route, we develop geometric embedding techniques to exploit the lower-dimensional structure of the parameter space, which may be of independent interest. Comment: 23 pages.
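
To make the priors concrete, here is a hedged sketch of one standard conditional update in a stochastic block model sampler: with uniform Beta(1,1) priors on block edge probabilities, the posterior given the cluster labels is Beta with edge and non-edge counts. This is routine SBM machinery, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block_probs(A, z, K):
    """Draw block probabilities Q[a, b] | labels z from their Beta posterior."""
    Q = np.zeros((K, K))
    for a in range(K):
        for b in range(a, K):
            ia, ib = np.where(z == a)[0], np.where(z == b)[0]
            sub = A[np.ix_(ia, ib)]
            if a == b:
                pairs = len(ia) * (len(ia) - 1) / 2
                edges = np.triu(sub, 1).sum()    # count each within-block edge once
            else:
                pairs = len(ia) * len(ib)
                edges = sub.sum()
            Q[a, b] = Q[b, a] = rng.beta(1 + edges, 1 + pairs - edges)
    return Q

# Example: a tiny symmetric two-block network with random edges.
A = (rng.random((10, 10)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T
z = np.repeat([0, 1], 5)
print(sample_block_probs(A, z, K=2))
```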

    Variable Selection Using Shrinkage Priors

    Variable selection has received widespread attention over the last decade as we routinely encounter high-throughput datasets in complex biological and environmental research. Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. In this article, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors; it can be used with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is shown to have good performance in a wide range of synthetic data examples.
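
As an illustration of the post-processing problem the abstract raises, the sketch below applies one generic heuristic, not necessarily the paper's rule: cluster the posterior-mean absolute coefficients into "signal" and "noise" groups with 2-means.

```python
import numpy as np

def select_by_2means(beta_samples, n_iter=50):
    """beta_samples: (S, p) posterior draws; returns a boolean signal mask."""
    m = np.abs(beta_samples).mean(axis=0)           # posterior mean |beta_j|
    c = np.array([m.min(), m.max()])                # initialize the two centers
    for _ in range(n_iter):
        assign = np.abs(m[:, None] - c[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                c[k] = m[assign == k].mean()
    return assign == np.argmax(c)                   # cluster nearer larger center

# Example with synthetic draws: 3 signals among 10 coefficients.
rng = np.random.default_rng(1)
draws = rng.normal(0, 0.05, size=(200, 10))
draws[:, :3] += np.array([1.0, -0.8, 0.6])
print(select_by_2means(draws))
```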

    Sparse additive Gaussian process with soft interactions

    Additive nonparametric regression models provide an attractive tool for variable selection in high dimensions when the relationship between the response and the predictors is complex. They offer greater flexibility than parametric non-linear regression models, and better interpretability and scalability than fully non-parametric regression models. However, achieving sparsity simultaneously in the number of nonparametric components and in the variables within each component poses a stiff computational challenge. In this article, we develop a novel Bayesian additive regression model using a combination of hard and soft shrinkage to separately control the number of additive components and the variables within each component. An efficient algorithm is developed to select the important variables and estimate the interaction network. Excellent performance is obtained in simulated and real data examples. Comment: Submitted to Technometrics.
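
A minimal sketch of the model class follows (the kernel form and weights are assumptions for illustration, not the authors' estimator): an additive Gaussian process whose covariance is a weighted sum of one-dimensional squared-exponential kernels, so that shrinking a weight to zero hard-excludes that component.

```python
import numpy as np

def additive_se_kernel(X1, X2, weights, lengthscales):
    """K[i, j] = sum_d w_d * exp(-(x1[i,d] - x2[j,d])^2 / (2 l_d^2))."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for d, (w, l) in enumerate(zip(weights, lengthscales)):
        diff = X1[:, d:d + 1] - X2[:, d:d + 1].T
        K += w * np.exp(-0.5 * (diff / l) ** 2)
    return K

# GP regression posterior mean under this additive kernel (noise var 0.1);
# the weights [1, 0.5, 0] hard-exclude the third covariate.
rng = np.random.default_rng(2)
X = rng.random((20, 3)); y = np.sin(4 * X[:, 0]) + X[:, 1]
w, ls = [1.0, 0.5, 0.0], [0.3, 0.3, 0.3]
K = additive_se_kernel(X, X, w, ls)
alpha = np.linalg.solve(K + 0.1 * np.eye(20), y)
Xs = rng.random((5, 3))
print(additive_se_kernel(Xs, X, w, ls) @ alpha)   # predictions at new points
```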

    Probabilistic community detection with unknown number of communities

    A fundamental problem in network analysis is clustering the nodes into groups which share a similar connectivity pattern. Existing algorithms for community detection assume knowledge of the number of clusters or estimate it a priori using various selection criteria and subsequently estimate the community structure. Ignoring the uncertainty in the first stage may lead to erroneous clustering, particularly when the community structure is vague. We instead propose a coherent probabilistic framework for simultaneous estimation of the number of communities and the community structure, adapting recently developed Bayesian nonparametric techniques to network models. An efficient Markov chain Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform reversible-jump MCMC on the number of clusters. The methodology is shown to outperform recently developed community detection algorithms in a variety of synthetic data examples and on benchmark real datasets. Using an appropriate metric on the space of all configurations, we develop non-asymptotic Bayes risk bounds even when the number of clusters is unknown. En route, we develop concentration properties of non-linear functions of Bernoulli random variables, which may be of independent interest.
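
The Bayesian nonparametric ingredient can be illustrated with the Chinese restaurant process prior on partitions, which places positive mass on any number of communities. The sketch below only samples from the prior; the paper's sampler also conditions on the network likelihood.

```python
import numpy as np

def crp_partition(n, concentration, rng):
    """Sample community labels for n nodes from a CRP prior."""
    z = np.zeros(n, dtype=int)
    counts = [1]                              # first node opens community 0
    for i in range(1, n):
        probs = np.array(counts + [concentration], dtype=float)
        probs /= probs.sum()                  # existing sizes vs. a new community
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                  # open a new community
        else:
            counts[k] += 1
        z[i] = k
    return z

rng = np.random.default_rng(3)
z = crp_partition(30, concentration=1.0, rng=rng)
print("labels:", z)
print("number of communities:", z.max() + 1)
```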

    Compressed Covariance Estimation With Automated Dimension Learning

    We propose a method for estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to the existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. Simulation results demonstrate the efficacy and scalability of our proposed approach.
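
The compress/estimate/decompress pipeline can be sketched as follows; the Gaussian projection, pseudoinverse lift, and diagonal fill-in below are illustrative choices, not the paper's exact operators.

```python
import numpy as np

def compressed_covariance(X, k, rng):
    """Estimate a (low-rank + diagonal) covariance via a k-dim compression."""
    n, p = X.shape
    Phi = rng.normal(size=(k, p)) / np.sqrt(k)   # random compression matrix
    Z = X @ Phi.T                                # compressed data, n x k
    S_z = np.cov(Z, rowvar=False)                # sample covariance, compressed space
    lift = np.linalg.pinv(Phi)                   # decompression via pseudoinverse
    L = lift @ S_z @ lift.T                      # low-rank component, ambient space
    resid = np.var(X, axis=0) - np.diag(L)       # leftover variance on the diagonal
    return L + np.diag(np.maximum(resid, 0.0))

# Example: 50-dimensional data with a rank-3 signal plus diagonal noise.
rng = np.random.default_rng(4)
B = rng.normal(size=(50, 3))                     # true low-rank factor
X = rng.normal(size=(500, 3)) @ B.T + rng.normal(0, 0.5, size=(500, 50))
print(compressed_covariance(X, k=5, rng=rng).shape)
```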

    Bayesian Graph Selection Consistency Under Model Misspecification

    Gaussian graphical models are a popular tool for learning the dependence structure among variables of interest in the form of a graph. Bayesian methods have gained popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph and to characterize uncertainty in the selection. For scalability of the Markov chain Monte Carlo algorithms, decomposability is commonly imposed on the graph space. A wide variety of graphical conjugate priors have been proposed jointly on the covariance matrix and the graph, with improved algorithms for searching the space of decomposable graphs, rendering these methods extremely popular in the context of multivariate dependence modeling. An open problem in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is "close", in an appropriate sense, to the true non-decomposable graph when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph which result in an affirmative answer to this question, using a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space, in both the well-specified and misspecified settings. In the absence of structural sparsity assumptions, our strong selection consistency holds in a high-dimensional setting where $p = O(n^{\alpha})$ for $\alpha < 1/3$. We show that when the true graph is non-decomposable, the posterior distribution on the graph concentrates on a set of graphs that are minimal triangulations of the true graph. Comment: 43 pages.
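
The graph-theoretic notion at the end can be made concrete: a graph is decomposable iff it is chordal, and a minimal triangulation adds an inclusion-minimal set of fill-in edges. A small illustration using networkx follows (not the paper's code, and networkx's completion routine is not guaranteed to produce a minimal triangulation in general).

```python
import networkx as nx

G = nx.cycle_graph(4)                   # the 4-cycle: classic non-chordal graph
print(nx.is_chordal(G))                 # False -> not decomposable

H, _ = nx.complete_to_chordal_graph(G)  # triangulate by adding fill-in edges
print(nx.is_chordal(H))                 # True
fill_in = [e for e in H.edges() if not G.has_edge(*e)]
print("fill-in edges:", fill_in)        # a single chord suffices for a 4-cycle
```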

    Bayesian Clustering of Shapes of Curves

    Unsupervised clustering of curves according to their shapes is an important problem with broad scientific applications. Existing model-based clustering techniques either rely on simple probability models (e.g., Gaussian) that are not generally valid for shape analysis, or assume the number of clusters. We develop an efficient Bayesian method to cluster curve data using an elastic shape metric that is based on joint registration and comparison of shapes of curves. The elastic inner-product matrix obtained from the data is modeled using a Wishart distribution whose parameters are assigned carefully chosen prior distributions to allow for automatic inference on the number of clusters. The posterior is sampled through an efficient Markov chain Monte Carlo procedure based on the Chinese restaurant process to infer (1) the posterior distribution on the number of clusters, and (2) the clustering configuration of the shapes. The method is demonstrated on a variety of synthetic data and real data examples on protein structure analysis, cell shape analysis in microscopy images, and clustering of shapes from the MPEG7 database.
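
One standard way to build such an inner-product matrix between curves is via square-root velocity functions. The sketch below is hedged: the paper's elastic inner product also optimizes over reparameterizations and rotations, which is omitted here for brevity.

```python
import numpy as np

def srvf(curve):
    """Square-root velocity function of a discretized curve of shape (T, 2)."""
    v = np.gradient(curve, axis=0)
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-8))

def gram_matrix(curves):
    """Inner-product matrix <q_i, q_j> between the SRVFs of the curves."""
    qs = [srvf(c) for c in curves]
    T = qs[0].shape[0]
    return np.array([[np.sum(qi * qj) / T for qj in qs] for qi in qs])

# Example: a circle, an ellipse, and a rescaled circle.
t = np.linspace(0, 2 * np.pi, 100)
circle = np.c_[np.cos(t), np.sin(t)]
ellipse = np.c_[2 * np.cos(t), np.sin(t)]
print(gram_matrix([circle, ellipse, 0.5 * circle]).round(2))
```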