
    Flexible Bayesian Product Mixture Models for Vector Autoregressions

    Bayesian non-parametric methods based on Dirichlet process mixtures have seen tremendous success in various domains and are appealing in their ability to borrow information by clustering samples that share identical parameters. However, such methods can face hurdles in heterogeneous settings where objects are expected to cluster only along a subset of axes, or where clusters of samples share only a subset of identical parameters. We overcome these limitations by developing a novel class of product of Dirichlet process location-scale mixtures that enables independent clustering at multiple scales, resulting in varying levels of information sharing across samples. We first develop the approach for independent multivariate data, and subsequently generalize it to multivariate time-series data under the framework of multi-subject Vector Autoregressive (VAR) models, which is our primary focus and goes beyond parametric single-subject VAR models. We establish posterior consistency and develop efficient posterior computation for implementation. Extensive numerical studies involving VAR models show distinct advantages over competing methods in terms of estimation, clustering, and feature selection accuracy. Our resting-state fMRI analysis from the Human Connectome Project reveals biologically interpretable connectivity differences between distinct intelligence groups, while an air pollution application illustrates superior forecasting accuracy compared to alternative methods.
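    To fix ideas, the parametric single-subject baseline that the multi-subject product-mixture approach generalizes is an ordinary VAR model. The following is a minimal sketch of simulating and estimating a single-subject VAR(1) by least squares; the dimensions, transition matrix, and noise scale are illustrative choices, not values from the paper.

```python
import numpy as np

# Minimal sketch: simulate and estimate a single-subject VAR(1) model,
# the parametric baseline generalized by the multi-subject approach.
rng = np.random.default_rng(0)

d, T = 3, 2000                       # series dimension, time points (illustrative)
A_true = np.array([[0.5, 0.1, 0.0],  # stable transition matrix (spectral radius < 1)
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])

y = np.zeros((T, d))
for t in range(1, T):
    y[t] = y[t - 1] @ A_true.T + rng.normal(scale=0.5, size=d)

# Ordinary least squares: regress y_t on y_{t-1}.
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print(np.max(np.abs(A_hat - A_true)))  # estimation error shrinks as T grows
```

    A multi-subject extension would fit one such transition matrix per subject and share information across subjects through the mixture prior, rather than estimating each independently as above.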

    High-dimensional Measurement Error Models for Lipschitz Loss

    Recently emerging large-scale biomedical data pose exciting opportunities for scientific discoveries. However, the ultrahigh dimensionality and non-negligible measurement errors in the data may create difficulties in estimation. Existing methods for high-dimensional covariates with measurement error are limited; they usually require knowledge of the noise distribution and focus on linear or generalized linear models. In this work, we develop high-dimensional measurement error models for a class of Lipschitz loss functions that encompasses logistic regression, hinge loss, and quantile regression, among others. Our estimator is designed to minimize the L1 norm among all estimators belonging to suitable feasible sets, without requiring any knowledge of the noise distribution. Subsequently, we generalize these estimators to a Lasso-analog version that is computationally scalable to higher dimensions. We derive theoretical guarantees in terms of finite-sample statistical error bounds and sign consistency, even when the dimensionality increases exponentially with the sample size. Extensive simulation studies demonstrate superior performance compared to existing methods in classification and quantile regression problems. An application to a gender classification task based on brain functional connectivity in the Human Connectome Project data illustrates improved accuracy under our approach, and the ability to reliably identify significant brain connections that drive gender differences.
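    The problem setting can be illustrated with a small simulation: a sparse logistic model whose covariates are observed only with additive noise. The sketch below fits a standard L1-penalized logistic regression to the noisy design as a naive baseline; it is not the paper's estimator, and all dimensions, noise levels, and the penalty strength are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch of the noisy-covariate classification setting
# (a naive Lasso-logistic baseline, NOT the paper's L1-minimization estimator).
rng = np.random.default_rng(1)

n, p, s = 400, 50, 5                 # samples, dimension, sparsity (illustrative)
beta = np.zeros(p)
beta[:s] = 1.0                       # sparse true signal
X = rng.normal(size=(n, p))          # true covariates
prob = 1.0 / (1.0 + np.exp(-X @ beta))
y = rng.binomial(1, prob)

W = X + rng.normal(scale=0.3, size=(n, p))   # observed covariates with measurement error

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(W, y)                      # fit to the noisy design
support = np.flatnonzero(np.abs(model.coef_.ravel()) > 1e-6)
print(support)                       # selected coordinates
```

    A measurement-error-aware method aims to retain accuracy and correct support recovery as the noise scale grows, where such a naive fit degrades.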

    Bayesian Nonparametric Methods for Conditional Distributions

    In the first paper, we propose a flexible class of priors for density estimation that avoids discrete mixtures, based on random nonlinear functions of a uniform latent variable with an additive residual. Although discrete mixture modeling has formed the backbone of the literature on Bayesian density estimation incorporating covariates, the use of discrete mixtures leads to some well-known disadvantages. The induced prior for the density is shown to have desirable properties, including ease of centering on an initial guess for the density, posterior consistency, and straightforward computation via Gibbs sampling.

    In the second paper, we propose a Bayesian variable selection method involving non-parametric residuals, noting that the majority of the literature has focused on the parametric counterpart. We generalize methods and asymptotic theory established for mixtures of g-priors to linear regression models with unknown residuals characterized by a Dirichlet process location mixture. We propose a mixture of semiparametric g-priors allowing for straightforward posterior computation via a stochastic search variable selection algorithm. In addition, Bayes factor and variable selection consistency are shown to hold under a class of proper priors on g, allowing the number of candidate predictors p to increase much faster than the sample size n under a sparsity assumption on the true model size.

    Our third paper is motivated by the fact that although there are standard algorithms for estimating minimum-length credible intervals for scalars, there are no such methods for estimating minimum-volume credible sets for vectors and functions. We propose a minimum volume covering ellipsoid (MVCE) approach for vector-valued parameters, guaranteed to construct credible regions with probability ≥ 1-α, while yielding highest posterior density regions under asymptotic normality. For one-dimensional random curves, our proposed approach starts with an MVCE region evaluated at finitely many knots, and then interpolates between the knots linearly or by relying on Lipschitz continuity. For multivariate random surfaces, our approach uses Delaunay triangulations to approximate the credible region. Frequentist coverage properties and computational efficiency compared with frequentist alternatives are assessed through simulation studies.
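    Under asymptotic normality, the kind of ellipsoidal credible region the MVCE approach targets can be sketched directly from posterior draws: center at the posterior mean, shape from the posterior covariance, and squared radius from a chi-square quantile. This is a Gaussian-approximation sketch, not the paper's exact MVCE algorithm; the posterior here is simulated as a bivariate normal purely for illustration.

```python
import numpy as np
from scipy.stats import chi2

# Hedged sketch: an ellipsoidal credible region from posterior draws under
# a Gaussian approximation (not the exact MVCE construction).
rng = np.random.default_rng(2)

# Stand-in "posterior draws" for a 2-dimensional parameter (illustrative).
draws = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.3], [0.3, 0.5]], size=5000)

mu = draws.mean(axis=0)                       # ellipsoid center
Sigma = np.cov(draws, rowvar=False)           # ellipsoid shape matrix
Sigma_inv = np.linalg.inv(Sigma)

alpha = 0.05
r2 = chi2.ppf(1 - alpha, df=draws.shape[1])   # squared radius for 1 - alpha coverage

# Squared Mahalanobis distance of each draw from the center.
m2 = np.einsum("ij,jk,ik->i", draws - mu, Sigma_inv, draws - mu)
coverage = np.mean(m2 <= r2)
print(coverage)                               # fraction of draws inside the region
```

    For a Gaussian posterior this region is also the highest posterior density region, matching the asymptotic-normality property stated in the abstract; the MVCE construction additionally guarantees coverage without that normality assumption.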