Flexible Bayesian Product Mixture Models for Vector Autoregressions
Bayesian non-parametric methods based on Dirichlet process mixtures have seen
tremendous success in various domains and are appealing in being able to borrow
information by clustering samples that share identical parameters. However,
such methods can face hurdles in heterogeneous settings where objects are
expected to cluster only along a subset of axes or where clusters of samples
share only a subset of identical parameters. We overcome such limitations by
developing a novel class of product of Dirichlet process location-scale
mixtures that enables independent clustering at multiple scales, resulting in
varying levels of information sharing across samples. First, we develop the
approach for independent multivariate data. Subsequently, we generalize it to
multivariate time-series data under the framework of multi-subject Vector
Autoregressive (VAR) models, our primary focus, which go beyond parametric
single-subject VAR models. We establish posterior consistency and
develop efficient posterior computation for implementation. Extensive numerical
studies involving VAR models show distinct advantages over competing methods,
in terms of estimation, clustering, and feature selection accuracy. Our resting
state fMRI analysis from the Human Connectome Project reveals biologically
interpretable connectivity differences between distinct intelligence groups,
while an air pollution application illustrates superior forecasting
accuracy compared to alternative methods.
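As a point of reference for the multi-subject extension described above, the parametric single-subject VAR baseline can be fit by ordinary least squares. The sketch below is illustrative only (simulated data, numpy, made-up dimensions and noise level); it shows the single-subject VAR(1) model that the product-mixture prior generalizes, not the paper's method.

```python
import numpy as np

# Single-subject VAR(1): y_t = A y_{t-1} + e_t, fit by OLS.
rng = np.random.default_rng(0)
d, T = 3, 500

# Ground-truth stable transition matrix (spectral radius < 1).
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])

# Simulate the time series.
y = np.zeros((T, d))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + 0.1 * rng.standard_normal(d)

# OLS estimate: regress y_t on y_{t-1}.
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T  # should be close to A_true
```

The multi-subject models in the paper replace the single fixed `A` with subject-specific parameters that are partially shared across subjects through the mixture prior.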
High-dimensional Measurement Error Models for Lipschitz Loss
Recently emerging large-scale biomedical data pose exciting opportunities for
scientific discoveries. However, the ultrahigh dimensionality and
non-negligible measurement errors in the data may create difficulties in
estimation. Existing methods for high-dimensional covariates with
measurement error are limited; they usually require knowledge of the noise distribution and
focus on linear or generalized linear models. In this work, we develop
high-dimensional measurement error models for a class of Lipschitz loss
functions that encompasses logistic regression, hinge loss and quantile
regression, among others. Our estimator is designed to minimize the norm
among all estimators belonging to suitable feasible sets, without requiring any
knowledge of the noise distribution. Subsequently, we generalize these
estimators to a Lasso analog version that is computationally scalable to higher
dimensions. We derive theoretical guarantees in terms of finite sample
statistical error bounds and sign consistency, even when the dimensionality
increases exponentially with the sample size. Extensive simulation studies
demonstrate superior performance compared to existing methods in classification
and quantile regression problems. An application to a gender classification
task based on brain functional connectivity in the Human Connectome Project
data illustrates improved accuracy under our approach, and the ability to
reliably identify significant brain connections that drive gender differences.
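The setting this abstract addresses can be illustrated with a naive baseline: an l1-penalized logistic regression fit directly to covariates observed with additive measurement error. The sketch below (numpy only, via proximal gradient descent) is that baseline, not the paper's estimator, which instead minimizes a norm over feasible sets and requires no knowledge of the noise distribution; all constants and names here are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 400, 50, 5

beta_true = np.zeros(p)
beta_true[:s] = 2.0                        # sparse true signal

X = rng.standard_normal((n, p))            # latent true covariates
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
W = X + 0.3 * rng.standard_normal((n, p))  # observed noisy covariates


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(W, y, lam, step=0.5, iters=1000):
    """ISTA (proximal gradient) for the l1-penalized logistic loss."""
    b = np.zeros(W.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-W @ b))
        grad = W.T @ (prob - y) / len(y)
        b = soft_threshold(b - step * grad, step * lam)
    return b


b_hat = lasso_logistic(W, y, lam=0.1)
print("nonzero coefficients:", np.flatnonzero(np.abs(b_hat) > 1e-8).size)
```

The naive fit is biased toward zero by the measurement error in `W`; quantifying and removing that effect without knowing the noise distribution is the point of the paper's approach.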
Bayesian Nonparametric Methods for Conditional Distributions
In the first paper, we propose a flexible class of priors for density estimation that avoids discrete mixtures. Although discrete mixture modeling has formed the backbone of the literature on Bayesian density estimation incorporating covariates, discrete mixtures carry some well-known disadvantages. Our alternative class of priors is based on random nonlinear functions of a uniform latent variable with an additive residual. The induced prior for the density is shown to have desirable properties, including ease of centering on an initial guess for the density, posterior consistency, and straightforward computation via Gibbs sampling.
In the second paper, we propose a Bayesian variable selection method involving nonparametric residuals, noting that the majority of the literature has focused on the parametric counterpart. We generalize methods and asymptotic theory established for mixtures of g-priors to linear regression models with unknown residuals characterized by a Dirichlet process (DP) location mixture. We propose a mixture of semiparametric g-priors allowing straightforward posterior computation via a stochastic search variable selection algorithm. In addition, Bayes factor and variable selection consistency are shown to hold under a class of proper priors on g, allowing the number of candidate predictors p to increase much faster than the sample size n under a sparsity assumption on the true model size.
Our third paper is motivated by the fact that although standard algorithms exist for estimating minimum-length credible intervals for scalars, no such methods exist for estimating minimum-volume credible sets for vectors and functions.
We propose a minimum volume covering ellipsoids (MVCE) approach for vector-valued parameters, guaranteed to construct credible regions with probability ≥ 1-α, while yielding highest posterior density regions under asymptotic normality. For one-dimensional random curves, our proposed approach starts with an MVCE region evaluated at finitely many knots, and then interpolates between the knots either linearly or by relying on Lipschitz continuity. For multivariate random surfaces, our approach uses Delaunay triangulations to approximate the credible region. Frequentist coverage properties and computational efficiency compared with frequentist alternatives are assessed through simulation studies.
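The core geometric step for vector-valued parameters can be sketched with the standard Khachiyan algorithm for a minimum-volume enclosing ellipsoid of posterior draws. This is a generic illustration under assumed details: in particular, pruning the α fraction of draws farthest from the sample mean (in Euclidean distance) is a stand-in for the paper's actual selection rule.

```python
import numpy as np

def mvee(P, tol=1e-4):
    """Minimum-volume enclosing ellipsoid of the rows of P (n x d), via
    Khachiyan's algorithm. Returns center c and shape matrix A such that
    (x - c)^T A (x - c) <= 1 (approximately) for every row x of P."""
    n, d = P.shape
    Q = np.column_stack([P, np.ones(n)]).T   # lifted (d+1) x n points
    u = np.ones(n) / n                       # weights on the points
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)
        j = np.argmax(M)
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = P.T @ u
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return c, A

# Posterior draws (simulated stand-in): keep the (1 - alpha) fraction
# closest to the mean, then enclose them with the MVEE.
rng = np.random.default_rng(2)
draws = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=500)
alpha = 0.05
dist = np.linalg.norm(draws - draws.mean(axis=0), axis=1)
kept = draws[np.argsort(dist)[:int((1 - alpha) * len(draws))]]
c, A = mvee(kept)
```

The resulting ellipsoid contains all retained draws, so its posterior mass is at least the retained fraction, mirroring the ≥ 1-α guarantee stated above.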