Adaptive Bayesian Predictive Inference
Bayesian predictive inference provides a coherent description of the entire
predictive uncertainty through predictive distributions. We examine several
widely used sparsity priors from the predictive (as opposed to estimation)
inference viewpoint. Our context is estimating a predictive distribution of a
high-dimensional Gaussian observation with a known variance but an unknown
sparse mean under the Kullback-Leibler loss. First, we show that LASSO
(Laplace) priors are incapable of achieving rate-optimal performance. This new
result contributes to the literature on negative findings about Bayesian LASSO
posteriors. However, when the Laplace prior is deployed inside the Spike-and-Slab
framework (for example, with the Spike-and-Slab LASSO prior), rate-minimax
performance can be attained with properly tuned parameters (depending on the
sparsity level s_n). We highlight the discrepancy between prior calibration for
the purpose of prediction and estimation. Going further, we investigate popular
hierarchical priors which are known to attain adaptive rate-minimax performance
for estimation. Whether they are also rate-minimax for predictive inference
has, until now, been unclear. We answer affirmatively by showing that
hierarchical Spike-and-Slab priors are adaptive and attain the minimax rate
without knowledge of s_n. This is the first rate-adaptive result in the
literature on predictive density estimation in sparse setups. This finding
highlights the benefits of fully Bayesian inference.
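To make the predictive setup concrete, here is a one-coordinate Python sketch. It assumes a point-mass spike at zero with a Gaussian slab N(0, tau2) (a conjugate stand-in chosen for closed-form formulas, not the Laplace slab of the Spike-and-Slab LASSO), a fixed mixing weight eta, an observation X ~ N(theta, sigma2) and a future Y ~ N(theta, r2). Under these assumptions the Bayes predictive density is a two-component Gaussian mixture, and its Kullback-Leibler loss can be computed by numerical integration. All function names and parameter values are illustrative.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (vectorized over y)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def slab_weight(x, eta, tau2, sigma2):
    """Posterior probability that theta came from the slab, given one observation x.
    Marginally, x ~ N(0, sigma2 + tau2) under the slab and N(0, sigma2) under the spike."""
    slab = eta * normal_pdf(x, 0.0, sigma2 + tau2)
    spike = (1.0 - eta) * normal_pdf(x, 0.0, sigma2)
    return slab / (slab + spike)

def predictive_density(y, x, eta, tau2, sigma2, r2):
    """Bayes predictive density p(y | x) for Y ~ N(theta, r2) under the
    (assumed) point-mass spike + Gaussian slab prior."""
    w = slab_weight(x, eta, tau2, sigma2)
    shrink = tau2 / (sigma2 + tau2)             # posterior mean factor under the slab
    post_var = sigma2 * tau2 / (sigma2 + tau2)  # posterior variance under the slab
    slab_part = normal_pdf(y, shrink * x, post_var + r2)
    spike_part = normal_pdf(y, 0.0, r2)         # theta = 0 exactly under the spike
    return w * slab_part + (1.0 - w) * spike_part

def kl_loss(theta, x, eta, tau2, sigma2, r2, half_width=12.0, n=20001):
    """KL( N(theta, r2) || p_hat(. | x) ) via a Riemann sum on a fine grid
    spanning theta +/- half_width standard deviations of the true density."""
    sd = np.sqrt(r2)
    y = np.linspace(theta - half_width * sd, theta + half_width * sd, n)
    dy = y[1] - y[0]
    p_true = normal_pdf(y, theta, r2)
    p_hat = predictive_density(y, x, eta, tau2, sigma2, r2)
    return float(np.sum(p_true * np.log(p_true / p_hat)) * dy)
```

With a fixed eta the prior is a product across coordinates, so the loss for a high-dimensional mean is the sum of per-coordinate losses. The hierarchical priors discussed above instead place a prior on the mixing weight, which this sketch does not attempt.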
Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso
We propose a Bayesian procedure for simultaneous variable and covariance
selection using continuous spike-and-slab priors in multivariate linear
regression models where q possibly correlated responses are regressed onto p
predictors. Rather than relying on a stochastic search through the
high-dimensional model space, we develop an ECM algorithm similar to the EMVS
procedure of Rockova & George (2014) targeting modal estimates of the matrix of
regression coefficients and residual precision matrix. Varying the scale of the
continuous spike densities facilitates dynamic posterior exploration and allows
us to filter out negligible regression coefficients and partial covariances
gradually. In simulations, our method substantially outperforms competing
regularization methods. We demonstrate our method with a re-examination
of data from a recent observational study of the effect of playing high school
football on several later-life cognitive, psychological, and socio-economic
outcomes.
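A minimal Python sketch of dynamic posterior exploration, assuming a single response, an orthogonal design with unit-norm columns, unit noise variance and a fixed mixing weight theta. Under the spike-and-slab LASSO penalty (Laplace slab with rate lam1, Laplace spike with rate lam0 > lam1), an E-step turns each coefficient into a slab probability p*, which sets an adaptive penalty lam1 * p* + lam0 * (1 - p*); the M-step is then soft-thresholding. Warm-starting along an increasing ladder of spike scales lam0 gradually filters out negligible coefficients. The full multivariate procedure also updates the residual precision matrix and the mixing weights, which is omitted here; function names and the lam0 ladder are hypothetical.

```python
import numpy as np

def inclusion_prob(beta, theta, lam1, lam0):
    """p*(beta): probability that beta came from the slab Laplace(lam1)
    rather than the spike Laplace(lam0), written stably for lam0 >= lam1."""
    ratio = ((1.0 - theta) / theta) * (lam0 / lam1) * np.exp(-(lam0 - lam1) * np.abs(beta))
    return 1.0 / (1.0 + ratio)

def soft_threshold(z, lam):
    """Coordinate-wise lasso solution of min_b (z - b)^2 / 2 + lam * |b|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ssl_path(z, lam0_ladder, lam1=1.0, theta=0.5, n_iter=100, tol=1e-10):
    """EM-style updates for the orthogonal case, where z holds the per-coordinate
    least-squares estimates. Each lam0 stage is warm-started from the previous one."""
    z = np.asarray(z, dtype=float)
    beta = z.copy()
    for lam0 in lam0_ladder:
        for _ in range(n_iter):
            p = inclusion_prob(beta, theta, lam1, lam0)  # E-step: slab probabilities
            lam_star = lam1 * p + lam0 * (1.0 - p)       # adaptive per-coefficient penalty
            new_beta = soft_threshold(z, lam_star)       # M-step: weighted lasso
            converged = np.max(np.abs(new_beta - beta)) < tol
            beta = new_beta
            if converged:
                break
    return beta
```

For example, `ssl_path([3.0, 0.1, -2.5, 0.05], [1.0, 2.0, 5.0, 10.0, 20.0, 50.0])` zeroes the two small coordinates while the large ones keep a penalty close to lam1, because their slab probability tends to one as lam0 grows.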
The Median Probability Model and Correlated Variables
The median probability model (MPM) of Barbieri and Berger (2004) is defined as
the model consisting of those variables whose marginal posterior probability of
inclusion is at least 0.5. The MPM rule yields the best single model for
prediction in orthogonal and nested correlated designs. This result was
originally conceived under a specific class of priors, such as point-mass
mixtures of non-informative and g-type priors. The MPM rule, however, has
become so popular that it is now being deployed for a wider variety of
priors and under correlated designs, where the properties of MPM are not yet
completely understood. The main thrust of this work is to shed light on
properties of MPM in these contexts by (a) characterizing situations when MPM
is still safe under correlated designs and (b) providing significant
generalizations of MPM to a broader class of priors (such as continuous
spike-and-slab priors). We also provide new supporting evidence for the
suitability of g-priors, as opposed to independent product priors, using new
predictive matching arguments. Furthermore, we emphasize the importance of
prior model probabilities and highlight the merits of non-uniform prior
probability assignments using the notion of model aggregates.
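The MPM rule itself is simple arithmetic on posterior model probabilities. The Python below computes marginal inclusion probabilities, the MPM and the highest posterior probability model from an explicit table of model probabilities. The four-model table in the example is made up purely for illustration; in practice these probabilities would come from a posterior computation under the chosen prior.

```python
def inclusion_probs(model_probs, p):
    """Marginal posterior inclusion probability of each of p variables.
    model_probs maps a frozenset of variable indices to its posterior probability."""
    total = sum(model_probs.values())  # normalize in case the table is unnormalized
    incl = [0.0] * p
    for model, prob in model_probs.items():
        for j in model:
            incl[j] += prob / total
    return incl

def median_probability_model(model_probs, p):
    """Variables whose marginal inclusion probability is at least 0.5."""
    incl = inclusion_probs(model_probs, p)
    return frozenset(j for j in range(p) if incl[j] >= 0.5)

def highest_probability_model(model_probs):
    """The single model with the largest posterior probability."""
    return max(model_probs, key=model_probs.get)

# Hypothetical posterior over models for p = 3 variables.
toy = {
    frozenset({0, 1}): 0.35,
    frozenset({0, 2}): 0.30,
    frozenset({1, 2}): 0.10,
    frozenset({0}): 0.25,
}
```

In this toy table the inclusion probabilities are 0.90, 0.45 and 0.40, so the MPM is {0} while the highest probability model is {0, 1}: variables 1 and 2 each appear in several models but never reach inclusion probability 0.5.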
The art of BART: On flexibility of Bayesian forests
Considerable effort has been directed to developing asymptotically minimax
procedures in problems of recovering functions and densities. These methods
often rely on somewhat arbitrary and restrictive assumptions such as isotropy
or spatial homogeneity. This work enhances theoretical understanding of
Bayesian forests (including BART) under substantially relaxed smoothness
assumptions. In particular, we provide a comprehensive study of asymptotic
optimality and posterior contraction of Bayesian forests when the regression
function has anisotropic smoothness that possibly varies over the function
domain. We introduce a new class of sparse piecewise heterogeneous anisotropic
Hölder functions and derive their minimax rate of estimation in
high-dimensional scenarios under the L_2 loss. Next, we find that the default
Bayesian CART prior, coupled with a subset selection prior for sparse
estimation in high-dimensional scenarios, adapts to unknown heterogeneous
smoothness and sparsity. These results show that Bayesian forests are uniquely
suited for more general estimation problems which would render other default
machine learning tools, such as Gaussian processes, suboptimal. Beyond
nonparametric regression, we also show that Bayesian forests can be
successfully applied to many other problems including density estimation and
binary classification.
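Since the results concern priors over trees, simulating from such a prior is a natural illustration. The Python below draws tree skeletons from a Chipman, George and McCulloch style split prior, in which a node at depth d splits with probability alpha * (1 + d) ** (-beta); alpha = 0.95 and beta = 2 are the usual BART defaults, but whether this is precisely the "default Bayesian CART prior" analyzed above is an assumption. Splitting variables are restricted to a randomly drawn active subset as a crude, hypothetical stand-in for a subset selection prior; split points and leaf values are omitted, and all names are illustrative.

```python
import random

def sample_tree(depth, alpha, beta, active_vars, rng):
    """Recursively sample a tree skeleton: a node at `depth` splits with
    probability alpha * (1 + depth) ** (-beta).
    Returns (number_of_leaves, set_of_split_variables)."""
    if rng.random() < alpha * (1 + depth) ** (-beta):
        var = rng.choice(active_vars)  # splitting variable drawn from the active subset
        left_leaves, left_vars = sample_tree(depth + 1, alpha, beta, active_vars, rng)
        right_leaves, right_vars = sample_tree(depth + 1, alpha, beta, active_vars, rng)
        return left_leaves + right_leaves, {var} | left_vars | right_vars
    return 1, set()  # the node stays a leaf

def sample_forest_skeleton(p, n_trees=50, alpha=0.95, beta=2.0, max_active=5, seed=0):
    """Draw an active subset of the p covariates, then n_trees skeletons that
    split only on that subset. Requires max_active <= p."""
    rng = random.Random(seed)
    # Hypothetical subset-selection step: subset size uniform on 1..max_active,
    # then a uniformly random subset of that size.
    s = rng.randint(1, max_active)
    active = sorted(rng.sample(range(p), s))
    trees = [sample_tree(0, alpha, beta, active, rng) for _ in range(n_trees)]
    return active, trees
```

Because the split probability decays polynomially in depth, the prior favors small trees, and restricting splits to a small active subset is what lets such a prior concentrate on sparse functions of a few covariates even when p is large.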