Adaptive Bayesian Predictive Inference
Bayesian predictive inference provides a coherent description of the entire
predictive uncertainty through predictive distributions. We examine several
widely used sparsity priors from the predictive (as opposed to estimation)
inference viewpoint. Our context is estimating a predictive distribution of a
high-dimensional Gaussian observation with a known variance but an unknown
sparse mean under the Kullback-Leibler loss. First, we show that LASSO
(Laplace) priors are incapable of achieving rate-optimal performance. This new
result contributes to the literature on negative findings about Bayesian LASSO
posteriors. However, when the Laplace prior is deployed inside the
Spike-and-Slab framework (for example with the Spike-and-Slab LASSO prior),
rate-minimax performance can be attained with properly tuned parameters
(depending on the sparsity level sn). We highlight the discrepancy between prior calibration for
the purpose of prediction and estimation. Going further, we investigate popular
hierarchical priors which are known to attain adaptive rate-minimax performance
for estimation. Whether or not they are rate-minimax also for predictive
inference has, until now, been unclear. We answer affirmatively by showing that
hierarchical Spike-and-Slab priors are adaptive and attain the minimax rate
without the knowledge of sn. This is the first rate-adaptive result in the
literature on predictive density estimation in sparse setups. This finding
celebrates the benefits of fully Bayesian inference.
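Concretely, when both the true sampling density and a plug-in predictive density are Gaussians with the same known variance, the Kullback-Leibler loss reduces to a scaled squared error between the means. A minimal numerical sketch (the function name and the toy sparse mean are illustrative, not taken from the paper):

```python
import numpy as np

def kl_gaussian_predictive(theta, theta_hat, sigma2=1.0):
    """KL divergence between N(theta, sigma2*I) and N(theta_hat, sigma2*I).

    For equal-variance Gaussians this reduces to the squared error
    ||theta - theta_hat||^2 / (2 * sigma2).
    """
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_hat, dtype=float)
    return float(diff @ diff) / (2.0 * sigma2)

# A sparse mean: only the first two of ten coordinates are nonzero.
theta = np.array([3.0, -2.0] + [0.0] * 8)
theta_hat = np.zeros(10)          # naive plug-in that ignores the signal
print(kl_gaussian_predictive(theta, theta_hat))  # -> 6.5
```

The naive all-zero plug-in pays ||theta||^2 / 2 = 6.5 in KL loss here, which is precisely why the calibration of sparsity priors matters for predictive (not just estimation) performance.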
The art of BART: On flexibility of Bayesian forests
Considerable effort has been directed to developing asymptotically minimax
procedures in problems of recovering functions and densities. These methods
often rely on somewhat arbitrary and restrictive assumptions such as isotropy
or spatial homogeneity. This work enhances theoretical understanding of
Bayesian forests (including BART) under substantially relaxed smoothness
assumptions. In particular, we provide a comprehensive study of asymptotic
optimality and posterior contraction of Bayesian forests when the regression
function has anisotropic smoothness that possibly varies over the function
domain. We introduce a new class of sparse piecewise heterogeneous anisotropic
H\"{o}lder functions and derive their minimax rate of estimation in
high-dimensional scenarios under the $\ell_2$ loss. Next, we find that the default
Bayesian CART prior, coupled with a subset selection prior for sparse
estimation in high-dimensional scenarios, adapts to unknown heterogeneous
smoothness and sparsity. These results show that Bayesian forests are uniquely
suited for more general estimation problems which would render other default
machine learning tools, such as Gaussian processes, suboptimal. Beyond
nonparametric regression, we also show that Bayesian forests can be
successfully applied to many other problems including density estimation and
binary classification.
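The piecewise-constant building block underlying Bayesian CART and BART can be illustrated with a single greedy split. This toy sketch (a frequentist one-split fit, not the paper's Bayesian procedure, which places a prior over trees) shows how a tree cell adapts to a jump in the regression function:

```python
import numpy as np

def best_split(x, y):
    """Greedy single split for a regression tree: choose the threshold
    minimizing the within-leaf sum of squared errors, the elementary
    step that CART-style trees repeat recursively."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, 0.5 * (x[i - 1] + x[i])
    return best_thr

# A step function: the chosen split should land at the jump near x = 0.5.
x = np.linspace(0.0, 1.0, 100)
y = np.where(x < 0.5, 0.0, 1.0)
print(best_split(x, y))  # -> 0.5 (up to floating-point rounding)
```

Because the fitted cells need not be axis-aligned boxes of equal scale in every direction, tree-based partitions can track smoothness that varies across the domain, which is the intuition behind the anisotropic adaptation results above.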
On Mixing Rates for Bayesian CART
The success of Bayesian inference with MCMC depends critically on Markov
chains rapidly reaching the posterior distribution. Despite the plenitude of
inferential theory for posteriors in Bayesian non-parametrics, convergence
properties of MCMC algorithms that simulate from such ideal inferential targets
are not thoroughly understood. This work focuses on the Bayesian CART algorithm
which forms a building block of Bayesian Additive Regression Trees (BART). We
derive upper bounds on mixing times for typical posteriors under various
proposal distributions. Exploiting the wavelet representation of trees, we
provide sufficient conditions for Bayesian CART to mix well (polynomially)
under certain hierarchical connectivity restrictions on the signal. We also
derive a negative result showing that Bayesian CART (based on simple grow and
prune steps) cannot reach deep isolated signals in faster than exponential
mixing time. To remediate myopic tree exploration, we propose Twiggy Bayesian
CART which attaches/detaches entire twigs (not just single nodes) in the
proposal distribution. We show polynomial mixing of Twiggy Bayesian CART
without assuming that the signal is connected on a tree. Going further, we show
that informed variants achieve even faster mixing. A thorough simulation study
highlights discrepancies between spike-and-slab priors and Bayesian CART under
a variety of proposals.
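The grow and prune moves can be sketched on trees encoded as sets of leaf indices in heap order, where node k has children 2k and 2k+1. This is a toy random walk over tree space with a validity checker, not the paper's posterior sampler; all names are illustrative:

```python
import random

def grow(leaves):
    """Grow move: split a uniformly chosen leaf into its two children."""
    k = random.choice(sorted(leaves))
    return (leaves - {k}) | {2 * k, 2 * k + 1}

def prune(leaves):
    """Prune move: merge a uniformly chosen pair of sibling leaves."""
    siblings = [k for k in leaves if k % 2 == 0 and k + 1 in leaves]
    if not siblings:
        return leaves          # root-only tree: nothing to prune
    k = random.choice(sorted(siblings))
    return (leaves - {k, k + 1}) | {k // 2}

def is_valid(leaves):
    """A set of indices is the leaf set of a full binary tree iff
    repeatedly collapsing sibling pairs reduces it to the root {1}."""
    s = set(leaves)
    while len(s) > 1:
        pair = next((k for k in s if k % 2 == 0 and k + 1 in s), None)
        if pair is None:
            return False
        s -= {pair, pair + 1}
        parent = pair // 2
        if parent in s:        # parent cannot be both leaf and internal
            return False
        s.add(parent)
    return s == {1}

random.seed(0)
t = {1}
for _ in range(200):           # symmetric random walk over tree space
    t = grow(t) if random.random() < 0.5 else prune(t)
    assert is_valid(t)
print(len(t))
```

In a real Bayesian CART sampler these proposals would be accepted or rejected with a Metropolis-Hastings ratio against the tree posterior; the mixing results above concern how quickly such a chain traverses this discrete space, and "twiggy" moves enlarge the proposal to attach or detach entire multi-node twigs in one step.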
The Median Probability Model and Correlated Variables
The median probability model (MPM) of Barbieri and Berger (2004) is defined as
the model consisting of those variables whose marginal posterior probability of
inclusion is at least 0.5. The MPM rule yields the best single model for
prediction in orthogonal and nested correlated designs. This result was
originally conceived under a specific class of priors, such as the point mass
mixtures of non-informative and g-type priors. The MPM rule, however, has
become so popular that it is now being deployed for a wider variety of
priors and under correlated designs, where the properties of MPM are not yet
completely understood. The main thrust of this work is to shed light on
properties of MPM in these contexts by (a) characterizing situations when MPM
is still safe under correlated designs, and (b) providing significant
generalizations of MPM to a broader class of priors (such as continuous
spike-and-slab priors). We also provide new supporting evidence for the
suitability of g-priors, as opposed to independent product priors, using new
predictive matching arguments. Furthermore, we emphasize the importance of
prior model probabilities and highlight the merits of non-uniform prior
probability assignments using the notion of model aggregates.
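The MPM rule itself is a one-line selection criterion once the marginal posterior inclusion probabilities are in hand. A minimal sketch with hypothetical inclusion probabilities (the function name and numbers are illustrative):

```python
import numpy as np

def median_probability_model(inclusion_probs, threshold=0.5):
    """MPM rule: keep every variable whose marginal posterior inclusion
    probability is at least the threshold (0.5 in Barbieri-Berger)."""
    p = np.asarray(inclusion_probs, dtype=float)
    return np.flatnonzero(p >= threshold).tolist()

# Hypothetical posterior inclusion probabilities for five predictors.
probs = [0.92, 0.45, 0.50, 0.07, 0.81]
print(median_probability_model(probs))  # -> [0, 2, 4]
```

The subtlety studied above is not this selection step but how trustworthy the inclusion probabilities are as a basis for it once the design columns are correlated and the prior departs from the g-type mixtures for which the optimality result was originally proved.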