Estimating Shapley effects for moderate-to-large input dimensions
Sobol' indices and Shapley effects are attractive methods of assessing how a
function depends on its various inputs. The existing literature contains
various estimators for these two classes of sensitivity indices, but few
estimators of Sobol' indices and no estimators of Shapley effects are
computationally tractable for moderate-to-large input dimensions. This article
provides a Shapley-effect estimator that is computationally tractable for a
moderate-to-large input dimension. The estimator uses a metamodel-based
approach by first fitting a Bayesian Additive Regression Trees model, which is
then used to compute Shapley-effect estimates. This article also establishes
posterior contraction rates on a large function class for this Shapley-effect
estimator and for the analogous existing Sobol'-index estimator. Finally, this
paper explores the performance of these Shapley-effect estimators on four
different test functions for moderate-to-large input dimensions and number of
observations.
Comment: 19 pages, 3 figures
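For orientation, the Shapley effect of an input is the Shapley value of a game in which the value of a subset of inputs is the output variance explained by that subset. The sketch below is not the BART-based estimator described in the abstract. It is a brute-force enumeration on a hypothetical linear toy model with independent inputs, where the explained variance has a closed form; the names `shapley_effects`, `explained_variance`, `a`, and `s2` are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def shapley_effects(d, value):
    """Exact Shapley values of a set function `value` over subsets of range(d)."""
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude input i
            weight = 1.0 / (d * comb(d - 1, size))
            for subset in combinations(others, size):
                u = frozenset(subset)
                total += weight * (value(u | {i}) - value(u))
        effects.append(total)
    return effects


# Hypothetical toy model: f(x) = sum_i a[i] * x[i], independent inputs with
# Var(x[i]) = s2[i], so Var(E[f | x_u]) = sum over i in u of a[i]^2 * s2[i].
a = [1.0, 2.0, 0.5]
s2 = [1.0, 1.0, 4.0]


def explained_variance(u):
    return sum(a[i] ** 2 * s2[i] for i in u)


effects = shapley_effects(len(a), explained_variance)
total_variance = explained_variance(range(len(a)))
print(effects, total_variance)
```

The inner loops visit every subset of the remaining inputs, so the cost grows as 2^d. This is one way to see why exact enumeration becomes impractical for moderate-to-large input dimensions.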
Sharded Bayesian Additive Regression Trees
In this paper we develop the randomized Sharded Bayesian Additive Regression
Trees (SBT) model. We introduce a randomization auxiliary variable and a
sharding tree to decide partitioning of data, and fit each partition component
to a sub-model using Bayesian Additive Regression Trees (BART). By observing
that the optimal design of a sharding tree can determine optimal sharding for
sub-models on a product space, we introduce an intersection tree structure to
completely specify both the sharding and modeling using only tree structures.
In addition to experiments, we also derive the theoretical optimal weights for
minimizing posterior contractions and prove the worst-case complexity of SBT.
Comment: 46 pages, 10 figures (Appendix included)
Model Mixing Using Bayesian Additive Regression Trees
In modern computer experiment applications, one often encounters the
situation where various models of a physical system are considered, each
implemented as a simulator on a computer. An important question in such a
setting is determining the best simulator, or the best combination of
simulators, to use for prediction and inference. Bayesian model averaging (BMA)
and stacking are two statistical approaches used to account for model
uncertainty by aggregating a set of predictions through a simple linear
combination or weighted average. Bayesian model mixing (BMM) extends these
ideas to capture the localized behavior of each simulator by defining
input-dependent weights. One possibility is to define the relationship between
inputs and the weight functions using a flexible non-parametric model that
learns the local strengths and weaknesses of each simulator. This paper
proposes a BMM model based on Bayesian Additive Regression Trees (BART). The
proposed methodology is applied to combine predictions from Effective Field
Theories (EFTs) associated with a motivating nuclear physics application.
Comment: 33 pages, 6 figures, additional supplementary material can be found
at https://github.com/jcyannotty/OpenB
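The difference between a single global weight (as in BMA or stacking) and input-dependent weights (as in BMM) can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the two "simulators", the target function, and the logistic weight function are assumptions chosen for illustration, and the logistic curve merely takes the place of the BART-based weight functions the abstract proposes.

```python
import math


def truth(x):
    # Hypothetical target: linear up to x = 1, flat afterwards
    return min(x, 1.0)


def simulator_low(x):
    # Hypothetical simulator that is accurate only for x <= 1
    return x


def simulator_high(x):
    # Hypothetical simulator that is accurate only for x >= 1
    return 1.0


def global_mix(x, w=0.5):
    """Weighted average with one weight shared by all inputs."""
    return w * simulator_low(x) + (1.0 - w) * simulator_high(x)


def local_weight(x, center=1.0, scale=0.1):
    """Input-dependent weight on simulator_low (logistic stand-in)."""
    return 1.0 / (1.0 + math.exp((x - center) / scale))


def local_mix(x):
    """Mixture whose weights vary with the input x."""
    w = local_weight(x)
    return w * simulator_low(x) + (1.0 - w) * simulator_high(x)


for x in (0.2, 2.0):
    print(x, abs(global_mix(x) - truth(x)), abs(local_mix(x) - truth(x)))
```

In this toy setting the input-dependent mixture follows whichever simulator is accurate in each region, while a fixed weight cannot.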
Design and Analysis of Experiments on Nonconvex Regions
Modeling a response over a nonconvex design region is a common problem in diverse areas such as engineering and geophysics. The tools available to model and design for such responses are limited and have received little attention. We propose a new method for selecting design points over nonconvex regions that is based on the application of multidimensional scaling to the geodesic distance. Optimal designs for prediction are described, with special emphasis on Gaussian process models, followed by a simulation study and an application in glaciology. Supplementary materials for this article are available online.
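To see why geodesic distance differs from Euclidean distance on a nonconvex region, the following sketch approximates the geodesic distance between two points of an L-shaped region by a shortest path on a grid graph. It covers only the distance computation, not the multidimensional scaling or design-selection steps, and it is not the authors' procedure. The region, the grid resolution `N`, and all function names are illustrative assumptions.

```python
import heapq
import math

N = 20  # hypothetical grid resolution: nodes (i, j) stand for points (i/N, j/N)


def inside(i, j):
    """L-shaped region: the unit square minus its upper-right quadrant."""
    return 0 <= i <= N and 0 <= j <= N and (i <= N // 2 or j <= N // 2)


def neighbours(i, j):
    """Yield 8-connected neighbours inside the region with their step lengths."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if not inside(ni, nj):
                continue
            # Forbid diagonal steps that would cut across the missing quadrant
            if di != 0 and dj != 0 and not (inside(ni, j) and inside(i, nj)):
                continue
            yield ni, nj, math.hypot(di, dj) / N


def geodesic(source, target):
    """Dijkstra shortest-path length, a grid approximation of geodesic distance."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist[node]:
            continue  # stale heap entry
        for ni, nj, step in neighbours(*node):
            nd = d + step
            if nd < dist.get((ni, nj), math.inf):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, (ni, nj)))
    return math.inf


a, b = (5, 20), (20, 5)  # one point in each arm of the L
euclidean = math.hypot(a[0] - b[0], a[1] - b[1]) / N
graph_geodesic = geodesic(a, b)
print(euclidean, graph_geodesic)
```

The straight segment between the two points leaves the region, so the within-region path is longer than the Euclidean distance. The grid path slightly overestimates the true geodesic because movement is restricted to eight directions.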