1,145 research outputs found
Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored
We compare Bayes Model Averaging (BMA) to a non-Bayes form of model averaging called stacking. In stacking, the weights are no longer posterior probabilities of models; they are obtained by a technique based on cross-validation. When the correct data generating model (DGM) is on the list of models under consideration, BMA is never worse than stacking and is often demonstrably better, provided that the noise level is of an order commensurate with the coefficients and explanatory variables. Here, however, we focus on the case in which the correct DGM is not on the model list and may not be well approximated by the elements on the model list. We give a sequence of computed examples, choosing model lists and DGMs to contrast the risk performance of stacking and BMA. In the first examples, the model lists are chosen to reflect geometric principles that should give good performance. In these cases, stacking typically outperforms BMA, sometimes by a wide margin. In the second set of examples, we examine how stacking and BMA perform when the model list includes all subsets of a set of potential predictors. When we standardize the size of terms and coefficients in this setting, we find that BMA outperforms stacking when the deviant terms in the DGM ‘point’ in directions accommodated by the model list, but that when the deviant term points outside the model list, stacking seems to do better. Overall, our results suggest that stacking has better robustness properties than BMA in the most important settings.
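As a rough illustration of what "posterior probabilities of models" as weights can look like in practice, the sketch below computes approximate BMA weights for two Gaussian linear models. It assumes equal prior model probabilities and the common BIC approximation to the marginal likelihood; neither choice is taken from the abstract above, and the names `bic_linear` and `approx_bma_weights` and the simulated data are hypothetical.

```python
import numpy as np

def bic_linear(X, y):
    """BIC of a least-squares Gaussian linear model, up to a constant shared by all models."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

def approx_bma_weights(designs, y):
    """Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2).

    Assumes equal prior probabilities over the model list (an illustrative choice).
    """
    bics = np.array([bic_linear(X, y) for X in designs])
    w = np.exp(-0.5 * (bics - bics.min()))  # subtract the minimum for numerical stability
    return w / w.sum()

# Simulated data (hypothetical): a quadratic signal with Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
designs = [
    np.column_stack([ones, x]),        # linear model
    np.column_stack([ones, x, x**2]),  # quadratic model (contains the simulated DGM)
]
bma_w = approx_bma_weights(designs, y)
print(bma_w)
```

Here the simulated DGM is on the model list, so nearly all weight goes to the quadratic model; the abstract's main interest is the opposite case, where no listed model is correct.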
A Bayes interpretation of stacking for M-complete and M-open settings
In M-open problems where no true model can be conceptualized, it is common to
back off from modeling and merely seek good prediction. Even in M-complete
problems, taking a predictive approach can be very useful. Stacking is a model
averaging procedure that gives a composite predictor by combining individual
predictors from a list of models using weights that optimize a cross-validation
criterion. We show that the stacking weights also asymptotically minimize a
posterior expected loss. Hence we formally provide a Bayesian justification for
cross-validation. Often the weights are constrained to be positive and sum to
one. For greater generality, we omit the positivity constraint and relax the
`sum to one' constraint.
A key question is `What predictors should be in the average?' We first verify
that the stacking error depends only on the span of the models. Then we propose
using bootstrap samples from the data to generate empirical basis elements that
can be used to form models. We use this in two computed examples to give
stacking predictors that are (i) data driven, (ii) optimal with respect to the
number of component predictors, and (iii) optimal with respect to the weight
each predictor gets. Comment: 37 pages, 2 figures
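The weight-fitting step described above can be sketched as follows. This is a minimal example under stated assumptions, not the authors' procedure: it uses squared-error loss, leave-one-out cross-validation, least-squares component models, and fully unconstrained weights (the abstract omits positivity and only relaxes the sum-to-one constraint). The function names and the simulated data are hypothetical.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from a least-squares fit of y on X."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ beta
    return preds

def stacking_weights(designs, y):
    """Unconstrained stacking: regress y on the matrix of cross-validated predictions."""
    Z = np.column_stack([loo_predictions(X, y) for X in designs])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w, Z

# Simulated data (hypothetical): the true signal is outside every listed model.
rng = np.random.default_rng(1)
n = 80
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
designs = [
    np.column_stack([ones, x]),
    np.column_stack([ones, x**2]),
    np.column_stack([ones, x**3]),
]
w, Z = stacking_weights(designs, y)

# Cross-validated squared error of the stacked predictor and of each single model.
stacked_err = float(np.mean((y - Z @ w) ** 2))
single_errs = np.mean((y[:, None] - Z) ** 2, axis=0)
print(w, stacked_err, single_errs)
```

Because each single model corresponds to a unit weight vector, the stacked cross-validation error can never exceed the best single-model cross-validation error in this setup.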
Using the Bayesian Shtarkov solution for predictions
The Bayes Shtarkov predictor can be defined and used for a variety of data sets that are exceedingly hard, if not impossible, to model in any detailed fashion. Indeed, this is the setting in which the derivation of the Shtarkov solution is most compelling. The computations show that whenever the numerical approximation to the Shtarkov solution is ‘reasonable’, it is better in terms of predictive error than a variety of other general predictive procedures. These include two forms of additive model as well as bagging or stacking with support vector machines, Nadaraya–Watson estimators, or draws from a Gaussian Process Prior.
Desiderata for a Predictive Theory of Statistics
In many contexts, the predictive validation of models or their associated prediction strategies is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high-dimensional data, where model selection or, more generally, predictor selection is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.
Bayesian hierarchical stacking: Some models are (somewhere) useful
Stacking is a widely used model averaging technique that asymptotically
yields optimal predictions among linear averages. We show that stacking is most
effective when model predictive performance is heterogeneous in inputs, and we
can further improve the stacked mixture with a hierarchical model. We
generalize stacking to Bayesian hierarchical stacking. The model weights are
varying as a function of data, partially-pooled, and inferred using Bayesian
inference. We further incorporate discrete and continuous inputs, other
structured priors, and time series and longitudinal data. To verify the
performance gain of the proposed method, we derive theory bounds, and
demonstrate it on several applied problems. Comment: minor revision
Robust Bayesian Linear Classifier Ensembles
The original publication is available at http://www.springerlink.com
Ensemble classifiers combine the classification results of several classifiers.
Simple ensemble methods such as uniform averaging over a set of models
usually provide an improvement over selecting the single best model. Usually probabilistic
classifiers restrict the set of possible models that can be learnt in order to
lower computational complexity costs. In these restricted spaces, where incorrect
modelling assumptions are possibly made, uniform averaging sometimes performs
even better than Bayesian model averaging. Linear mixtures over sets of models provide
a space that includes uniform averaging as a particular case. We develop two
algorithms for learning maximum a posteriori weights for linear mixtures, based on
expectation maximization and on constrained optimization. We provide a nontrivial
example of the utility of these two algorithms by applying them for one dependence
estimators. We develop the conjugate distribution for one dependence estimators and
empirically show that uniform averaging is clearly superior to BMA for this family
of models. After that we empirically show that the maximum a posteriori linear mixture
weights improve accuracy significantly over uniform aggregation. Peer reviewed
A Cheat Sheet for Bayesian Prediction
This paper reviews the growing field of Bayesian prediction. Bayes point and
interval prediction are defined and exemplified and situated in statistical
prediction more generally. Then, four general approaches to Bayes prediction
are defined and we turn to predictor selection. This can be done predictively
or non-predictively and predictors can be based on single models or multiple
models. We call these latter cases unitary predictors and model average
predictors, respectively. Then we turn to the most recent aspect of prediction
to emerge, namely prediction in the context of large observational data sets,
and discuss three
further classes of techniques. We conclude with a summary and statement of
several current open problems. Comment: 33 pages
Bayesian comparison of latent variable models: Conditional vs marginal likelihoods
Typical Bayesian methods for models with latent variables (or random effects)
involve directly sampling the latent variables along with the model parameters.
In high-level software code for model definitions (using, e.g., BUGS, JAGS,
Stan), the likelihood is therefore specified as conditional on the latent
variables. This can lead researchers to perform model comparisons via
conditional likelihoods, where the latent variables are considered model
parameters. In other settings, however, typical model comparisons involve
marginal likelihoods where the latent variables are integrated out. This
distinction is often overlooked despite the fact that it can have a large
impact on the comparisons of interest. In this paper, we clarify and illustrate
these issues, focusing on the comparison of conditional and marginal Deviance
Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in
psychometric modeling. The conditional/marginal distinction corresponds to
whether the model should be predictive for the clusters that are in the data or
for new clusters (where "clusters" typically correspond to higher-level units
like people or schools). Correspondingly, we show that marginal WAIC
corresponds to leave-one-cluster out (LOcO) cross-validation, whereas
conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead
to recommendations on the general application of the criteria to models with
latent variables. Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures
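For readers unfamiliar with WAIC, the sketch below computes it from a matrix of pointwise log-likelihoods using the standard definition (log pointwise predictive density minus a variance-based penalty, on the deviance scale). It is an illustration under assumptions, not the paper's code: the function name `waic` and the simulated input are hypothetical. In terms of the distinction drawn above, the columns would hold per-unit log-likelihoods given the latent variables for a conditional WAIC, or per-cluster log-likelihoods with the latent variables integrated out for a marginal WAIC.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S, N) array: S posterior draws by N units (observations or clusters).

    Returns (waic, lppd, p_waic), with WAIC on the deviance scale.
    """
    # Log of the posterior-mean likelihood per unit, computed stably.
    m = log_lik.max(axis=0)
    lppd = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    # Penalty: posterior variance of the log-likelihood per unit.
    p_waic = log_lik.var(axis=0, ddof=1)
    return float(-2.0 * np.sum(lppd - p_waic)), float(lppd.sum()), float(p_waic.sum())

# Hypothetical input: 1000 posterior draws for 50 units.
rng = np.random.default_rng(2)
log_lik = -1.0 + 0.1 * rng.normal(size=(1000, 50))
value, lppd_total, penalty = waic(log_lik)
print(value, lppd_total, penalty)
```

Which array is passed in determines which criterion is computed; the function itself is the same in both cases.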
Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization.
Maps of infectious disease, charting spatial variations in the force of infection, degree of endemicity and the burden on human health, provide an essential evidence base to support planning towards global health targets. Contemporary disease mapping efforts have embraced statistical modelling approaches to properly acknowledge uncertainties in both the available measurements and their spatial interpolation. The most common such approach is Gaussian process regression, a mathematical framework composed of two components: a mean function harnessing the predictive power of multiple independent variables, and a covariance function yielding spatio-temporal shrinkage against residual variation from the mean. Though many techniques have been developed to improve the flexibility and fitting of the covariance function, models for the mean function have typically been restricted to simple linear terms. For infectious diseases, known to be driven by complex interactions between environmental and socio-economic factors, improved modelling of the mean function can greatly boost predictive power. Here, we present an ensemble approach based on stacked generalization that allows for multiple nonlinear algorithmic mean functions to be jointly embedded within the Gaussian process framework. We apply this method to mapping Plasmodium falciparum prevalence data in sub-Saharan Africa and show that the generalized ensemble approach markedly outperforms any individual method.