252 research outputs found
Projection predictive model selection for Gaussian processes
We propose a new method for simplification of Gaussian process (GP) models by
projecting the information contained in the full encompassing model and
selecting a reduced number of variables based on their predictive relevance.
Our results on synthetic and real-world datasets show that the proposed method
improves the assessment of variable relevance compared to automatic
relevance determination (ARD) via the length-scale parameters. We expect the
method to be useful for improving the explainability of models, reducing
future measurement costs, and reducing the computation time for making new
predictions.
Comment: A few minor changes in text
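As a minimal illustration of the ARD baseline this abstract compares against (not the proposed projection predictive method itself), one can maximise the GP marginal likelihood over per-dimension length-scales and read relevance off the fitted scales: an irrelevant input tends to receive a long length-scale. The toy data, noise level, and grid below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends only on x0; x1 is irrelevant (invented for illustration).
n = 60
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(n)

def log_marginal_likelihood(X, y, ls, noise=0.05):
    """GP log marginal likelihood under an ARD squared-exponential kernel."""
    d2 = sum(((X[:, None, k] - X[None, :, k]) / ls[k]) ** 2
             for k in range(X.shape[1]))
    K = np.exp(-0.5 * d2) + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() \
        - 0.5 * len(y) * np.log(2.0 * np.pi)

# Crude grid search over per-dimension length-scales in place of gradient-based
# optimisation; ARD reads relevance off the fitted scales (short = relevant).
grid = [0.2, 0.5, 1.0, 2.0, 5.0]
ls_hat = max(
    ([l0, l1] for l0 in grid for l1 in grid),
    key=lambda ls: log_marginal_likelihood(X, y, ls),
)
print("fitted ARD length-scales:", ls_hat)
```

The relevant input x0 ends up with a short length-scale and the irrelevant x1 with the longest one in the grid, which is exactly the relevance signal ARD provides and the paper argues can be improved upon.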
Approximate Inference for Nonstationary Heteroscedastic Gaussian Process Regression
This paper presents a novel approach for approximate integration over the
uncertainty of noise and signal variances in Gaussian process (GP) regression.
Our efficient and straightforward approach can also be applied to integration
over input dependent noise variance (heteroscedasticity) and input dependent
signal variance (nonstationarity) by setting independent GP priors for the
noise and signal variances. We use expectation propagation (EP) for inference
and compare results to Markov chain Monte Carlo in two simulated data sets and
three empirical examples. The results show that EP produces comparable results
with less computational burden.
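The model class described above can be sketched generatively: independent GP priors on the latent function and on the log noise scale, so the noise variance varies with the input. The EP inference itself is beyond a short sketch, and the length-scales and jitter below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gp(x, length_scale):
    """Draw one sample path from a zero-mean GP with a squared-exponential kernel."""
    d2 = (x[:, None] - x[None, :]) ** 2 / length_scale**2
    K = np.exp(-0.5 * d2) + 1e-6 * np.eye(len(x))  # jitter for a stable Cholesky
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

x = np.linspace(0.0, 1.0, 100)
f = sample_gp(x, 0.2)   # latent signal
g = sample_gp(x, 0.5)   # latent log noise std: input-dependent (heteroscedastic) noise
y = f + np.exp(g) * rng.standard_normal(len(x))
```

A nonstationary signal variance would be handled the same way, with a third independent GP on the log signal scale.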
Efficient estimation and correction of selection-induced bias with order statistics
Model selection aims to identify a sufficiently well performing model that is
possibly simpler than the most complex model among a pool of candidates.
However, the decision-making process itself can inadvertently introduce
non-negligible bias when the cross-validation estimates of predictive
performance are marred by excessive noise. In finite data regimes,
cross-validated estimates can encourage the statistician to select one model
over another when it is not actually better for future data. While this bias
remains negligible in the case of few models, when the pool of candidates
grows, and model selection decisions are compounded (as in forward search), the
expected magnitude of selection-induced bias is likely to grow too. This paper
introduces an efficient approach to estimate and correct selection-induced bias
based on order statistics. Numerical experiments demonstrate the reliability of
our approach in estimating both selection-induced bias and over-fitting along
compounded model selection decisions, with specific application to forward
search. This work represents a light-weight alternative to more computationally
expensive approaches to correcting selection-induced bias, such as nested
cross-validation and the bootstrap. Our approach rests on several theoretical
assumptions, and we provide a diagnostic to help understand when these may not
be valid and when to fall back on safer, albeit more computationally expensive
approaches. The accompanying code facilitates its practical implementation and
fosters further exploration in this area.
Comment: 20 (+6) pages; 8 (+4) figures
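The phenomenon the abstract describes can be reproduced with a toy order-statistics calculation: when K equally good models have noisy cross-validation estimates, the winner's estimate is biased upward by roughly sigma times the expected maximum of K standard normals. This illustrates the bias, not the paper's correction procedure; K and sigma are invented.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)

K, sigma = 50, 1.0   # number of candidate models, sd of cv-estimate noise (invented)

# All models are equally good for future data (true utility 0); cv estimates are noisy.
cv_estimates = sigma * rng.standard_normal((20000, K))
winner_estimate = cv_estimates.max(axis=1)
bias_mc = winner_estimate.mean()     # Monte Carlo selection-induced bias

# Order-statistic route: E[max of K iid N(0,1)] from the density of the maximum,
# K * phi(x) * Phi(x)**(K-1), integrated numerically.
x = np.linspace(-10.0, 10.0, 200001)
Phi = 0.5 * (1.0 + np.array([erf(t / np.sqrt(2.0)) for t in x]))
phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
density = K * phi * Phi ** (K - 1)
bias_os = sigma * np.sum(x * density) * (x[1] - x[0])

print(bias_mc, bias_os)   # both near E[max] for K = 50, about 2.25 * sigma
```

The bias grows with K, which is why compounded decisions such as forward search accumulate it.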
Bayesian leave-one-out cross-validation for large data
Model inference, such as model comparison, model checking, and model
selection, is an important part of model development. Leave-one-out
cross-validation (LOO) is a general approach for assessing the generalizability
of a model, but unfortunately, LOO does not scale well to large datasets. We
propose a combination of using approximate inference techniques and
probability-proportional-to-size sampling (PPS) for fast LOO model evaluation
for large datasets. We provide both theoretical and empirical results showing
good properties for large data.
Comment: Accepted to ICML 2019. This version is the submitted paper.
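The subsampling idea can be sketched with a Hansen-Hurwitz-style PPS estimator: draw a subsample with probability proportional to a cheap approximation of the pointwise LOO values, then estimate the full-data total from the exact values at only those points. This is a with-replacement simplification under invented stand-in numbers, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(3)

n, m = 100_000, 1_000   # dataset size and subsample size (invented)

# Stand-ins: exact pointwise LOO log densities (expensive for all n points in
# practice) and a cheap approximation of them from approximate inference.
loo_exact = -1.0 + 0.1 * rng.standard_normal(n)
loo_approx = loo_exact + 0.02 * rng.standard_normal(n)

# PPS: sample points with probability proportional to the approximate magnitudes,
# then form the Hansen-Hurwitz estimator of the total elpd_loo.
p = np.abs(loo_approx) / np.abs(loo_approx).sum()
idx = rng.choice(n, size=m, replace=True, p=p)
elpd_hat = np.mean(loo_exact[idx] / p[idx])   # needs only m exact evaluations

print(elpd_hat, loo_exact.sum())   # the estimator tracks the full-data total
```

Because the sampling probabilities track the target values, the estimator's variance stays small even though only m of the n points are evaluated exactly.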
- …