Optimal predictive model selection
Often the goal of model selection is to choose a model for future prediction,
and it is natural to measure the accuracy of a future prediction by squared
error loss. Under the Bayesian approach, it is commonly perceived that the
optimal predictive model is the model with highest posterior probability, but
this is not necessarily the case. In this paper we show that, for selection
among normal linear models, the optimal predictive model is often the median
probability model, which is defined as the model consisting of those variables
which have overall posterior probability greater than or equal to 1/2 of being
in a model. The median probability model often differs from the highest
probability model.
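As a toy illustration of the definition above, the median probability model can be computed directly from posterior model probabilities (a minimal sketch; the model space and numbers are invented, not from the paper):

```python
# Hypothetical posterior probabilities over models of 3 candidate
# variables; each model is the tuple of variable indices it includes.
models = {
    (): 0.05,
    (0,): 0.32,
    (1,): 0.05,
    (0, 1): 0.25,
    (0, 2): 0.05,
    (0, 1, 2): 0.28,
}
n_vars = 3

# Posterior inclusion probability of each variable: total probability
# of the models that contain it.
inclusion = [sum(p for m, p in models.items() if j in m) for j in range(n_vars)]

# Median probability model: variables whose inclusion probability >= 1/2.
median_model = tuple(j for j in range(n_vars) if inclusion[j] >= 0.5)

# Highest posterior probability model, for comparison.
hpm = max(models, key=models.get)

print(median_model)  # (0, 1)
print(hpm)           # (0,)
```

Here the two selections disagree: variable 1 enters the median probability model because its total inclusion probability (0.58) exceeds 1/2, even though the single most probable model omits it.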
Model selection in cosmology
Model selection aims to determine which theoretical models are most plausible given some data, without necessarily considering preferred values of model parameters. A common model selection question is to ask when new data require introduction of an additional parameter, describing a newly discovered physical effect. We review model selection statistics, then focus on the Bayesian evidence, which implements Bayesian analysis at the level of models rather than parameters. We describe our CosmoNest code, the first computationally efficient implementation of Bayesian model selection in a cosmological context. We apply it to recent WMAP satellite data, examining the need for a perturbation spectral index differing from the scale-invariant (Harrison–Zel'dovich) case.
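The central quantity here, the Bayesian evidence, is simply the likelihood marginalized over the prior. A minimal one-dimensional sketch (not the CosmoNest code; the data value and prior width are invented) shows how the evidence can penalize an unnecessary extra parameter:

```python
import math

def gauss(x, mu, sigma):
    # Normal density N(x; mu, sigma^2).
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# One datum y ~ N(theta, 1). Model M0 fixes theta = 0;
# model M1 adds a free parameter theta ~ N(0, tau^2).
y, tau = 1.0, 3.0

# Evidence of M0: the likelihood at the fixed parameter value.
Z0 = gauss(y, 0.0, 1.0)

# Evidence of M1: the likelihood marginalized over the prior, by quadrature.
h = 0.01
Z1 = sum(gauss(y, t, 1.0) * gauss(t, 0.0, tau) * h
         for t in (-20 + h * i for i in range(4001)))

# Analytic check: marginally, y ~ N(0, 1 + tau^2) under M1.
Z1_exact = gauss(y, 0.0, math.sqrt(1.0 + tau**2))

# Bayes factor B10 < 1 here: the datum is consistent with theta = 0,
# so the simpler model is favoured (an automatic Occam penalty).
B10 = Z1 / Z0
```

Evidence ratios (Bayes factors) thus answer the "do the data require the extra parameter?" question at the level of models, not parameter values.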
The Model Selection Curse
A "statistician" takes an action on behalf of an agent, based on the agent's
self-reported personal data and a sample involving other people. The action
that he takes is an estimated function of the agent's report. The estimation
procedure involves model selection. We ask the following question: Is
truth-telling optimal for the agent given the statistician's procedure? We
analyze this question in the context of a simple example that highlights the
role of model selection. We suggest that our simple exercise may have
implications for the broader issue of human interaction with "machine learning"
algorithms.
Model selection and local geometry
We consider problems in model selection caused by the geometry of models
close to their points of intersection. In some cases---including common classes
of causal or graphical models, as well as time series models---distinct models
may nevertheless have identical tangent spaces. This has two immediate
consequences: first, in order to obtain constant power to reject one model in
favour of another we need local alternative hypotheses that decrease to the
null at a slower rate than the usual parametric $n^{-1/2}$ rate (typically we will
require $n^{-1/4}$ or slower); in other words, to distinguish between the
models we need large effect sizes or very large sample sizes. Second, we show
that under even weaker conditions on their tangent cones, models in these
classes cannot be made simultaneously convex by a reparameterization.
This shows that Bayesian network models, amongst others, cannot be learned
directly with a convex method similar to the graphical lasso. However, we are
able to use our results to suggest methods for model selection that learn the
tangent space directly, rather than the model itself. In particular, we give a
generic algorithm for learning Bayesian network models.
Bootstrap for neural model selection
Bootstrap techniques (also called resampling computation techniques) have
introduced new advances in modeling and model evaluation. Using resampling
methods to construct a series of new samples based on the original data set
allows one to estimate the stability of the parameters. Properties such
as convergence and asymptotic normality can be checked for any particular
observed data set. In most cases, the statistics computed on the generated data
sets give a good idea of the confidence regions of the estimates. In this
paper, we discuss the contribution of such methods to model selection, in
the case of feedforward neural networks. The method is described and compared
with the leave-one-out resampling method. The effectiveness of the bootstrap
method, versus the leave-one-out method, is checked through a number of
examples.
Comment: Following the ESANN 200 conference
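The resampling idea above can be sketched in a few lines; here a one-parameter least-squares slope stands in for the neural network parameters, and the data are simulated (a minimal sketch under invented settings, not the authors' code):

```python
import random

random.seed(0)
# Simulated data from y = 2x + noise (illustrative only).
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in range(1, 21)]

def fit_slope(sample):
    # Least-squares slope through the origin: sum(x*y) / sum(x*x).
    return sum(x * y for x, y in sample) / sum(x * x for x, _ in sample)

B = 500
slopes = []
for _ in range(B):
    # Draw a bootstrap sample: resample the data set with replacement.
    boot = [random.choice(data) for _ in data]
    slopes.append(fit_slope(boot))

# Percentile confidence interval: the spread of the bootstrap estimates
# approximates the sampling variability of the fitted parameter.
slopes.sort()
lo, hi = slopes[int(0.025 * B)], slopes[int(0.975 * B)]
```

For model selection, the same machinery would be applied to each candidate model in turn: parameters whose bootstrap estimates vary wildly across resamples signal an unstable, over-flexible model.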
Model Selection in Threshold Models
This paper considers information criteria as model evaluation tools for nonlinear threshold models. Results concerning the consistency of information criteria in selecting the lag order of linear autoregressive models are extended to nonlinear autoregressive threshold models. Extensive Monte Carlo evidence on the small-sample performance of a number of criteria is presented.
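For the linear baseline case, lag-order selection by information criterion is straightforward; the following sketch simulates an AR(2) series and scores candidate lag orders with AIC- and BIC-type penalties (coefficients and sample size are invented, not the paper's Monte Carlo design):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: y_t = 0.5*y_{t-1} - 0.3*y_{t-2} + e_t.
n = 400
y = np.zeros(n + 2)
for t in range(2, n + 2):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()
y = y[2:]

def criterion(y, p, penalty):
    # Fit AR(p) by least squares; return m*log(RSS/m) + penalty*p.
    X = np.column_stack([y[p - j - 1 : len(y) - j - 1] for j in range(p)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    m = len(target)
    rss = float(np.sum((target - X @ beta) ** 2))
    return m * np.log(rss / m) + penalty * p

lags = range(1, 6)
aic = [criterion(y, p, 2.0) for p in lags]             # AIC penalty: 2 per lag
bic = [criterion(y, p, np.log(len(y))) for p in lags]  # BIC penalty: log(n) per lag

best_aic = 1 + int(np.argmin(aic))
best_bic = 1 + int(np.argmin(bic))
```

BIC's heavier log(n) penalty is what underlies the consistency results the paper extends: as the sample grows, it selects the true lag order with probability tending to one, while AIC tends to over-select.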
Model selection for amplitude analysis
Model complexity in amplitude analyses is often a priori under-constrained
since the underlying theory permits a large number of possible amplitudes to
contribute to most physical processes. The use of an overly complex model
results in reduced predictive power and worse resolution on unknown parameters
of interest. Therefore, it is common to reduce the complexity by removing from
consideration some subset of the allowed amplitudes. This paper studies a
method for limiting model complexity from the data sample itself through
regularization during regression in the context of a multivariate (Dalitz-plot)
analysis. The regularization technique applied greatly improves the
performance. An outline of how to obtain the significance of a resonance in a
multivariate amplitude analysis is also provided.
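The effect of such regularization can be seen in a stripped-down analogue, where an L1 (LASSO) penalty on a linear model stands in for penalizing amplitude coefficients (a sketch under invented settings, illustrating the general technique rather than the paper's specific method):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 candidate components, only 3 of which truly contribute
# (stand-ins for allowed amplitudes; all numbers illustrative).
n, k = 200, 10
X = rng.standard_normal((n, k))
true_beta = np.zeros(k)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

def lasso_ista(X, y, lam, steps=2000):
    # Iterative soft-thresholding for min_b 0.5*||y - X b||^2 + lam*||b||_1.
    L = np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        z = b - X.T @ (X @ b - y) / L                          # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b

beta = lasso_ista(X, y, lam=20.0)
kept = np.flatnonzero(np.abs(beta) > 1e-3)   # components that survive
```

With these settings the penalty drives the spurious coefficients exactly to zero, leaving only the components the data support; this is the data-driven complexity reduction the abstract describes.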