We discuss the use of the Bayesian evidence ratio, or Bayes factor, for model
selection in astronomy. We treat the evidence ratio as a statistic and
investigate its distribution over an ensemble of experiments, considering both
simple analytical examples and some more realistic cases, which require
numerical simulation. We find that the evidence ratio is a noisy statistic, and
thus it may not be sensible to decide to accept or reject a model based solely
on whether the evidence ratio reaches some threshold value. The odds suggested
by the evidence ratio bear no obvious relationship to the power or Type I error
rate of a test based on the evidence ratio. The general performance of such
tests is strongly affected by the signal-to-noise ratio in the data, the
assumed priors, and the threshold in the evidence ratio that is taken as
`decisive'. The comprehensiveness of the model suite under consideration is
also very important. The usefulness of the evidence ratio approach in a given
problem can be assessed in advance of the experiment, using simple models and
numerical approximations. In many cases, this approach can be as informative as
a much more costly full-scale Bayesian analysis of a complex problem.

Comment: 11 pages; MNRAS in pres
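The idea of treating the evidence ratio as a statistic can be illustrated with a minimal sketch. It is a toy case chosen for illustration, not the analysis of the paper. It assumes Gaussian data with known noise `sigma`, a null model M0 with mean fixed at zero, and an alternative M1 with a Gaussian prior of width `tau` on the mean, for which the evidence ratio is analytic. The function names, the parameter values, and the threshold `ln B = 5` are all assumptions made here.

```python
import numpy as np


def log_bayes_factor(xbar, s2, tau2):
    """ln B10 for M1 (mu ~ N(0, tau2)) versus M0 (mu = 0).

    xbar is the sample mean and s2 its sampling variance (sigma^2 / n).
    Under M0 xbar ~ N(0, s2); under M1 the marginal is N(0, s2 + tau2).
    """
    return (0.5 * np.log(s2 / (s2 + tau2))
            + 0.5 * xbar**2 * (1.0 / s2 - 1.0 / (s2 + tau2)))


def simulate_ensemble(mu_true, n_points, sigma, tau, n_experiments, rng):
    """Draw ln B10 for an ensemble of repeated experiments (hypothetical helper)."""
    s2 = sigma**2 / n_points
    # The sample mean is a sufficient statistic, so draw it directly.
    xbar = rng.normal(mu_true, np.sqrt(s2), size=n_experiments)
    return log_bayes_factor(xbar, s2, tau**2)


rng = np.random.default_rng(0)
threshold = 5.0  # assumed value of ln B10 treated as 'decisive'

# Ensemble in which M0 is true, and one with an assumed true mean of 0.5.
ln_b_null = simulate_ensemble(0.0, 20, 1.0, 1.0, 100_000, rng)
ln_b_alt = simulate_ensemble(0.5, 20, 1.0, 1.0, 100_000, rng)

type_i_rate = np.mean(ln_b_null > threshold)  # M0 true, yet M1 'decisively' preferred
power = np.mean(ln_b_alt > threshold)         # M1-like data correctly flagged

print(f"scatter in ln B10 (alternative ensemble): {ln_b_alt.std():.2f}")
print(f"Type I error rate: {type_i_rate:.5f}")
print(f"power:             {power:.3f}")
```

Changing `tau`, `n_points`, or `threshold` and rerunning gives a quick check of how the prior width, the signal-to-noise ratio, and the chosen threshold affect the error rate and power of a test based on the evidence ratio in this toy setting.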