When choosing between competing symbolic models for a dataset, a human will
naturally prefer the "simpler" expression or the one which more closely
resembles equations previously seen in a similar context. This suggests a
non-uniform prior on functions, which is, however, rarely considered within a
symbolic regression (SR) framework. In this paper we develop methods to
incorporate detailed prior information on both functions and their parameters
into SR. Our prior on the structure of a function is based on an n-gram
language model, which is sensitive to the arrangement of operators relative to
one another in addition to the frequency of occurrence of each operator. We
also develop a formalism based on the Fractional Bayes Factor to treat
numerical parameter priors in such a way that models may be fairly compared
through the Bayesian evidence, and explicitly compare Bayesian, Minimum
Description Length and heuristic methods for model selection. We demonstrate
the performance of our priors relative to literature standards on benchmarks
and a real-world dataset from the field of cosmology.

Comment: 8+2 pages, 2 figures. Submitted to The Genetic and Evolutionary
Computation Conference (GECCO) 2023 Workshop on Symbolic Regression