The Bayesian Context Trees (BCT) framework is a recently introduced, general
collection of statistical and algorithmic tools for modelling, analysis and
inference with discrete-valued time series. The framework builds in part on
well-known information-theoretic ideas and techniques, including Rissanen's
tree sources and the context-tree weighting algorithm of Willems et al.
This paper presents a collection of theoretical results that provide
mathematical justifications and further insight into the BCT modelling
framework and the associated practical tools. It is shown that the BCT prior
predictive likelihood (the probability of a time series of observations
averaged over all models and parameters) is both pointwise and minimax optimal,
in agreement with the MDL principle and the BIC criterion. The posterior
distribution is shown to be asymptotically consistent with probability one
(over both models and parameters), and asymptotically Gaussian (over the
parameters). Finally, the posterior predictive distribution is also shown to be
asymptotically consistent with probability one.