Phylogenetic mixture models, in which the sites in sequences undergo
different substitution processes along the same or different trees, allow the
description of heterogeneous evolutionary processes. As data sets consisting of
longer sequences become available, it is important to understand such models,
for both theoretical insights and use in statistical analyses. Some recent
articles have highlighted disturbing "mimicking" behavior in which a
distribution from a mixture model is identical to one arising on a different
tree or trees. Other works have indicated that such problems are unlikely to occur
in practice, as they require very special parameter choices.
After surveying some of these works on mixture models, we give several new
results. In general, if the number of components in a generating mixture is not
too large and we disallow zero or infinite branch lengths, then the mixture
cannot mimic the behavior of a non-mixture on a different tree. On the other
hand, if the mixture model is locally over-parameterized, it can mimic
distributions of another tree model. Though theoretical
questions remain, these sorts of results can serve as a guide to when the use
of mixture models in either ML or Bayesian frameworks is likely to lead to
statistically consistent inference, and when mimicking due to heterogeneity
should be considered a realistic possibility.

Comment: 21 pages, 1 figure; revised to expand commentary; Mittag-Leffler
Institute, Spring 201