Identifiability of Large Phylogenetic Mixture Models
Phylogenetic mixture models are statistical models of character evolution
allowing for heterogeneity. Each of the classes in some unknown partition of
the characters may evolve by different processes, or even along different
trees. The fundamental question of whether parameters of such a model are
identifiable is difficult to address, due to the complexity of the
parameterization. We analyze mixture models on large trees, with many mixture
components, showing that both numerical and tree parameters are indeed
identifiable in these models when all trees are the same. We also explore the
extent to which our algebraic techniques can be employed to extend the result
to mixtures on different trees. Comment: 15 pages
The identifiability of tree topology for phylogenetic models, including covarion and mixture models
For a model of molecular evolution to be useful for phylogenetic inference,
the topology of evolutionary trees must be identifiable. That is, from a joint
distribution the model predicts, it must be possible to recover the tree
parameter. We establish tree identifiability for a number of phylogenetic
models, including a covarion model and a variety of mixture models with a
limited number of classes. The proof is based on the introduction of a more
general model, allowing more states at internal nodes of the tree than at
leaves, and the study of the algebraic variety formed by the joint
distributions to which it gives rise. Tree identifiability is first established
for this general model through the use of certain phylogenetic invariants. Comment: 20 pages, 1 figure
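As a hedged illustration of how algebraic conditions on a joint distribution can distinguish tree topologies, the sketch below checks a well-known rank condition for the general Markov model. It is not necessarily the set of invariants used in this work. The leaf distribution of a four-leaf tree is arranged as a matrix ("flattened") along a bipartition of the leaves. Along a split of the tree, the matrix has rank at most the number of states. Along a bipartition that is not a split, it generically has full rank. The tree 12|34, the use of 2 states, and every parameter value are illustrative assumptions. Exact rational arithmetic avoids floating-point ambiguity in the rank.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 2-state general Markov model on the quartet tree 12|34 (leaves 1-4 are indexed 0-3).
# Internal node a is the root and parent of leaves 0, 1; edge a -> b; b is the parent of leaves 2, 3.
# All transition matrices are row-stochastic and invertible; values are illustrative only.
pi = [F(2, 5), F(3, 5)]
M_ab = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
M_leaf = [
    [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],      # a -> leaf 0
    [[F(2, 3), F(1, 3)], [F(1, 5), F(4, 5)]],      # a -> leaf 1
    [[F(4, 5), F(1, 5)], [F(1, 3), F(2, 3)]],      # b -> leaf 2
    [[F(7, 10), F(3, 10)], [F(1, 10), F(9, 10)]],  # b -> leaf 3
]


def joint(x):
    """Probability of leaf pattern x = (x0, x1, x2, x3), summing over hidden states a, b."""
    return sum(
        pi[a] * M_ab[a][b]
        * M_leaf[0][a][x[0]] * M_leaf[1][a][x[1]]
        * M_leaf[2][b][x[2]] * M_leaf[3][b][x[3]]
        for a, b in product(range(2), repeat=2)
    )


P = {x: joint(x) for x in product(range(2), repeat=4)}


def flatten(row_leaves, col_leaves):
    """Arrange P as a 4x4 matrix: rows index states at row_leaves, columns states at col_leaves."""
    matrix = []
    for r in product(range(2), repeat=2):
        row = []
        for c in product(range(2), repeat=2):
            x = [0] * 4
            for leaf, s in zip(row_leaves + col_leaves, r + c):
                x[leaf] = s
            row.append(P[tuple(x)])
        matrix.append(row)
    return matrix


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[r])]
        r += 1
    return r


print("rank along 12|34 (a split of the tree):", rank(flatten((0, 1), (2, 3))))
print("rank along 13|24 (not a split):", rank(flatten((0, 2), (1, 3))))
```

With these parameters the first flattening has rank 2, because it factors through the 2 states on the internal edge. The second has rank 4, because it is a sum of four rank-one terms built from invertible matrices. Vanishing of the 3x3 minors of the first flattening is therefore a condition satisfied on tree 12|34 but not, generically, on tree 13|24.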
Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites
The general Markov plus invariable sites (GM+I) model of biological sequence
evolution is a two-class model in which an unknown proportion of sites are not
allowed to change, while the remainder undergo substitutions according to a
Markov process on a tree. For statistical use it is important to know if the
model is identifiable; can both the tree topology and the numerical parameters
be determined from a joint distribution describing sequences only at the leaves
of the tree? We establish that for generic parameters both the tree and all
numerical parameter values can be recovered, up to clearly understood issues of
"label swapping." The method of analysis is algebraic, using phylogenetic
invariants to study the variety defined by the model. Simple rational formulas,
expressed in terms of determinantal ratios, are found for recovering numerical
parameters describing the invariable sites.
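A minimal sketch of the two-class structure of GM+I, under illustrative assumptions. It uses a 3-leaf star tree, 2 states, and made-up parameter values, including the proportion `delta` of invariable sites and their state distribution `inv_state` (both hypothetical names). Variable sites follow a general Markov model. Invariable sites contribute probability only to constant patterns, in which every leaf shows the same state. It does not implement the paper's determinantal recovery formulas.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 2-state GM+I parameters on a 3-leaf star tree (illustrative values only).
root = [F(1, 2), F(1, 2)]          # root state distribution for variable sites
leaf_matrices = [                  # row-stochastic root -> leaf transition matrices
    [[F(4, 5), F(1, 5)], [F(1, 10), F(9, 10)]],
    [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
    [[F(2, 3), F(1, 3)], [F(1, 6), F(5, 6)]],
]
delta = F(1, 5)                    # assumed proportion of invariable sites
inv_state = [F(3, 10), F(7, 10)]   # assumed state distribution at invariable sites


def gm(pattern):
    """General Markov probability of a leaf pattern, summing over the hidden root state."""
    total = F(0)
    for r in range(2):
        term = root[r]
        for M, x in zip(leaf_matrices, pattern):
            term *= M[r][x]
        total += term
    return total


def gm_plus_i(pattern):
    """Mixture: weight 1 - delta on the GM class, delta on the invariable class."""
    constant = len(set(pattern)) == 1
    inv = inv_state[pattern[0]] if constant else F(0)
    return (1 - delta) * gm(pattern) + delta * inv


dist = {x: gm_plus_i(x) for x in product(range(2), repeat=3)}
for pattern, p in dist.items():
    print(pattern, p)
```

Exact fractions make it easy to see that non-constant patterns are simply the GM probabilities scaled by 1 - delta, while constant patterns receive the extra invariable-site mass.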
When Do Phylogenetic Mixture Models Mimic Other Phylogenetic Models?
Phylogenetic mixture models, in which the sites in sequences undergo
different substitution processes along the same or different trees, allow the
description of heterogeneous evolutionary processes. As data sets consisting of
longer sequences become available, it is important to understand such models,
for both theoretical insights and use in statistical analyses. Some recent
articles have highlighted disturbing "mimicking" behavior in which a
distribution from a mixture model is identical to one arising on a different
tree or trees. Other works have indicated such problems are unlikely to occur
in practice, as they require very special parameter choices.
After surveying some of these works on mixture models, we give several new
results. In general, if the number of components in a generating mixture is not
too large and we disallow zero or infinite branch lengths, then it cannot mimic
the behavior of a non-mixture on a different tree. On the other hand, if the
mixture model is locally over-parameterized, it is possible for a phylogenetic
mixture model to mimic distributions of another tree model. Though theoretical
questions remain, these sorts of results can serve as a guide to when the use
of mixture models in either ML or Bayesian frameworks is likely to lead to
statistically consistent inference, and when mimicking due to heterogeneity
should be considered a realistic possibility. Comment: 21 pages, 1 figure; revised to expand commentary; Mittag-Leffler
Institute, Spring 201
There are no caterpillars in a wicked forest
Species trees represent the historical divergences of populations or species,
while gene trees trace the ancestry of individual gene copies sampled within
those populations. In cases involving rapid speciation, gene trees with
topologies that differ from that of the species tree can be most probable under
the standard multispecies coalescent model, making species tree inference more
difficult. Such anomalous gene trees are not well understood except for some
small cases. In this work, we establish one constraint that applies to trees of
any size: gene trees with "caterpillar" topologies cannot be anomalous. The
proof of this involves a new combinatorial object, called a population history,
which keeps track of the number of coalescent events in each ancestral
population. Comment: 16 pages, 4 figures
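A small sketch for recognizing caterpillar topologies. It assumes rooted binary trees written as nested 2-tuples with string leaf labels, a hypothetical representation chosen for brevity. It uses the standard definition that a caterpillar is a rooted binary tree in which every internal node has at least one leaf child. It does not implement the paper's population histories.

```python
def is_leaf(node):
    """Leaves are plain labels; internal nodes are (left, right) tuples."""
    return not isinstance(node, tuple)


def is_caterpillar(tree):
    """True if every internal node of a rooted binary tree has at least one leaf child."""
    if is_leaf(tree):
        return True
    left, right = tree
    if not (is_leaf(left) or is_leaf(right)):
        return False
    return is_caterpillar(left) and is_caterpillar(right)


print(is_caterpillar(((("A", "B"), "C"), "D")))  # ladder-like shape
print(is_caterpillar((("A", "B"), ("C", "D"))))  # balanced shape: root has no leaf child
```

The recursion stops at the first internal node whose two children are both subtrees, so it runs in time linear in the number of nodes.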
Hypothesis testing near singularities and boundaries
The likelihood ratio statistic, with its asymptotic $\chi^2$ distribution at
regular model points, is often used for hypothesis testing. At model
singularities and boundaries, however, the asymptotic distribution may not be
$\chi^2$, as highlighted by recent work of Drton. Indeed, poor behavior of a
$\chi^2$ for testing near singularities and boundaries is apparent in
simulations, and can lead to conservative or anti-conservative tests. Here we
develop a new distribution designed for use in hypothesis testing near
singularities and boundaries, which asymptotically agrees with that of the
likelihood ratio statistic. For two example trinomial models, arising in the
context of inference of evolutionary trees, we show the new distributions
outperform a $\chi^2$. Comment: 32 pages, 12 figures
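The boundary problem can be seen in a small simulation. This is a hedged sketch with an invented example. It is neither one of the paper's trinomial models nor its new distribution. The model is a trinomial p(q) = (1 - 2q, q, q) restricted to 0 <= q <= 1/3, and the test is of the null q = 1/3, which lies on the boundary. Under the null, the constrained maximum likelihood estimate equals 1/3 roughly half the time, so the likelihood ratio statistic is exactly zero about half the time. A nominal 5% test using the chi-squared distribution with 1 degree of freedom is then conservative. The sample size, replicate count, and seed are arbitrary choices.

```python
import math
import random

# Hypothetical one-parameter trinomial model p(q) = (1 - 2q, q, q), restricted to 0 <= q <= 1/3.
# The null hypothesis q = 1/3 lies on the boundary of this parameter space.
Q_NULL = 1 / 3


def log_lik(counts, q):
    """Trinomial log-likelihood (up to a constant) of counts under p(q)."""
    n1, m = counts[0], counts[1] + counts[2]
    ll = n1 * math.log(1 - 2 * q)
    if m > 0:
        ll += m * math.log(q)
    return ll


def lr_statistic(counts):
    """2 * (maximized log-likelihood over 0 <= q <= 1/3 minus log-likelihood at q = 1/3)."""
    n = sum(counts)
    # The unconstrained MLE is (n2 + n3) / (2n); clip it to the allowed range.
    q_hat = min((counts[1] + counts[2]) / (2 * n), Q_NULL)
    return max(0.0, 2 * (log_lik(counts, q_hat) - log_lik(counts, Q_NULL)))


def chi2_1_sf(x):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def simulate(n=300, n_sims=4000, alpha=0.05, seed=1):
    """Simulate under the null; return (fraction of zero statistics, chi-squared rejection rate)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        draws = rng.choices(range(3), k=n)  # each category has probability 1/3 under the null
        stats.append(lr_statistic([draws.count(i) for i in range(3)]))
    zero_frac = sum(s == 0.0 for s in stats) / n_sims
    reject_rate = sum(chi2_1_sf(s) < alpha for s in stats) / n_sims
    return zero_frac, reject_rate


zero_frac, reject_rate = simulate()
print(f"statistic exactly 0: {zero_frac:.3f}; rejection rate at nominal 5%: {reject_rate:.3f}")
```

Asymptotically the statistic in this example follows an equal mixture of a point mass at zero and a chi-squared distribution with 1 degree of freedom, so the rejection rate should come out near 2.5% rather than the nominal 5%.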
