Dimensional reduction for the general Markov model on phylogenetic trees
We present a method of dimensional reduction for the general Markov model of
sequence evolution on a phylogenetic tree. We show that taking certain linear
combinations of the associated random variables (site pattern counts) reduces
the dimensionality of the model from exponential in the number of extant taxa,
to quadratic in the number of taxa, while retaining the ability to
statistically identify phylogenetic divergence events. A key feature is the
identification of an invariant subspace which depends only bilinearly on the
model parameters, in contrast to the usual multi-linear dependence in the full
space. We discuss potential applications including the computation of split
(edge) weights on phylogenetic trees from observed sequence data.
Comment: 17 pages, 3 figures. v4: Substantial revision. Additional motivations
for transformation rules and more details in proofs of the main results are
provided.
Lie geometry of 2x2 Markov matrices
In recent work discussing model choice for continuous-time Markov chains, we
have argued that it is important that the Markov matrices that define the model
are closed under matrix multiplication (Sumner 2012a, 2012b). The primary
requirement is then that the associated set of rate matrices form a Lie
algebra. For the generic case, this connection to Lie theory seems to have
first been made by Johnson (1985), with applications for specific models given
in Bashford (2004) and House (2012). Here we take a different perspective:
given a model that forms a Lie algebra, we apply existing Lie theory to gain
additional insight into the geometry of the associated Markov matrices. In this
short note, we present the simplest case possible of 2x2 Markov matrices. The
main result is a novel decomposition of 2x2 Markov matrices that parameterises
the general Markov model as a perturbation away from the binary-symmetric
model. This alternative parameterisation provides a useful tool for visualising
the binary-symmetric model as a submodel of the general Markov model.
Comment: 5 pages, 2 figures
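The multiplicative closure discussed in this abstract can be checked numerically for the 2x2 case. The following is a minimal sketch, not the decomposition from the paper: it assumes the convention that columns sum to one, and the helper names (`markov`, `is_markov`, `is_binary_symmetric`) are hypothetical.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def markov(a, b):
    """General 2x2 Markov matrix; columns sum to one (assumed convention)."""
    return [[1 - a, b], [a, 1 - b]]

def is_markov(M, tol=1e-12):
    """Non-negative entries and unit column sums."""
    nonneg = all(x >= -tol for row in M for x in row)
    unit_cols = all(abs(M[0][j] + M[1][j] - 1) < tol for j in range(2))
    return nonneg and unit_cols

def is_binary_symmetric(M, tol=1e-12):
    """Markov matrix with equal off-diagonal entries."""
    return is_markov(M, tol) and abs(M[0][1] - M[1][0]) < tol

# Product of two general Markov matrices: still Markov, generally not symmetric.
general = matmul(markov(0.1, 0.3), markov(0.25, 0.05))
# Product of two binary-symmetric matrices: stays in the submodel.
symmetric = matmul(markov(0.1, 0.1), markov(0.2, 0.2))
```

Both the general model and the binary-symmetric submodel are closed under multiplication here, which is the property the abstract ties to the rate matrices forming a Lie algebra.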
Markov invariants for phylogenetic rate matrices derived from embedded submodels
We consider novel phylogenetic models with rate matrices that arise via the
embedding of a progenitor model on a small number of character states, into a
target model on a larger number of character states. Adapting
representation-theoretic results from recent investigations of Markov
invariants for the general rate matrix model, we give a prescription for
identifying and counting Markov invariants for such `symmetric embedded'
models, and we provide enumerations of these for low-dimensional cases. The
simplest example is a target model on 3 states, constructed from a general 2
state model; the `2->3' embedding. We show that for 2 taxa, there exist two
invariants of quadratic degree, that can be used to directly infer pairwise
distances from observed sequences under this model. A simple simulation study
verifies their theoretical expected values, and suggests that, given the
appropriateness of the model class, they have greater statistical power than
the standard (log) Det invariant (which is of cubic degree for this case).
Comment: 16 pages, 1 figure, 1 appendix
Lie-Markov models derived from finite semigroups
We present and explore a general method for deriving a Lie-Markov model from
a finite semigroup. If the degree of the semigroup is $n$, the resulting model
is a continuous-time Markov chain on $n$ states and, as a consequence of the
product rule in the semigroup, satisfies the property of multiplicative
closure. This means that the product of any two probability substitution
matrices taken from the model produces another substitution matrix also in the
model. We show that our construction is a natural generalization of the concept
of group-based models.
Comment: 17 pages
Entanglement Invariants and Phylogenetic Branching
It is possible to consider stochastic models of sequence evolution in
phylogenetics in the context of a dynamical tensor description inspired by
physics. Approaching the problem in this framework allows for the well
developed methods of mathematical physics to be exploited in the biological
arena. We present the tensor description of the homogeneous continuous time
Markov chain model of phylogenetics with branching events generated by
dynamical operations. Standard results from phylogenetics are shown to be
derivable from the tensor framework. We summarize a powerful approach to
entanglement measures in quantum physics and present its relevance to
phylogenetic analysis. Entanglement measures are found to give distance
measures that are equivalent to, and expand upon, those already known in
phylogenetics. In particular we make the connection between the group invariant
functions of phylogenetic data and phylogenetic distance functions. We
introduce a new distance measure valid for three taxa based on the group
invariant function known in physics as the "tangle". All work is presented for
the homogeneous continuous time Markov chain model with arbitrary rate
matrices.
Comment: 21 pages, 3 figures. Accepted for publication in Journal of
Mathematical Biology
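As an illustration of the three-taxon invariant mentioned in this abstract, the sketch below evaluates Cayley's hyperdeterminant of a 2x2x2 array, which is the polynomial underlying the "tangle" in physics (up to normalisation). It assumes two character states and takes the array `t[i][j][k]` to be a three-taxon pattern tensor; the function name is hypothetical, and this is not claimed to be the exact distance estimator of the paper.

```python
def hyperdeterminant(t):
    """Cayley hyperdeterminant of a 2x2x2 array t[i][j][k]."""
    a000, a001 = t[0][0][0], t[0][0][1]
    a010, a011 = t[0][1][0], t[0][1][1]
    a100, a101 = t[1][0][0], t[1][0][1]
    a110, a111 = t[1][1][0], t[1][1][1]
    squares = (a000**2 * a111**2 + a001**2 * a110**2
               + a010**2 * a101**2 + a100**2 * a011**2)
    pairs = (a000 * a001 * a110 * a111 + a000 * a010 * a101 * a111
             + a000 * a100 * a011 * a111 + a001 * a010 * a101 * a110
             + a001 * a100 * a011 * a110 + a010 * a100 * a011 * a101)
    quads = a000 * a011 * a101 * a110 + a001 * a010 * a100 * a111
    return squares - 2 * pairs + 4 * quads

# A product (fully independent) tensor x_i * y_j * z_k.
x, y, z = (1, 2), (3, 4), (5, 6)
product_tensor = [[[x[i] * y[j] * z[k] for k in range(2)]
                   for j in range(2)] for i in range(2)]

# A tensor supported only on the patterns 000 and 111.
diagonal_tensor = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
```

The hyperdeterminant vanishes on the product tensor, where the three indices are independent, and is non-zero on the diagonal tensor, where all three are correlated.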
The impracticalities of multiplicatively-closed codon models: a retreat to linear alternatives
A matrix Lie algebra is a linear space of matrices closed under the operation
$[A, B] = AB - BA$. The "Lie closure" of a set of matrices is the smallest
matrix Lie algebra which contains the set. In the context of Markov chain
theory, if a set of rate matrices form a Lie algebra, their corresponding
Markov matrices are closed under matrix multiplication; this has been found to
be a useful property in phylogenetics. Inspired by previous research involving
Lie closures of DNA models, it was hypothesised that finding the Lie closure of
a codon model could help to solve the problem of mis-estimation of the
non-synonymous/synonymous rate ratio, $\omega$. We propose two different
methods of finding a linear space from a model: the first is the \emph{linear
closure} which is the smallest linear space which contains the model, and the
second is the \emph{linear version} which changes multiplicative constraints in
the model to additive ones. For each of these linear spaces we then find its
Lie closure. Under both methods, it was found that closed codon models
would require thousands of parameters, and that any partial solution to this
problem that was of a reasonable size violated stochasticity. Investigation of
toy models indicated that finding the Lie closure of matrix linear spaces which
deviated only slightly from a simple model resulted in a Lie closure that was
close to having the maximum number of parameters possible. Given that Lie
closures are not practical, we propose further consideration of the two
variants of linearly closed models.
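The notion of Lie closure defined at the start of this abstract can be sketched for small matrices as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses exact rational arithmetic, repeatedly adds commutators until the linear span stops growing, and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(A, B):
    """Commutator [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def reduce_against(vec, basis):
    """Eliminate the pivot entries of each stored basis row from vec."""
    v = vec[:]
    for pivot, row in basis:
        if v[pivot] != 0:
            c = v[pivot] / row[pivot]
            v = [x - c * y for x, y in zip(v, row)]
    return v

def lie_closure(generators):
    """Return a basis (list of matrices) of the Lie closure of the generators."""
    basis, mats, queue = [], [], list(generators)
    while queue:
        M = queue.pop()
        flat = [Fraction(x) for row in M for x in row]
        v = reduce_against(flat, basis)
        pivot = next((i for i, x in enumerate(v) if x != 0), None)
        if pivot is None:
            continue  # M already lies in the current span
        basis.append((pivot, v))
        queue.extend(bracket(M, N) for N in mats)  # new commutators to test
        mats.append(M)
    return mats

# 2x2 rate-matrix generators (columns sum to zero): already closed.
L1 = [[-1, 0], [1, 0]]
L2 = [[0, 1], [0, -1]]
# Two nilpotent matrices whose commutator is a new, independent matrix.
E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
```

Here `lie_closure([L1, L2])` has dimension 2, since $[L_1, L_2] = L_1 - L_2$, whereas `lie_closure([E, F])` grows to dimension 3. This growth under commutation is the effect that, per the abstract, makes closed codon models impractically large.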
Using the tangle: a consistent construction of phylogenetic distance matrices for quartets
Distance-based algorithms are a common technique in the construction of
phylogenetic trees from taxonomic sequence data. The first step in the
implementation of these algorithms is the calculation of a pairwise distance
matrix to give a measure of the evolutionary change between any pair of the
extant taxa. A standard technique is to use the log det formula to construct
pairwise distances from aligned sequence data. We review a distance measure
valid for the most general models, and show how the log det formula can be used
as an estimator thereof. We then show that the foundation upon which the log
det formula is constructed can be generalized to produce a previously unknown
estimator which improves the consistency of the distance matrices constructed
from the log det formula. This distance estimator provides a consistent
technique for constructing quartets from phylogenetic sequence data under the
assumption of the most general Markov model of sequence evolution.
Comment: 18 pages. Submitted to Mathematical Biosciences
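The property that makes the log det formula usable as a distance is that the determinant is multiplicative, so $-\ln|\det M|$ adds along a path of Markov matrices. The sketch below checks this on two arbitrary 3x3 examples; it applies the formula directly to transition matrices, whereas in practice it is applied to observed joint frequency matrices, and the matrices and function names here are illustrative assumptions.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def logdet_distance(M):
    """Log det distance -ln|det M| of a Markov matrix M."""
    return -math.log(abs(det(M)))

# Two illustrative Markov matrices (rows sum to one).
M1 = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]]
M2 = [[0.70, 0.20, 0.10], [0.05, 0.90, 0.05], [0.10, 0.10, 0.80]]

d1, d2 = logdet_distance(M1), logdet_distance(M2)
d12 = logdet_distance(matmul(M1, M2))  # distance along the combined path
```

Up to floating-point error, `d12` equals `d1 + d2`, which is the additivity that distance-based tree construction relies on.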
A tensorial approach to the inversion of group-based phylogenetic models
Using a tensorial approach, we show how to construct a one-one correspondence
between pattern probabilities and edge parameters for any group-based model.
This is a generalisation of the "Hadamard conjugation" and is equivalent to
standard results that use Fourier analysis. In our derivation we focus on the
connections to group representation theory and emphasize that the inversion is
possible because, under their usual definition, group-based models are defined
for abelian groups only. We also argue that our approach is elementary in the
sense that it can be understood as simple matrix multiplication where matrices
are rectangular and indexed by ordered-partitions of varying sizes.
Comment: 24 pages, 2 figures
Distinguishing between convergent evolution and violation of the molecular clock
We give a non-technical introduction to convergence-divergence models, a new
modeling approach for phylogenetic data that allows for the usual divergence of
species post speciation but also allows for species to converge, i.e. become
more similar over time. By examining the -taxon case in some detail we
illustrate that phylogeneticists have been "spoiled" in the sense of not having
to think about the structural parameters in their models by virtue of the
strong assumption that evolution is treelike. We show that there are not always
good statistical reasons to prefer the usual class of treelike models over more
general convergence-divergence models. Specifically we show many -taxon
datasets can be equally well explained by supposing violation of the molecular
clock due to change in the rate of evolution along different edges, or by
keeping the assumption of a constant rate of evolution but instead assuming
that evolution is not a purely divergent process. Given the abundance of
evidence that evolution is not strictly treelike, our discussion is an
illustration that as phylogeneticists we often need to think clearly about the
structural form of the models we use.
Comment: 12 pages, 3 figures
Polar decomposition of a Dirac spinor
Local decompositions of a Dirac spinor into `charged' and `real' pieces
psi(x) = M(x) chi(x) are considered. chi(x) is a Majorana spinor, and M(x) a
suitable Dirac-algebra valued field. Specific examples of the decomposition in
2+1 dimensions are developed, along with kinematical implications, and
constraints on the component fields within M(x) sufficient to encompass the
correct degree of freedom count. Overall local reparametrisation and
electromagnetic phase invariances are identified, and a dynamical framework of
nonabelian gauge theories of noncompact groups is proposed. Connections with
supersymmetric composite models are noted (including, for 2+1 dimensions,
infrared effective theories of spin-charge separation in models of high-Tc
superconductivity).
Comment: 12 pages, LaTeX
