5,824 research outputs found

Dimensional reduction for the general Markov model on phylogenetic trees

Author: Sumner Jeremy G
Publication date: 27/11/2016

We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data. Comment: 17 pages, 3 figures. v4: Substantial revision. Additional motivations for transformation rules and more details in proofs of the main results are provided.

arXiv.org e-Print Archive

Lie geometry of 2x2 Markov matrices

Author: Sumner Jeremy G.
Publication date: 20/12/2012

In recent work discussing model choice for continuous-time Markov chains, we have argued that it is important that the Markov matrices that define the model are closed under matrix multiplication (Sumner 2012a, 2012b). The primary requirement is then that the associated set of rate matrices form a Lie algebra. For the generic case, this connection to Lie theory seems to have first been made by Johnson (1985), with applications for specific models given in Bashford (2004) and House (2012). Here we take a different perspective: given a model that forms a Lie algebra, we apply existing Lie theory to gain additional insight into the geometry of the associated Markov matrices. In this short note, we present the simplest case possible of 2x2 Markov matrices. The main result is a novel decomposition of 2x2 Markov matrices that parameterises the general Markov model as a perturbation away from the binary-symmetric model. This alternative parameterisation provides a useful tool for visualising the binary-symmetric model as a submodel of the general Markov model. Comment: 5 pages, 2 figures.
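As a rough illustration of the multiplicative closure mentioned in the abstract above, the following sketch multiplies 2x2 Markov matrices exactly. It does not reproduce the paper's decomposition. The helper names (`matmul`, `is_markov`, `binary_symmetric`), the numerical values, and the convention that rows sum to one are all assumptions made for this example.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_markov(m):
    """Check non-negative entries and unit row sums (row convention assumed)."""
    return all(x >= 0 for row in m for x in row) and all(sum(row) == 1 for row in m)

def binary_symmetric(a):
    """Binary-symmetric Markov matrix with off-diagonal probability a."""
    a = Fraction(a)
    return [[1 - a, a], [a, 1 - a]]

# Two general 2x2 Markov matrices (hypothetical values).
m1 = [[Fraction(9, 10), Fraction(1, 10)], [Fraction(1, 5), Fraction(4, 5)]]
m2 = [[Fraction(7, 10), Fraction(3, 10)], [Fraction(2, 5), Fraction(3, 5)]]
general_product = matmul(m1, m2)

# The product of two binary-symmetric matrices is again binary-symmetric,
# with off-diagonal entry a1 + a2 - 2*a1*a2.
bs_product = matmul(binary_symmetric(Fraction(1, 10)),
                    binary_symmetric(Fraction(1, 5)))
```

Exact rational arithmetic is used here so that the closure checks are equalities rather than floating-point comparisons.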

arXiv.org e-Print Archive

Markov invariants for phylogenetic rate matrices derived from embedded submodels

Authors: Jarvis P. D.; Sumner J. G.
Publication date: 06/08/2010

We consider novel phylogenetic models with rate matrices that arise via the embedding of a progenitor model on a small number of character states, into a target model on a larger number of character states. Adapting representation-theoretic results from recent investigations of Markov invariants for the general rate matrix model, we give a prescription for identifying and counting Markov invariants for such `symmetric embedded' models, and we provide enumerations of these for low-dimensional cases. The simplest example is a target model on 3 states, constructed from a general 2 state model; the `2->3' embedding. We show that for 2 taxa, there exist two invariants of quadratic degree, that can be used to directly infer pairwise distances from observed sequences under this model. A simple simulation study verifies their theoretical expected values, and suggests that, given the appropriateness of the model class, they have greater statistical power than the standard (log) Det invariant (which is of cubic degree for this case). Comment: 16 pages, 1 figure, 1 appendix.

arXiv.org e-Print Archive

Lie-Markov models derived from finite semigroups

Authors: Sumner Jeremy G.; Woodhams Michael D.
Publication date: 01/09/2017

We present and explore a general method for deriving a Lie-Markov model from a finite semigroup. If the degree of the semigroup is $k$, the resulting model is a continuous-time Markov chain on $k$ states and, as a consequence of the product rule in the semigroup, satisfies the property of multiplicative closure. This means that the product of any two probability substitution matrices taken from the model produces another substitution matrix also in the model. We show that our construction is a natural generalization of the concept of group-based models. Comment: 17 pages.

arXiv.org e-Print Archive

Entanglement Invariants and Phylogenetic Branching

Authors: Jarvis P. D.; Sumner J. G.
Publication date: 30/11/2004

It is possible to consider stochastic models of sequence evolution in phylogenetics in the context of a dynamical tensor description inspired from physics. Approaching the problem in this framework allows for the well developed methods of mathematical physics to be exploited in the biological arena. We present the tensor description of the homogeneous continuous time Markov chain model of phylogenetics with branching events generated by dynamical operations. Standard results from phylogenetics are shown to be derivable from the tensor framework. We summarize a powerful approach to entanglement measures in quantum physics and present its relevance to phylogenetic analysis. Entanglement measures are found to give distance measures that are equivalent to, and expand upon, those already known in phylogenetics. In particular we make the connection between the group invariant functions of phylogenetic data and phylogenetic distance functions. We introduce a new distance measure valid for three taxa based on the group invariant function known in physics as the "tangle". All work is presented for the homogeneous continuous time Markov chain model with arbitrary rate matrices. Comment: 21 pages, 3 figures. Accepted for publication in Journal of Mathematical Biology.

arXiv.org e-Print Archive

The impracticalities of multiplicatively-closed codon models: a retreat to linear alternatives

Authors: Holland Barbara R.; Shore Julia A.; Sumner Jeremy G.
Publication date: 01/01/2020

A matrix Lie algebra is a linear space of matrices closed under the operation $[A, B] = AB - BA$. The "Lie closure" of a set of matrices is the smallest matrix Lie algebra which contains the set. In the context of Markov chain theory, if a set of rate matrices form a Lie algebra, their corresponding Markov matrices are closed under matrix multiplication; this has been found to be a useful property in phylogenetics. Inspired by previous research involving Lie closures of DNA models, it was hypothesised that finding the Lie closure of a codon model could help to solve the problem of mis-estimation of the non-synonymous/synonymous rate ratio, $\omega$. We propose two different methods of finding a linear space from a model: the first is the "linear closure", which is the smallest linear space which contains the model, and the second is the "linear version", which changes multiplicative constraints in the model to additive ones. We then find the Lie closure of each of these linear spaces. Under both methods, it was found that closed codon models would require thousands of parameters, and that any partial solution to this problem that was of a reasonable size violated stochasticity. Investigation of toy models indicated that finding the Lie closure of matrix linear spaces which deviated only slightly from a simple model resulted in a Lie closure that was close to having the maximum number of parameters possible. Given that Lie closures are not practical, we propose further consideration of the two variants of linearly closed models.
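The definition of a Lie closure given in the abstract above can be sketched as a small fixed-point computation: start from the span of the generating matrices and keep adding commutators until nothing new appears. This is a minimal illustration on toy 2x2 inputs, not the authors' procedure for codon models. The function names (`commutator`, `reduce_against`, `lie_closure`) and the example matrices are assumptions.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return [A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def reduce_against(vec, basis):
    """Subtract from vec its components along the stored basis vectors."""
    vec = list(vec)
    for pivot, b in basis:
        if vec[pivot] != 0:
            c = vec[pivot]
            vec = [v - c * w for v, w in zip(vec, b)]
    return vec

def lie_closure(generators):
    """Return a basis (list of matrices) of the smallest matrix Lie algebra
    containing the generators, using exact rational arithmetic."""
    n = len(generators[0])
    basis = []  # (pivot index, flattened vector with a 1 at the pivot)
    mats = []   # the same basis elements, reshaped as matrices
    queue = [[[Fraction(x) for x in row] for row in g] for g in generators]
    while queue:
        flat = [x for row in queue.pop() for x in row]
        r = reduce_against(flat, basis)
        pivot = next((i for i, x in enumerate(r) if x != 0), None)
        if pivot is None:
            continue  # already in the current span
        r = [x / r[pivot] for x in r]
        basis.append((pivot, r))
        new = [r[i * n:(i + 1) * n] for i in range(n)]
        # Every commutator with an existing basis element must be checked.
        queue.extend(commutator(new, old) for old in mats)
        mats.append(new)
    return mats

# Toy example: the two elementary 2-state rate matrices (rows sum to zero).
rate_closure = lie_closure([[[-1, 1], [0, 0]], [[0, 0], [1, -1]]])
# Toy example: two nilpotent matrices whose closure gains a third element.
nilpotent_closure = lie_closure([[[0, 1], [0, 0]], [[0, 0], [1, 0]]])
```

In the first toy example the span is already closed, so the closure has dimension 2; in the second, one commutator is linearly independent of the generators, so the dimension grows to 3.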

arXiv.org e-Print Archive

University of Tasmania Open Access Repository

Using the tangle: a consistent construction of phylogenetic distance matrices for quartets

Authors: Jarvis P D; Sumner J G
Publication date: 01/01/2005

Distance-based algorithms are a common technique in the construction of phylogenetic trees from taxonomic sequence data. The first step in the implementation of these algorithms is the calculation of a pairwise distance matrix to give a measure of the evolutionary change between any pair of the extant taxa. A standard technique is to use the log det formula to construct pairwise distances from aligned sequence data. We review a distance measure valid for the most general models, and show how the log det formula can be used as an estimator thereof. We then show that the foundation upon which the log det formula is constructed can be generalized to produce a previously unknown estimator which improves the consistency of the distance matrices constructed from the log det formula. This distance estimator provides a consistent technique for constructing quartets from phylogenetic sequence data under the assumption of the most general Markov model of sequence evolution. Comment: 18 pages. Submitted to Mathematical Biosciences.
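The property that makes a log det quantity usable as a distance is that determinants multiply under matrix products, so the negative log determinant adds along a path. The sketch below checks this for two 2x2 Markov matrices. It illustrates only that additivity, not the estimator proposed in the paper; the names (`det2`, `neg_log_det`) and the matrix values are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def neg_log_det(m):
    """Negative log determinant; requires a positive determinant."""
    return -math.log(det2(m))

# Hypothetical Markov matrices for two consecutive edges of a tree.
edge1 = [[0.9, 0.1], [0.2, 0.8]]
edge2 = [[0.7, 0.3], [0.1, 0.9]]

path_length = neg_log_det(matmul(edge1, edge2))
sum_of_edges = neg_log_det(edge1) + neg_log_det(edge2)
```

Up to floating-point rounding, `path_length` and `sum_of_edges` agree, which is the additivity that tree-building from pairwise distances relies on.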

arXiv.org e-Print Archive

CiteSeerX

A tensorial approach to the inversion of group-based phylogenetic models

Authors: Holland Barbara R.; Jarvis Peter D.; Sumner Jeremy G.
Publication date: 17/12/2012

Using a tensorial approach, we show how to construct a one-one correspondence between pattern probabilities and edge parameters for any group-based model. This is a generalisation of the "Hadamard conjugation" and is equivalent to standard results that use Fourier analysis. In our derivation we focus on the connections to group representation theory and emphasize that the inversion is possible because, under their usual definition, group-based models are defined for abelian groups only. We also argue that our approach is elementary in the sense that it can be understood as simple matrix multiplication where matrices are rectangular and indexed by ordered-partitions of varying sizes. Comment: 24 pages, 2 figures.

arXiv.org e-Print Archive

Distinguishing between convergent evolution and violation of the molecular clock

Authors: Holland Barbara R.; Mitchell Jonathan D.; Sumner Jeremy G.
Publication date: 13/09/2017

We give a non-technical introduction to convergence-divergence models, a new modeling approach for phylogenetic data that allows for the usual divergence of species post speciation but also allows for species to converge, i.e. become more similar over time. By examining the 3-taxon case in some detail we illustrate that phylogeneticists have been "spoiled" in the sense of not having to think about the structural parameters in their models by virtue of the strong assumption that evolution is treelike. We show that there are not always good statistical reasons to prefer the usual class of treelike models over more general convergence-divergence models. Specifically we show many 3-taxon datasets can be equally well explained by supposing violation of the molecular clock due to change in the rate of evolution along different edges, or by keeping the assumption of a constant rate of evolution but instead assuming that evolution is not a purely divergent process. Given the abundance of evidence that evolution is not strictly treelike, our discussion is an illustration that as phylogeneticists we often need to think clearly about the structural form of the models we use. Comment: 12 pages, 3 figures.

arXiv.org e-Print Archive

Polar decomposition of a Dirac spinor

Authors: Jarvis P. D.; Sumner J. G.
Publication date: 01/01/2002

Local decompositions of a Dirac spinor into `charged' and `real' pieces psi(x) = M(x) chi(x) are considered. chi(x) is a Majorana spinor, and M(x) a suitable Dirac-algebra valued field. Specific examples of the decomposition in 2+1 dimensions are developed, along with kinematical implications, and constraints on the component fields within M(x) sufficient to encompass the correct degree of freedom count. Overall local reparametrisation and electromagnetic phase invariances are identified, and a dynamical framework of nonabelian gauge theories of noncompact groups is proposed. Connections with supersymmetric composite models are noted (including, for 2+1 dimensions, infrared effective theories of spin-charge separation in models of high-Tc superconductivity). Comment: 12 pages, LaTeX.

arXiv.org e-Print Archive

CiteSeerX