Principal components analysis in the space of phylogenetic trees
Phylogenetic analysis of DNA or other data commonly gives rise to a
collection or sample of inferred evolutionary trees. Principal Components
Analysis (PCA) cannot be applied directly to collections of trees since the
space of evolutionary trees on a fixed set of taxa is not a vector space. This
paper describes a novel geometrical approach to PCA in tree-space that
constructs the first principal path in an analogous way to standard linear
Euclidean PCA. Given a data set of phylogenetic trees, a geodesic principal
path is sought that maximizes the variance of the data under a form of
projection onto the path. Due to the high dimensionality of tree-space and the
nonlinear nature of this problem, the computational complexity is potentially
very high, so approximate optimization algorithms are used to search for the
optimal path. Principal paths identified in this way reveal and quantify the
main sources of variation in the original collection of trees in terms of both
topology and branch lengths. The approach is illustrated by application to
simulated sets of trees and to a set of gene trees from metazoan (animal)
species.
Comment: Published at http://dx.doi.org/10.1214/11-AOS915 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
A genomic approach to examine the complex evolution of laurasiatherian mammals
Recent phylogenomic studies have failed to conclusively resolve certain branches of the placental mammalian tree, despite the evolutionary analysis of genomic data from 32 species. Previous analyses of single genes and retroposon insertion data yielded support for different phylogenetic scenarios for the most basal divergences. The results indicated that some mammalian divergences were best interpreted not as a single bifurcating tree, but as an evolutionary network. In these studies the relationships among some orders of the super-clade Laurasiatheria were poorly supported, albeit not studied in detail. Therefore, 4775 protein-coding genes (6,196,263 nucleotides) were collected and aligned in order to analyze the evolution of this clade. Additionally, over 200,000 introns were screened in silico, resulting in 32 phylogenetically informative long interspersed nuclear elements (LINE) insertion events.
The present study shows that the genome evolution of Laurasiatheria may best be understood as an evolutionary network. Thus, contrary to the common expectation to resolve major evolutionary events as a bifurcating tree, genome analyses unveil complex speciation processes even in deep mammalian divergences. We exemplify this on a subset of 1159 suitable genes that have individual histories, most likely due to incomplete lineage sorting or introgression, processes that can make the genealogy of mammalian genomes complex.
These unexpected results have major implications for the understanding of evolution in general, because the evolution of even some higher-level taxa such as mammalian orders may sometimes not be interpreted as a simple bifurcating pattern.