Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages
One reason classical phylogenetic reconstruction methods fail to
correctly infer the underlying topology is that they assume oversimplified
models. In this paper we propose a topology reconstruction method consistent
with the most general Markov model of nucleotide substitution, which can also
deal with data coming from mixtures on the same topology. It is based on an
idea of Eriksson on using phylogenetic invariants and provides a system of
weights that can be used as input of quartet-based methods. We study its
performance on real data and on a wide range of simulated 4-taxon data (both
time-homogeneous and nonhomogeneous, with or without among-site rate
heterogeneity, and with different branch length settings). We compare it to the
classical methods of neighbor-joining (with paralinear distance), maximum
likelihood (with different underlying models), and maximum parsimony. Our
results show that this method is accurate and robust, has a similar performance
to ML when data satisfies the assumptions of both methods, and outperforms all
methods when these are based on inappropriate substitution models or when both
long and short branches are present. If alignments are long enough, then it
also outperforms other methods when some of its assumptions are violated. Comment: 32 pages; 9 figures
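As an illustration of one of the classical baselines named in this abstract (neighbor-joining with paralinear distance), the following is a minimal sketch and not the authors' invariant-based method. It assumes Lake's unscaled form of the paralinear distance, d = -ln(det J / sqrt(det D_x det D_y)), where J is the joint nucleotide frequency matrix of two aligned sequences and D_x, D_y are diagonal matrices of their marginal frequencies; scaling conventions vary between sources. For four taxa it picks the split with the smallest sum of within-pair distances (the four-point criterion). The function names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def joint_frequencies(seq_x, seq_y):
    """4x4 matrix of joint nucleotide frequencies for two aligned sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq_x, seq_y):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a], index[b]] += 1
    total = counts.sum()
    if total == 0:
        raise ValueError("no comparable sites")
    return counts / total

def paralinear_distance(seq_x, seq_y):
    """Unscaled paralinear distance (assumed form, see lead-in)."""
    J = joint_frequencies(seq_x, seq_y)
    fx = J.sum(axis=1)  # marginal frequencies of seq_x
    fy = J.sum(axis=0)  # marginal frequencies of seq_y
    det_j = np.linalg.det(J)
    denom = np.sqrt(np.prod(fx) * np.prod(fy))
    if det_j <= 0 or denom == 0:
        return float("inf")  # distance undefined for saturated or degenerate data
    return float(-np.log(det_j / denom))

def four_point_split(D):
    """Return the quartet split (as index pairs) minimizing d(i,j) + d(k,l)."""
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return min(splits, key=lambda s: D[s[0][0]][s[0][1]] + D[s[1][0]][s[1][1]])
```

In use, one would compute the six pairwise distances for a 4-taxon alignment, fill a symmetric 4x4 matrix `D`, and pass it to `four_point_split`.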
TrAp: a Tree Approach for Fingerprinting Subclonal Tumor Composition
Revealing the clonal composition of a single tumor is essential for
identifying cell subpopulations with metastatic potential in primary tumors or
with resistance to therapies in metastatic tumors. Sequencing technologies
provide an overview of an aggregate of numerous cells, rather than
subclonal-specific quantification of aberrations such as single nucleotide
variants (SNVs). Computational approaches to de-mix a single collective signal
from the mixed cell population of a tumor sample into its individual components
are currently not available. Herein we propose a framework for deconvolving
data from a single genome-wide experiment to infer the composition, abundance
and evolutionary paths of the underlying cell subpopulations of a tumor. The
method is based on the plausible biological assumption that tumor progression
is an evolutionary process where each individual aberration event stems from a
unique subclone and is present in all its descendant subclones. We have
developed an efficient algorithm (TrAp) for solving this mixture problem. In
silico analyses show that TrAp correctly deconvolves mixed subpopulations when
the number of subpopulations and the measurement errors are moderate. We
demonstrate the applicability of the method using tumor karyotypes and somatic
hypermutation datasets. We applied TrAp to the SNV frequency profile from an Exome-Seq
experiment of a renal cell carcinoma tumor sample and compared the mutational
profile of the inferred subpopulations to the mutational profiles of twenty
single cells of the same tumor. Despite the large experimental noise, specific
co-occurring mutations found in clones inferred by TrAp are also present in
some of these single cells. Finally, we deconvolve Exome-Seq data from three
distinct metastases from different body compartments of one melanoma patient
and exhibit the evolutionary relationships of their subpopulations
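The evolutionary assumption stated in this abstract (each aberration arises in one subclone and is inherited by all of its descendants) implies a simple forward model: the aggregate frequency of an aberration equals the summed abundance of its originating subclone and that subclone's descendants. The sketch below illustrates only this forward model and its inversion when the tree is already known; it is not the TrAp algorithm, which must also infer the tree. The function names and the `parent`/`abundance` dictionaries are hypothetical.

```python
def aggregate_frequencies(parent, abundance):
    """Aggregate frequency of the aberration that arose in each subclone.

    parent: maps each subclone to its parent (the root maps to None).
    abundance: maps each subclone to its fraction of the sample.
    """
    freq = dict(abundance)  # each subclone carries its own aberration
    for clone, fraction in abundance.items():
        ancestor = parent[clone]
        while ancestor is not None:  # descendants inherit ancestral aberrations
            freq[ancestor] += fraction
            ancestor = parent[ancestor]
    return freq

def abundances_from_frequencies(parent, freq):
    """Invert the forward model for a known tree."""
    abundance = dict(freq)
    for clone, f in freq.items():
        if parent[clone] is not None:
            abundance[parent[clone]] -= f  # subtract each child's frequency
    return abundance

# Hypothetical four-subclone example.
parent = {"root": None, "A": "root", "B": "A", "C": "root"}
abundance = {"root": 0.1, "A": 0.3, "B": 0.4, "C": 0.2}
freq = aggregate_frequencies(parent, abundance)
```

With the tree fixed, subtracting each child's frequency from its parent's recovers the abundances. The deconvolution problem described in the abstract is harder because the tree itself is unknown.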
Tracing evolutionary links between species
The idea that all life on earth traces back to a common beginning dates back
at least to Charles Darwin's Origin of Species. Ever since, biologists
have tried to piece together parts of this 'tree of life' based on what we can
observe today: fossils, and the evolutionary signal that is present in the
genomes and phenotypes of different organisms. Mathematics has played a key
role in helping transform genetic data into phylogenetic (evolutionary) trees
and networks. Here, I will explain some of the central concepts and basic
results in phylogenetics, which benefit from several branches of mathematics,
including combinatorics, probability and algebra. Comment: 18 pages, 6 figures (Invited review paper (draft version) for AMM
Phylogenetic mixtures on a single tree can mimic a tree of another topology
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly
observed in data. The performance of phylogenetic reconstruction methods where
the underlying data is generated by a mixture model has stimulated considerable
recent debate. Much of the controversy stems from simulations of mixture model
data on a given tree topology for which reconstruction algorithms output a tree
of a different topology; these findings were held up to show the shortcomings
of particular tree reconstruction methods. In so doing, the underlying
assumption was that mixture model data on one topology can be distinguished
from data evolved on an unmixed tree of another topology given enough data and
the "correct" method. Here we show that this assumption can be false. For
biologists our results imply that, for example, the combined data from two
genes whose phylogenetic trees differ only in terms of branch lengths can
perfectly fit a tree of a different topology
Comparison of articulate brachiopod nuclear and mitochondrial gene trees leads to a clade-based redefinition of protostomes (Protostomozoa) and deuterostomes (Deuterostomozoa)
Nuclear and mtDNA sequences from selected short-looped terebratuloid (terebratulacean) articulate brachiopods yield congruent and genetically independent phylogenetic reconstructions by parsimony, neighbor-joining and maximum likelihood methods, suggesting that both sources of data are reliable guides to brachiopod species phylogeny. The present-day genealogical relationships and geographical distributions of the tested terebratuloid brachiopods are consistent with a tethyan dispersal and subsequent radiation. Concordance of nuclear and mitochondrial gene phylogenies reinforces previous indications that articulate brachiopods, inarticulate brachiopods, phoronids and ectoprocts cluster with other organisms generally regarded as protostomes. Since ontogeny and morphology in brachiopods, ectoprocts and phoronids depart in important respects from those features supposedly diagnostic of protostomes, this demonstrates that the operational definition of protostomy by the usual ontological characters must be misleading or unreliable. New, molecular, operational definitions are proposed to replace the traditional criteria for the recognition of protostomes and deuterostomes, and the clade-based terms 'Protostomozoa' and 'Deuterostomozoa' are proposed to replace the existing terms 'Protostomia' and 'Deuterostomia'
Does behavior reflect phylogeny in swiftlets (Aves: Apodidae)? A test using cytochrome b mitochondrial DNA sequences
Swiftlets are small insectivorous birds, many of which nest in caves and are known to echolocate. Due to a lack of distinguishing morphological characters, the taxonomy of swiftlets is primarily based on the presence or absence of echolocating ability, together with nest characters. To test the reliability of these behavioral characters, we constructed an independent phylogeny using cytochrome b mitochondrial DNA sequences from swiftlets and their relatives. This phylogeny is broadly consistent with the higher classification of swifts but does not support the monophyly of swiftlets. Echolocating swiftlets (Aerodramus) and the nonecholocating "giant swiftlet" (Hydrochous gigas) group together, but the remaining nonecholocating swiftlets belonging to Collocalia are not sister taxa to these swiftlets. While echolocation may be a synapomorphy of Aerodramus (perhaps secondarily lost in Hydrochous), no character of Aerodramus nests showed a statistically significant fit to the molecular phylogeny, indicating that nest characters are not phylogenetically reliable in this group
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the century-old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies. Comment: Open Access paper.
http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract
DOI: 10.3389/fspas.2015.00003
Fast and scalable inference of multi-sample cancer lineages.
Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee