Circumstances in which parsimony but not compatibility will be provably misleading
Phylogenetic methods typically rely on an appropriate model of how data
evolved in order to infer an accurate phylogenetic tree. For molecular data,
standard statistical methods have provided an effective strategy for extracting
phylogenetic information from aligned sequence data when each site (character)
is subject to a common process. However, for other types of data (e.g.
morphological data), characters can be too ambiguous, homoplastic or saturated
to develop models that are effective at capturing the underlying process of
change. To address this, we examine the properties of a classic but neglected
method for inferring splits in an underlying tree, namely, maximum
compatibility. By adopting a simple and extreme model in which each character
either fits perfectly on some tree, or is entirely random (but it is not known
which class any character belongs to) we are able to derive exact and explicit
formulae regarding the performance of maximum compatibility. We show that this
method is able to identify a set of non-trivial homoplasy-free characters, when
the number of taxa is large, even when the number of random characters is
large. By contrast, we show that a method that makes more uniform use of all
the data, maximum parsimony, can provably estimate trees in which none of the
original homoplasy-free characters support splits.
Comment: 37 pages, 2 figures
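As a rough illustration of what compatibility means for the characters discussed above, the following sketch tests pairs of binary characters with the standard four-gamete condition: two binary characters fit on a common tree exactly when at most three of the four possible state pairs occur across the taxa. The function names and the string encoding of characters are assumptions made for this example; it is not the analysis carried out in the paper.

```python
from itertools import combinations

def compatible(c1, c2):
    """Four-gamete test: two binary characters are compatible
    iff fewer than four distinct state pairs occur across the taxa."""
    return len(set(zip(c1, c2))) < 4

def incompatible_pairs(chars):
    """Return index pairs (i, j), i < j, of characters that conflict."""
    return [
        (i, j)
        for (i, c1), (j, c2) in combinations(enumerate(chars), 2)
        if not compatible(c1, c2)
    ]

# Each string gives one character's state for taxa a, b, c, d, e.
chars = ["11000", "11100", "01100"]
print(incompatible_pairs(chars))
```

For binary characters, pairwise compatibility is enough for the whole set to fit on one tree, so a maximum compatible subset corresponds to a largest clique in the graph whose edges join compatible pairs.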
Improved Lower Bounds on the Compatibility of Multi-State Characters
We study a long-standing conjecture on the necessary and sufficient
conditions for the compatibility of multi-state characters: There exists a
function $f(r)$ such that, for any set $C$ of $r$-state characters, $C$ is
compatible if and only if every subset of $f(r)$ characters of $C$ is
compatible. We show that for every $r \ge 2$, there exists an incompatible set
$C$ of $\lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$ $r$-state
characters such that every proper subset of $C$ is compatible. Thus,
$f(r) \ge \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$ for every $r \ge 2$.
This improves the previous lower bound of $f(r) \ge r$ given by Meacham (1983),
and generalizes the construction showing that $f(4) \ge 5$ given by Habib and
To (2011). We prove our result via a result on quartet compatibility that may
be of independent interest: For every integer $n \ge 4$, there exists an
incompatible set $Q$ of
$\lfloor (n-2)/2 \rfloor \cdot \lceil (n-2)/2 \rceil + 1$ quartets over $n$
labels such that every proper subset of $Q$ is compatible. We contrast this
with a result on the compatibility of triplets: For every $n \ge 3$, if $R$ is
an incompatible set of more than $n-1$ triplets over $n$ labels, then some
proper subset of $R$ is incompatible. We show this upper bound is tight by
exhibiting, for every $n \ge 3$, a set $R$ of $n-1$ triplets over $n$ taxa such
that $R$ is incompatible, but every proper subset of $R$ is compatible.
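Compatibility of a set of rooted triplets can be decided with the classical BUILD algorithm of Aho, Sagiv, Szymanski and Ullman. The sketch below is a minimal version that only answers yes or no; the encoding of a triplet ab|c as `((a, b), c)` and the function name are assumptions made for this example, not notation from the paper.

```python
def build_compatible(triplets, labels):
    """Return True iff some rooted tree on `labels` displays every triplet.
    A triplet ab|c is encoded as ((a, b), c)."""
    labels = set(labels)
    if len(labels) <= 2:
        return True
    # Keep only triplets whose three labels all lie in the current label set.
    inside = [t for t in triplets
              if t[0][0] in labels and t[0][1] in labels and t[1] in labels]
    # Union-find over labels: join a and b for every triplet ab|c.
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), _c in inside:
        parent[find(a)] = find(b)
    components = {}
    for x in labels:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return False  # the auxiliary graph is connected: no valid root split
    return all(build_compatible(inside, comp) for comp in components.values())

# Smallest minimally incompatible example: ab|c together with ac|b.
R = [(("a", "b"), "c"), (("a", "c"), "b")]
print(build_compatible(R, "abc"), [build_compatible([t], "abc") for t in R])
```

In this three-label example the pair of triplets is incompatible while each triplet alone is compatible, which is the kind of minimal obstruction the abstract describes.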
Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia
Integration of Morphological Data into Molecular Phylogenetic Analysis: Toward the Identikit of the Stylasterid Ancestor
Stylasteridae is a hydroid family including 29 worldwide-distributed genera, all provided with a calcareous skeleton. They are abundant in shallow and deep waters and represent an important component of marine communities. In the present paper, we studied the evolution of ten morphological characters, currently used in stylasterid taxonomy, using a phylogenetic approach. Our results indicate that stylasterid morphology is highly plastic and that many events of independent evolution and reversion have occurred. Our analysis also allows sketching a possible identikit of the stylasterid ancestor. It had a calcareous skeleton, reticulate-granular coenosteal texture, randomly arranged polyps, a gastrostyle, and dactylopore spines, while lacking a gastropore lip and dactylostyles. Whether the ancestor had a single or a double/multiple chambered gastropore tube is uncertain. These data suggest that the ancestor was similar to the extant genera Cyclohelia and Stellapora. Our investigation is the first attempt to integrate molecular and morphological information to clarify the stylasterid evolutionary scenario and represents the first step to infer the stylasterid ancestor morphology. © 2016 Puce et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Finding Optimal Tree Decompositions
The task of organizing a given graph into a structure called a tree decomposition is relevant in multiple areas of computer science. In particular, many NP-hard problems can be solved in polynomial time if a suitable tree decomposition of a graph describing the problem instance is given as a part of the input. This motivates the task of finding as good tree decompositions as possible, or ideally, optimal tree decompositions.
This thesis is about finding optimal tree decompositions of graphs with respect to several notions of optimality. Each of the considered notions measures the quality of a tree decomposition in the context of an application. In particular, we consider a total of seven problems that are formulated as finding optimal tree decompositions: treewidth, minimum fill-in, generalized and fractional hypertreewidth, total table size, phylogenetic character compatibility, and treelength. For each of these problems we consider the BT algorithm of Bouchitté and Todinca as the method of finding optimal tree decompositions.
The BT algorithm is well known on the theoretical side, but to our knowledge it was first implemented only recently, for the 2nd Parameterized Algorithms and Computational Experiments Challenge (PACE 2017). The author’s implementation of the BT algorithm took second place in the minimum fill-in track of PACE 2017. In this thesis we review and extend the BT algorithm and our implementation. In particular, we improve the efficiency of the algorithm in terms of both theory and practice. We also implement the algorithm for each of the seven problems considered, introducing a novel adaptation of the algorithm for the maximum compatibility problem of phylogenetic characters. Our implementation outperforms alternative state-of-the-art approaches in terms of numbers of test instances solved on well-known benchmarks on minimum fill-in, generalized hypertreewidth, fractional hypertreewidth, total table size, and the maximum compatibility problem of phylogenetic characters. Furthermore, to our understanding the implementation is the first exact approach for the treelength problem.
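To make the object being optimized concrete, the sketch below checks the three defining conditions of a tree decomposition and reports its width (largest bag size minus one). It is not the BT algorithm and does not search for an optimal decomposition; the function names and the dictionary layout for bags are assumptions made for this example, and the caller is assumed to pass `tree_edges` that really form a tree on the bag identifiers.

```python
def is_tree_decomposition(edges, bags, tree_edges):
    """edges: graph edges (u, v); bags: dict bag_id -> set of vertices;
    tree_edges: edges between bag ids, assumed to form a tree."""
    vertices = {v for e in edges for v in e}
    # 1. Every vertex appears in at least one bag.
    if not vertices <= set().union(*bags.values()):
        return False
    # 2. Both endpoints of every edge share at least one bag.
    if not all(any({u, v} <= bag for bag in bags.values()) for u, v in edges):
        return False
    # 3. For each vertex, the bags containing it are connected in the tree.
    adj = {b: set() for b in bags}
    for x, y in tree_edges:
        adj[x].add(y)
        adj[y].add(x)
    for v in vertices:
        holders = {b for b, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in (adj[x] & holders) - seen:
                seen.add(y)
                stack.append(y)
        if seen != holders:
            return False
    return True

def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# A 4-cycle a-b-c-d-a with a two-bag decomposition of width 2.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
print(is_tree_decomposition(cycle, bags, [(0, 1)]), width(bags))
```

Vertices without incident edges are not visible to this checker, since the vertex set is derived from the edge list.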
Probabilistic Graphical Model Representation in Phylogenetics
Recent years have seen a rapid expansion of the model space explored in
statistical phylogenetics, emphasizing the need for new approaches to
statistical model representation and software development. Clear communication
and representation of the chosen model is crucial for: (1) reproducibility of
an analysis, (2) model development and (3) software design. Moreover, a
unified, clear and understandable framework for model representation lowers the
barrier for beginners and non-specialists to grasp complex phylogenetic models,
including their assumptions and parameter/variable dependencies.
Graphical modeling is a unifying framework that has gained in popularity in
the statistical literature in recent years. The core idea is to break complex
models into conditionally independent distributions. The strength lies in the
comprehensibility, flexibility, and adaptability of this formalism, and the
large body of computational work based on it. Graphical models are well-suited
to teach statistical models, to facilitate communication among phylogeneticists
and in the development of generic software for simulation and statistical
inference.
Here, we provide an introduction to graphical models for phylogeneticists and
extend the standard graphical model representation to the realm of
phylogenetics. We introduce a new graphical model component, tree plates, to
capture the changing structure of the subgraph corresponding to a phylogenetic
tree. We describe a range of phylogenetic models using the graphical model
framework and introduce modules to simplify the representation of standard
components in large and complex models. Phylogenetic model graphs can be
readily used in simulation, maximum likelihood inference, and Bayesian
inference using, for example, Metropolis-Hastings or Gibbs sampling of the
posterior distribution.
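The core idea of breaking a model into conditionally independent distributions can be sketched as a log joint density that is a sum of per-node conditional terms. The two-node model below (an exponentially distributed rate, and a branch length that is exponential given that rate) is a hypothetical toy chosen for this example, as are all the names; it is not one of the models or the tree-plate notation described in the paper.

```python
import math

def exp_logpdf(x, rate):
    """Log density of an Exponential(rate) distribution at x >= 0."""
    return math.log(rate) - rate * x

# Hypothetical model: node name -> (parent names, conditional log density).
# Each log density takes the node's own value, then its parents' values.
model = {
    "rate": ((), lambda x: exp_logpdf(x, 1.0)),
    "branch_length": (("rate",), lambda x, rate: exp_logpdf(x, rate)),
}

def log_joint(model, values):
    """Sum the conditional log densities of all nodes in the graph."""
    total = 0.0
    for name, (parents, logpdf) in model.items():
        total += logpdf(values[name], *(values[p] for p in parents))
    return total

print(log_joint(model, {"rate": 2.0, "branch_length": 0.5}))
```

Because each node only needs its own value and those of its parents, the same structure can serve simulation (sample nodes in parent-first order) and inference (evaluate the summed log density inside a sampler).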