Circumstances in which parsimony but not compatibility will be provably misleading
Phylogenetic methods typically rely on an appropriate model of how data
evolved in order to infer an accurate phylogenetic tree. For molecular data,
standard statistical methods have provided an effective strategy for extracting
phylogenetic information from aligned sequence data when each site (character)
is subject to a common process. However, for other types of data (e.g.
morphological data), characters can be too ambiguous, homoplastic or saturated
to develop models that are effective at capturing the underlying process of
change. To address this, we examine the properties of a classic but neglected
method for inferring splits in an underlying tree, namely, maximum
compatibility. By adopting a simple and extreme model in which each character
either fits perfectly on some tree, or is entirely random (but it is not known
which class any character belongs to) we are able to derive exact and explicit
formulae regarding the performance of maximum compatibility. We show that this
method is able to identify a set of non-trivial homoplasy-free characters, when
the number of taxa is large, even when the number of random characters is
large. By contrast, we show that a method that makes more uniform use of all
the data (maximum parsimony) can provably estimate trees in which none of the
original homoplasy-free characters support splits.
Comment: 37 pages, 2 figures
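As a rough illustration of the compatibility idea in the abstract above, the sketch below restricts attention to binary characters, where two characters are compatible unless all four state pairs occur across the taxa (the four-gamete test). It then searches by brute force for a largest set of mutually compatible characters. The function names and the toy data are hypothetical and are not taken from the paper, whose model and formulae are more general.

```python
from itertools import combinations


def compatible(a, b):
    """Four-gamete test for two binary characters (one state per taxon).

    The characters are compatible unless all four state pairs
    (0,0), (0,1), (1,0) and (1,1) occur across the taxa.
    """
    return len(set(zip(a, b))) < 4


def max_compatible_subset(chars):
    """Return indices of a largest pairwise-compatible set of characters.

    Brute force over subsets, so only suitable for tiny examples.
    """
    n = len(chars)
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            if all(compatible(chars[i], chars[j])
                   for i, j in combinations(idx, 2)):
                return list(idx)
    return []


# Hypothetical toy data: 4 binary characters scored on 5 taxa.
chars = [
    (1, 1, 0, 0, 0),  # supports split {0,1} | {2,3,4}
    (0, 0, 0, 1, 1),  # supports split {0,1,2} | {3,4}
    (1, 0, 1, 0, 1),  # conflicts with all the others
    (1, 1, 1, 0, 0),  # same split as the second character
]

print(max_compatible_subset(chars))
```

On this toy input the conflicting third character is left out and the other three are retained. Finding a largest compatible set is a hard combinatorial problem in general, which is why this sketch only enumerates subsets of a very small input.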
Reliability analysis of reconstructing phylogenies under long branch attraction conditions
Master's Project (M.S.) University of Alaska Fairbanks, 2018. In this simulation study we examined the reliability of three phylogenetic reconstruction techniques in a long branch attraction (LBA) situation: Maximum Parsimony (MP), Neighbor Joining (NJ), and Maximum Likelihood (ML). Data were simulated for four taxa under five DNA substitution models: JC, K2P, F81, HKY, and GTR. Two branch-length parameters of the four-taxon trees, ranging from 0.05 to 0.75 in increments of 0.02, were used to simulate DNA data under each model. For each model we simulated DNA sequences of 100, 250, 500 and 1000 sites with 100 replicates. Given enough data, ML is the most reliable of the three methods examined in this study for reconstructing phylogenies under LBA conditions. We also find that MP is the most sensitive to LBA conditions and that NJ performs well under LBA conditions compared to MP.
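The simulation design described above can be sketched for the simplest of the five models, JC, under which a site ends a branch of length d (expected substitutions per site) in a different state with probability 3/4 (1 - exp(-4d/3)). The sketch below evolves sequences along a four-taxon tree ((A,B),(C,D)) with two long and three short branches. The tree labelling, the choice to give the internal branch the short length, and all function names are assumptions made for illustration, not details taken from the project.

```python
import math
import random

BASES = "ACGT"


def jc_change_prob(d):
    """Probability that a site differs after a branch of length d under JC."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def evolve(seq, d, rng):
    """Evolve a sequence along one branch of length d under JC."""
    p = jc_change_prob(d)
    out = []
    for base in seq:
        if rng.random() < p:
            # Under JC the three other bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_four_taxa(n_sites, long_len, short_len, seed=0):
    """Simulate one alignment on ((A,B),(C,D)) with A and C on long branches.

    Assumption: the internal branch uses short_len.
    """
    rng = random.Random(seed)
    # Start at one internal node with uniform base frequencies.
    left = "".join(rng.choice(BASES) for _ in range(n_sites))
    right = evolve(left, short_len, rng)  # internal branch
    return {
        "A": evolve(left, long_len, rng),
        "B": evolve(left, short_len, rng),
        "C": evolve(right, long_len, rng),
        "D": evolve(right, short_len, rng),
    }


alignment = simulate_four_taxa(n_sites=100, long_len=0.75, short_len=0.05)
for name, seq in alignment.items():
    print(name, seq[:40])
```

Repeating the call with different seeds would give replicate data sets, and sweeping `long_len` and `short_len` over a grid would mimic the two-parameter design; the reconstruction step (MP, NJ, or ML) is outside the scope of this sketch.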
Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants
Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern "anthophyte hypothesis," which places Gnetales as the sister group of flowering plants. A series of simulation studies was undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
M. J. Sanderson,* M. F. Wojciechowski,*† J.-M. Hu,* T. Sher Khan,* and S. G. Brad
- …