Search CORE

26 research outputs found

Polyhedral geometry of Phylogenetic Rogue Taxa

Author: Cueto María Angélica
Matsen Frederick A.
Publication date: 01/01/2010

It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this "rogue taxon" effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a "rogue taxon" such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we construct polyhedral cones computationally which give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation. Comment: In this version, we add quartet distances and fix Table 4
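As a rough illustration of what "BME-optimal" means in the smallest case, the sketch below scores the three possible four-taxon topologies and picks the shortest. It assumes Pauplin's balanced length formula (each pairwise distance weighted by 2^(1 - p), where p is the number of edges on the path between the two leaves), which the abstract does not state; the function names and the example distances are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def bme_length(dist, topology):
    """Balanced tree length of a four-taxon topology (Pauplin's formula).

    dist: dict mapping frozenset({i, j}) -> pairwise distance
    topology: the two cherries, e.g. (("A", "B"), ("C", "D"))
    """
    cherries = [frozenset(c) for c in topology]
    taxa = sorted(set().union(*cherries))
    total = 0.0
    for i, j in combinations(taxa, 2):
        pair = frozenset((i, j))
        # Leaves in the same cherry are 2 edges apart; otherwise the path
        # also crosses the internal edge, giving 3 edges.
        edges = 2 if pair in cherries else 3
        total += 2.0 ** (1 - edges) * dist[pair]
    return total


def bme_optimal_quartet(dist, taxa):
    """Return the four-taxon topology with the smallest balanced length."""
    a, b, c, d = taxa
    topologies = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(topologies, key=lambda t: bme_length(dist, t))


# Hypothetical tree-like distances: pendant edges of length 1 and an
# internal edge of length 2 separating {A, B} from {C, D}.
raw = {("A", "B"): 2, ("C", "D"): 2,
       ("A", "C"): 4, ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4}
dist = {frozenset(pair): value for pair, value in raw.items()}

best = bme_optimal_quartet(dist, ("A", "B", "C", "D"))
print(best, bme_length(dist, best))
```

For distances that fit a tree exactly, the balanced length of the true topology equals the sum of its edge lengths, so the true topology is the one selected here. Adding a fifth "rogue" taxon would mean repeating the comparison over the 15 five-taxon topologies, which this sketch does not attempt.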

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

A sub-cubic time algorithm for computing the quartet distance between two general trees

Author: Anders K Kristensen
BL Allen
C Christiansen
Christian NS Pedersen
D Bryant
D Coppersmith
DF Robinson
G Estabrook
GS Brodal
Jesper Nielsen
M Steel
M Stissing
MS Waterman
Thomas Mailund
Publication venue: BioMed Central

Crossref

PubMed Central

Cultural Phylogenetics of the Tupi Language Family in Lowland South America

Author: A Hornborg
A Matraux
A Rodrigues
AAS Mello
AD Rodrigues
AL Kroeber
ASAC Cabral
C Christiansen
C Holden
C Jensen
CH Brown
CJ Holden
CO Schleicher
CT Palmer
Curtis J. Atkisson
CW Dunn
D Bryant
D Critchlow
DF Robinson
DH Parks
E Viveiros de Castro
EW Holman
F Boas
F Noelli
F Salzano
FM Jordan
GF Estabrook
GV Dziebel
I Temkin
J Bollback
J Cisco
J Diamond
J Gray
J Hemming
J Henrich
J Nielsen
J Wilbert
JA Hostetler
JE Terrell
JH Steward
K Tamura
KA Crandall
KD Macario
L Fortunato
M Borgerhoff Mulder
M Collard
M Harris
M Lemle
M Pagel
Martin Krkosek
MS Bansal
PO Lewis
R Boyd
R Gray
R Mace
R Scheel-Ybert
RD Gray
Robert S. Walker
RS Walker
S Drude
S Gildea
S Klimek
S Nelson-Sathi
S Pompei
S Wichmann
SJ Greenhill
SJ Lycett
SM Callegari-Jacques
Søren Wichmann
T Mailund
TE Currie
Thomas Mailund
W Baleé
W Dietrich
W Maddison
Publication venue: Public Library of Science
Publication date: 01/01/2012

Background: Recent advances in automated assessment of basic vocabulary lists allow the construction of linguistic phylogenies useful for tracing dynamics of human population expansions, reconstructing ancestral cultures, and modeling transition rates of cultural traits over time. Methods: Here we investigate the Tupi expansion, a widely-dispersed language family in lowland South America, with a distance-based phylogeny based on 40-word vocabulary lists from 48 languages. We coded 11 cultural traits across the diverse Tupi family including traditional warfare patterns, post-marital residence, corporate structure, community size, paternity beliefs, sibling terminology, presence of canoes, tattooing, shamanism, men’s houses, and lip plugs. Results/Discussion: The linguistic phylogeny supports a Tupi homeland in west-central Brazil with subsequent major expansions across much of lowland South America. Consistently, ancestral reconstructions of cultural traits over the linguistic phylogeny suggest that social complexity has tended to decline through time, most notably in the independent emergence of several nomadic hunter-gatherer societies. Estimated rates of cultural change across the Tupi expansion are on the order of only a few changes per 10,000 years, in accord with previous cultural phylogenetic results in other languag

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

FigShare

Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees

Author: Dandekar Thomas
Förster Frank
Keller Alexander
Müller Tobias
Schultz Jörg
Wolf Matthias
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking. Results: This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergency levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 of the ribosomal cistron as an exemplary marker region. Simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence only data, whereas a doubled marker size only accounts for robustness. Conclusions: Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods such as maximum likelihood, Bayesian inference, or maximum parsimony may equally profit from secondary structure inclusion. Reviewers: This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin. Open peer review: Reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin. For the full reviews, please go to the Reviewers' comments section.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online-Publikations-Server der Universität Würzburg

Pattern-based phylogenetic distance estimation and tree reconstruction

Author: Höhl Michael
Ragan Mark A.
Rigoutsos Isidore
Publication date: 01/01/2006

We have developed an alignment-free method that calculates phylogenetic distances using a maximum likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a dataset of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we inferred trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods on this dataset. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances. Comment: 21 pages, 3 figures, 2 table

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ON

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms

Author: Burleigh J Gordon
Hilu Khidir W
Soltis Douglas E
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al. Results: We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data. Conclusion: Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible phylogenetic hypotheses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Phylogenetic Divergence Time, Algorithms for Improved Accuracy and Performance

Author: Crosby Ralph
Publication date: 29/10/2015

The inference of species divergence time is a key step in the study of phylogenetics. Methods have been available for the last ten years to perform the inference, but there are two significant problems with these methods. First, the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. A study of 349 taxa was estimated to require over 9 months of processing time. Second, the accuracy of the inference process is subject to bias and variance in the specification of model parameters that is not completely understood. These parameters include both the topology of the phylogenetic tree and, more importantly for our purposes, the set of fossils used to calibrate the tree. In this work, we present new algorithms and methods to improve the performance of the divergence time process. We demonstrate a new algorithm for the computation of phylogenetic likelihood and experimentally illustrate a 90% improvement in likelihood computation time on the aforementioned dataset of 349 taxa with over 60,000 DNA base pairs. Additionally, we show a new algorithm for the computation of the Bayesian prior on node ages that is experimentally shown to reduce the time for this computation on the 349 taxa dataset by 99%. Using our high performance methods, we present a novel method for assessing the level of support for the ages inferred. This method utilizes a statistical jackknifing technique on the set of fossil calibrations, producing a support value similar to the bootstrap used in phylogenetic inference. Finally, we present efficient methods for divergence time inference on sets of trees based on our development of subtree sharing models. We show a 60% improvement in processing times on a dataset of 567 taxa with over 10,000 DNA base pairs

Texas A&M Repository