23 research outputs found

Estimating phylogenetic trees from genome-scale data

Author: Davis Charles
Edwards Scott V.
Liu Liang
Wu Shaoyuan
Xi Zhenxiang
Publication venue: 'Wiley'
Publication date: 15/01/2015

As researchers collect increasingly large molecular data sets to reconstruct the Tree of Life, the heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. A class of phylogenetic methods known as "species tree methods" has been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting or deep coalescence that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Although such methods are gaining in popularity, they are being adopted with caution in some quarters, in part because of an increasing number of examples of strong phylogenetic conflict between concatenation or supermatrix methods and species tree methods. Here we review theory and empirical examples that help clarify these conflicts. Thinking of concatenation as a special case of the more general model provided by the multispecies coalescent can help explain a number of differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences, base compositional heterogeneity and long branch attraction. We show that approaches such as binning, designed to augment the signal in species tree analyses, can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods that incorporate biological realism are a key to phylogenetic analysis of whole genome data. Comment: 39 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

Estimating phylogenetic trees from genome-scale data

Author: Davis Charles Cavender
Edwards Scott V.
Liu Liang
Wu Shaoyuan
Xi Zhenxiang
Publication venue: 'Wiley'
Publication date: 17/02/2017

The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as “species tree” methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. Organismic and Evolutionary Biology

Harvard University - DASH

ASTRID: Accurate Species TRees from Internode Distances

Author: Pranjal Vachaspati
Tandy Warnow
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A comparative study of SVDquartets and other coalescent-based species tree estimation methods

Author: Ashu Gupta
Jed Chou
Mike Nute
Ruth Davidson
Shashank Yaduvanshi
Siavash Mirarab
Tandy Warnow
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Crossref

Springer - Publisher Connector

Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses

Author: Bayzid Md. Shamsuzzoha
Boussau Bastien
Mirarab Siavash
Warnow Tandy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/06/2015

Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. Comment: (1) In Press, PLoS ONE

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

The evolution of complex calls in meadow katydids

Author: Harness Nathan Charles Eugene
Publication venue: 'University of Missouri Libraries'

Meadow Katydids (genera Orchelimum and Conocephalus) are a speciose group often found in habitats where several species of the group live in sympatry. They produce complex calls with two distinct phrases, "buzzing" and "ticking". These two phrases are organized in a highly diverse way across species. This diversity of call patterns in Meadow Katydids provides an excellent opportunity to comparatively study the evolution of complex calls. We tested the function of the two call phrases in male-male interactions. We examined the structure of the male call in the context of communities to identify candidate traits (i.e. traits likely involved in reproductive isolation). We constructed a molecular phylogeny from twenty species of Meadow Katydids, and examined the phylogenetic signal within call traits. Taken together, the results suggest that ticking evolved in the context of male-male interaction, that buzzing has been important for diversification, and that in some species females have co-opted the tick to also function in reproductive isolation. Importantly, we have also designed and field-tested a plan to use Meadow Katydids as tools in primary, secondary, and post-secondary classrooms/laboratories. Includes bibliographical references

University of Missouri: MOspace

How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae)

Author: Dillenberger Markus S.
Gerschwitz-Eidt Michael
Hörandl Elvira
Hühn Philipp
Kadereit Gudrun
Los Jessica A.
Messerschmid Thibaud F. E.
Paetzold Claudia
Rieger Benjamin
Publication date: 01/01/2022

Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible by the combination of reduced-representation library (RRL) methods and high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, leading to loci of little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read-sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelott (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at a careful quality control of the long loci dataset. It involves an informed selection of thresholds for accurate clustering, a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data, and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of resulting BS support, gene and site concordance factor values, to improve overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.

Institutional Repository of the Freie Universität Berlin