
Directory of Open Access Journals

BBCA: Improving the Scalability of *BEAST Using Random Binning

Author: Mirarab Siavash
Warnow Tandy
Zimmermann Theo
Publication venue
Publication date: 01/01/2014
Field of study

Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is so computationally intensive that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses. Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci. Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.
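The random binning step described in this abstract can be illustrated with a minimal sketch. It only shows how loci might be partitioned at random into subsets; running *BEAST on each subset and combining the resulting gene trees with MP-EST are not shown. The function name `random_bins`, the `bin_size` parameter and the fixed `seed` are assumptions made for illustration and do not come from the paper.

```python
import random


def random_bins(loci, bin_size, seed=0):
    """Randomly partition loci into bins of at most bin_size loci each.

    Hypothetical helper: the abstract says only that loci are partitioned
    randomly into subsets; the bin size and seed are illustrative choices.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    rng = random.Random(seed)      # local generator, so results are reproducible
    shuffled = list(loci)          # copy, so the caller's sequence is untouched
    rng.shuffle(shuffled)
    # Consecutive slices of the shuffled list form the random bins.
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Example: 100 hypothetical locus identifiers split into bins of 25.
bins = random_bins([f"locus_{i}" for i in range(100)], bin_size=25)
```

Each bin would then be analysed independently by the co-estimation method, which is what makes the approach divide-and-conquer.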

Texas ScholarWorks

Disk Covering Methods Improve Phylogenomic Analyses

Author: Bayzid Md Shamsuzzoha
Hunt Tyler
Warnow Tandy
Publication venue
Publication date: 01/10/2014
Field of study

Motivation: With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time for MP-EST increases rapidly as the number of species grows. Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, this technique also improves the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets. Funding: NSF DEB 0733029, DBI 1062335. Computer Science.

Texas ScholarWorks

Bone-Associated Gene Evolution and the Origin of Flight in Birds

Author: Antunes Agostinho
Gilbert M. Thomas P.
Jarvis Erich D.
Johnson Warren E.
Machado Joao P.
O'Brien Stephen J.
Zhang Guojie
Publication venue: NSUWorks
Publication date: 01/01/2016
Field of study

Background: Bones have been subjected to considerable selective pressure throughout vertebrate evolution, such as occurred during the adaptations associated with the development of powered flight. Powered flight evolved independently in two extant clades of vertebrates, birds and bats. While this trait provided advantages such as aerial foraging, escape from predators and long-distance travel, it also imposed great challenges, particularly on bone structure. Results: We performed comparative genomic analyses of 89 bone-associated genes from 47 avian genomes (including 45 new), 39 mammalian, and 20 reptilian genomes, and demonstrate that birds, after correcting for multiple testing, have an almost two-fold increase in the number of bone-associated genes with evidence of positive selection (~52.8 %) compared with mammals (~30.3 %). Most of the positively selected genes in birds are linked with bone regulation and remodeling, and thirteen have been linked with functional pathways relevant to powered flight, including bone metabolism, bone fusion, muscle development and hyperglycemia levels. Genes encoding proteins involved in bone resorption, such as TPP1, had a high number of sites under Darwinian selection in birds. Conclusions: Patterns of positive selection observed in bird ossification genes suggest that there was a period of intense selective pressure to improve flight efficiency that was closely linked with constraints on body size

Copenhagen University Research Information System

arXiv.org e-Print Archive

NSU Works

Investigating the relative influence of genetic drift and natural selection in shaping patterns of population structure in Delphinids (Delphinus delphis; Tursiops spp.)

Author: MOURA ANDRE,EURICO,VIOLA
Publication venue
Publication date: 01/01/2010
Field of study

Speciation models relying on geographic barriers to limit gene flow gather widespread consensus, but are insufficient to explain diversification in highly mobile marine organisms. Adaptation to different environments has been suggested as an alternative driver for differentiation, particularly in cetaceans. In this study, patterns of population structure at neutral and functional markers were investigated for both common (Delphinus delphis) and bottlenose dolphins (Tursiops spp.), chosen due to high levels of morphological and ecological variation within each genus. Candidate functional markers were selected by investigating signals of positive selection in both mammals and cetaceans. No population structure was found in the European common dolphin for neutral microsatellite loci, in contrast to what is observed in other sympatric cetacean species. The previously described differentiation of the Eastern Mediterranean Sea population probably results from a recent human-mediated bottleneck. Functional markers showed almost complete uniformity, suggesting purifying selection. One non-synonymous mutation in β-casein and the DQβ1 locus were exceptions, with patterns of population differentiation possibly the result of differences in local selective pressures. Additionally, large mitogenomic sequences were used to investigate the worldwide phylogeography of several ecotypes/species within the genus Tursiops, with a recent biogeographical calibration point being used to calculate divergence times. Good node resolution with high statistical support was achieved, with good separation between most ecotypes in their own lineages. However, the results give no support for a monophyletic Tursiops. Divergence times are clustered in specific geological periods characterized by climatic fluctuations from cold to warmer periods. Common and bottlenose dolphins exhibit contrasting patterns of population structure in an environment containing few geographical barriers. This difference is speculated to be related to different feeding ecologies and social structures, although data on these are still limited. Although selection can be detected in the genomes of cetaceans at both the species and population level, current patterns of differentiation are thought to occur mainly due to drift

Durham e-Theses

Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses

Author: Bayzid Md. Shamsuzzoha
Boussau Bastien
Mirarab Siavash
Warnow Tandy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/06/2015
Field of study

Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause of gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multi-species coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. Comment: (1) In Press, PLoS ON
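The weighting idea in this abstract, where each recalculated gene tree counts in proportion to the size of its bin, can be sketched as below. This is a minimal illustration under one assumption: that weighting is realized by repeating each bin's tree once per gene in the bin before passing the list to a summary method. The function name `weight_by_bin_size`, the gene names and the Newick strings are hypothetical and are not taken from the paper or its software.

```python
def weight_by_bin_size(bins):
    """Repeat each bin's recalculated tree once per gene in that bin.

    `bins` is a list of (genes, tree) pairs, where `genes` is the list of
    gene names placed in the bin and `tree` is the tree re-estimated from
    them (here a Newick string). Hypothetical helper for illustration.
    """
    weighted_trees = []
    for genes, tree in bins:
        # A bin with k genes contributes k copies of its tree.
        weighted_trees.extend([tree] * len(genes))
    return weighted_trees


# Example with made-up bins: three genes share one tree, one gene has another.
example_bins = [
    (["g1", "g2", "g3"], "((A,B),(C,D));"),
    (["g4"], "((A,C),(B,D));"),
]
trees_for_summary_method = weight_by_bin_size(example_bins)
```

Under this sketch the summary method receives as many trees as there are genes, so a large bin is not reduced to a single vote.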

Directory of Open Access Journals

INRIA a CCSD electronic archive server

HAL Descartes

Molecular evolution of bovine Toll-like receptor 2 suggests substitutions of functional relevance

Author: Chang J S
Glass E J
Haig D
Jann O C
Moredun
Nottingham
Roslin
Royal (Dick) School of Veterinary Studies
Werling D
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: There is accumulating evidence that polymorphism in Toll-like receptor (TLR) genes might be associated with disease resistance or susceptibility traits in livestock. Polymorphic sites affecting TLR function should exhibit signatures of positive selection, identified as a high ratio of non-synonymous to synonymous nucleotide substitutions (ω). Phylogeny-based models of codon substitution based on estimates of ω for each amino acid position can therefore offer a valuable tool to predict sites of functional relevance. We have used this approach to identify such polymorphic sites within the bovine TLR2 genes from ten Bos indicus and Bos taurus cattle breeds. By analysing TLR2 gene phylogeny in a set of mammalian species and a subset of ruminant species we have estimated the selective pressure on individual sites and domains and identified polymorphisms at sites of putative functional importance. Results: The ω values were highest in the mammalian TLR2 domains thought to be responsible for ligand binding and lowest in regions responsible for heterodimerisation with other TLR-related molecules. Several positively selected sites were detected in or around ligand-binding domains. However, a comparison of the ruminant subset of TLR2 sequences with the whole mammalian set of sequences revealed that there has been less selective pressure among ruminants than in mammals as a whole. This suggests that there have been functional changes during ruminant evolution. Twenty newly discovered non-synonymous polymorphic sites were identified in cattle. Three of them were localised at positions shaped by positive selection in the ruminant dataset (Leu227Phe, His305Pro, His326Gln) and in domains involved in the recognition of ligands. His326Gln is of particular interest as it consists of an exchange of differentially charged amino acids at a position which has previously been shown to be crucial for ligand binding in human TLR2. Conclusion: Within bovine TLR2, polymorphisms at amino acid positions 227, 305 and 326 map to functionally important sites of TLR2 and should be considered as candidate SNPs for immune-related traits in cattle. A final proof of their functional relevance requires further studies to determine their functional effect on the immune response after stimulation with relevant ligands and/or their association with immune-related traits in animals.

Directory of Open Access Journals