11 research outputs found

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies.

Authors: Claywell BC; Fisher T; Fourment M; Magee AF; Matsen FA; Whidden C
Publication venue: Oxford University Press (OUP)
Publication date: 01/03/2020

Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore outward using local topology rearrangements, continuing only through topologies whose likelihood lies within some threshold of the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.
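
The search strategy described in this abstract can be illustrated with a minimal, single-threaded sketch. It is not the authors' implementation: the names `explore`, `normalize`, `toy_neighbors`, and `toy_log_lik` are hypothetical, a plain Python dict stands in for the nonblocking hash table, and integers on a line stand in for tree topologies, with "local rearrangements" reduced to stepping to an adjacent integer.

```python
import math
from collections import deque


def explore(starts, neighbors, log_lik, delta):
    """Visit each topology at most once, expanding only topologies whose
    log-likelihood is within `delta` of the best one observed so far."""
    visited = {}          # topology key -> log-likelihood (stand-in for the hash table)
    queue = deque()
    best = -math.inf
    for t in starts:
        if t not in visited:
            visited[t] = log_lik(t)
            best = max(best, visited[t])
            queue.append(t)
    while queue:
        t = queue.popleft()
        if visited[t] < best - delta:
            continue      # below the threshold: do not explore through it
        for n in neighbors(t):
            if n in visited:
                continue  # never evaluate the same topology twice
            visited[n] = log_lik(n)
            best = max(best, visited[n])
            queue.append(n)
    # Keep only the topologies inside the final threshold.
    return {t: ll for t, ll in visited.items() if ll >= best - delta}


def normalize(kept):
    """Turn log-likelihoods into normalized weights (a softmax)."""
    m = max(kept.values())
    weights = {t: math.exp(ll - m) for t, ll in kept.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Toy stand-ins (assumptions for illustration only).
def toy_neighbors(t):
    return [n for n in (t - 1, t + 1) if 0 <= n <= 9]


def toy_log_lik(t):
    return -float((t - 3) ** 2)


if __name__ == "__main__":
    kept = explore([3], toy_neighbors, toy_log_lik, delta=5.0)
    print(sorted(kept))
    print(normalize(kept))
```

In this toy run the search starts at the single maximum and stops expanding once the log-likelihood drops more than `delta` below the best value seen, so only a band of "topologies" around the maximum is retained and normalized.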

OPUS - University of Technology Sydney

On Defining and Finding Islands of Trees and Mitigating Large Island Bias

Authors: De Almeida Serra Jorge Da Silva Ana; Wilkinson Mark
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

Explore Bristol Research

Enabling comparative genomics at the scale of hundreds of species

Author: Armstrong Joel
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Comparing related (homologous) subsequences between genomes from different species gives insight into their function. This information is captured in "genome alignments", which are essential for almost all comparative genomics analyses. However, most existing methods to create a genome alignment suffer from reference bias (where only one genome is fully aligned to all others), or ignore duplication events. Though the Cactus genome aligner avoided these restrictions, it could not align more than a few genomes without becoming cost-prohibitive as well as losing accuracy. I developed and refined a "progressive alignment" extension to Cactus to allow it to produce a full alignment in time linear in the number of input genomes while maintaining similar, or often improved, quality. This new method allows Cactus to align hundreds of large vertebrate genomes, enabling comparative genomics at an unprecedented scale. During its development I used Cactus as an essential component of several successful comparative genomics projects. Working closely with the 200 Mammals and Bird 10K projects, I have used Cactus to create an alignment of over 600 bird and mammal genomes, which is by far the largest genome alignment ever created. Finally, I have utilized this alignment to provide a highest-possible-resolution annotation of mammalian and avian evolutionary constraint, using the uniquely large number of taxa to enable the examination of weak effects of purifying selection.

eScholarship - University of California

Post-processing of phylogenetic trees

Author: Silva Ana Serra
Publication date: 06/12/2022

Explore Bristol Research

Dynamic homology and phylogenetic systematics: a unified approach using POY

Authors: Aagesen Lone; Arango Claudia P.; D’Haese Cyrille; Faivovich Julián; Giribet Gonzalo; Grant Taran; Janies Daniel; Smith William Leo; Varón Andrés; Wheeler Ward C.
Publication venue: American Museum of Natural History (BioOne sponsored)
Publication date: 01/01/2006

KU ScholarWorks

Méthodes et algorithmes pour l’amélioration de l’inférence de l’histoire évolutive des génomes

Author: Noutahi Finagnon Marc-Rolland Emmanuel
Publication date: 01/07/2018

Gene trees offer a proper framework for comparative genomics. Not only do they provide information about species evolution through speciation events, but they also capture gene family expansion and contraction by gene gains and losses. They are thus used to infer the evolutionary history of gene families and accurately predict the orthologous relationship between genes, on which several biological analyses rely, including the prediction of protein structure and function. Methods for inferring gene family evolution explicitly assume that gene trees are known without errors. However, standard phylogenetic methods for tree construction based on sequence data are well documented as error-prone. Gene trees constructed using these methods will usually introduce biases during the inference of gene family histories. In this thesis, we present new methods aiming to improve the quality of phylogenetic gene trees and thereby the accuracy of underlying evolutionary histories of their corresponding gene families.
We start by providing a framework to study genetic code deviations, one possible cause of annotation errors that could then spread to the phylogeny reconstruction. Our framework is built around a codon reassignment prediction algorithm called CoreTracker, which analyses coding sequences and tRNAs. We first show its efficiency, then apply it to green plant mitochondrial genomes. The second part of this thesis focuses on the development of efficient species-tree-aware methods for gene tree construction. We present ProfileNJ, a fast and deterministic correction method that targets only the weakly supported branches of a gene tree. When applied to the gene families of the Ensembl Compara database, ProfileNJ produces an arguably better set of gene trees compared to the ones available in Ensembl Compara. We later use a different strategy, based on a genetic algorithm, allowing both construction and correction of gene trees. This second method, called GATC, treats the problem as a multi-objective optimisation problem in which we are looking for the set of gene trees optimal for both sequence data and information of gene family evolution through gene gain and loss. We show that this approach yields accurate trees and is suitable for the construction of reference datasets to benchmark other methods.

Dépôt Institutionnel Numérique

Evolutionary genomics : statistical and computational methods

Author: Anisimova Maria
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

ZHAW digitalcollection

Directory of Open Access Books (DOAB)