9 research outputs found

    IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era

    IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software. This work was supported by the Austrian Science Fund (Grant No. I-2805-B29) to A.v.H. and by the Australian National University Futures Scheme grant to R.L.

    Phylogenomics

    In recent years, phylogenomics has opened a new chapter in evolutionary biology. The analysis of genome-scale data sets has the potential to answer the most difficult and intriguing questions about evolutionary histories. However, these perspectives come with higher complexity and additional difficulties for phylogenomic inference. This thesis explores and provides insights into questions arising from the theory, algorithms and applications of phylogenomics. The main contributions deal with phylogenetic terraces, which are sets of species trees in tree space with identical score (maximum likelihood or maximum parsimony). One problem in the search for optimal trees is the size of a terrace, which depends crucially on the multiple sequence alignment under study. First, we provide rules to detect terraces during the tree search. To this end, we study the induced partition trees and how topological rearrangements of the species tree drive changes in the partition trees. If a tree rearrangement operation applied to the current species tree does not change any of its associated induced partition trees, then the current and the new species tree belong to the same terrace. We prove three propositions defining when Nearest Neighbour Interchange (NNI), Subtree Pruning and Regrafting (SPR) and Tree Bisection and Reconnection (TBR) operations change the induced partition trees. We further generalize the concept of terraces to partial terraces and study their occurrence in real alignments using NNI neighbourhoods. Second, we provide a phylogenetic terrace aware data structure (PTA) for the efficient analysis of concatenated multiple alignments. Using PTA and the rules developed to detect (partial) terraces in the presence of missing data, one saves computational time by avoiding unnecessary recomputations of the likelihood or parsimony scores. We implemented PTA in IQ-TREE and tested its performance on 11 real alignments. Identification of partial terraces sped up the tree search with IQ-TREE by up to 5 and 6 times compared to the standard (terrace-unaware) implementation and RAxML, respectively. PTA is suitable for use with all partition models and all common topological rearrangement operations, such as NNI, SPR and TBR. Finally, we develop methods for conservation biology and ecology, where phylogenomics is used to quantify the evolutionary diversity of species. We discuss the viable taxon selection problem, which incorporates predator-prey interactions to define viability constraints. First, we extend the problem to account for Split Diversity (SD), a biodiversity measure based on the evolutionary distances between species on split networks. Second, to make the viability constraints more realistic, we extend the viability definition to account for the diet composition of predators. SD with the viability constraints is used to prioritize species for conservation actions. Although such optimization problems are NP-hard, they can be solved within a reasonable amount of time using Integer Linear Programming (ILP), a well-known method for decision-making problems. We provide ILP formulations for all the discussed problems and implement them in the PDA software package. To exemplify the discussed methods, we apply them to a real case study: the Caribbean coral reef community.
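    The viable taxon selection problem mentioned in this abstract can be illustrated with a toy sketch. It is not the ILP formulation from the thesis or the PDA implementation: exhaustive enumeration stands in for ILP, and the function names, the `prey` dictionary layout and the example weights are all hypothetical. It assumes Split Diversity is the summed weight of the splits that separate at least two selected taxa, and that a selected predator is viable when at least one of its prey is also selected.

```python
from itertools import combinations


def split_diversity(selected, splits):
    """Sum the weights of splits that have selected taxa on both sides.

    `splits` is a list of (side, weight) pairs; `side` is one half of the
    split, the other half is implied (hypothetical input layout).
    """
    selected = set(selected)
    total = 0.0
    for side, weight in splits:
        if selected & side and selected - side:
            total += weight
    return total


def is_viable(selected, prey):
    """A selected taxon with prey needs at least one of them selected too."""
    selected = set(selected)
    return all(not prey.get(t) or selected & prey[t] for t in selected)


def best_viable_subset(taxa, splits, prey, k):
    """Brute-force search over all subsets of size k (toy sizes only)."""
    best, best_score = None, float("-inf")
    for subset in combinations(sorted(taxa), k):
        if is_viable(subset, prey):
            score = split_diversity(subset, splits)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Invented example: D is a predator that feeds only on A.
taxa = {"A", "B", "C", "D"}
splits = [({"A"}, 1), ({"B"}, 1), ({"C"}, 1), ({"D"}, 5), ({"A", "B"}, 2)]
prey = {"D": {"A"}}
best, score = best_viable_subset(taxa, splits, prey, k=2)
```

    Enumeration grows exponentially with the number of taxa, which is why an exact method such as ILP is needed for realistic instances.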

    Data from: Terrace aware data structure for phylogenomic inference from supermatrices

    In phylogenomics the analysis of concatenated gene alignments, the so-called supermatrix, is commonly accompanied by the assumption of partition models. Under such models each gene, or more generally each partition, is allowed to evolve under its own evolutionary model. Though partition models provide a more comprehensive analysis of supermatrices, missing data may hamper the tree search algorithms due to the existence of phylogenetic (partial) terraces. Here we introduce the phylogenetic terrace aware (PTA) data structure for efficient analysis under partition models. In the presence of missing data, PTA exploits (partial) terraces and induced partition trees to save computation time. We show that an implementation of PTA in IQ-TREE leads to a substantial speedup of up to 4.5 and 8 times compared with the standard IQ-TREE and RAxML implementations, respectively. PTA is generally applicable to all types of partition models and common topological rearrangements, and can thus be employed by all phylogenomic inference software.
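    The idea of induced partition trees and terraces can be illustrated with a minimal sketch. It is not the PTA data structure itself: it only compares topologies, under the assumption that two species trees lie on the same terrace when they induce identical trees on every partition's taxon set. The nested-tuple tree encoding and all function names are hypothetical.

```python
def leaves(tree):
    """Return the leaf set of a tree given as nested tuples of taxon names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all subtrees, including the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    for child in tree:
        result.extend(clades(child))
    result.append(leaves(tree))
    return result


def induced_splits(tree, taxa):
    """Non-trivial splits of the tree restricted to the taxa of one partition."""
    taxa = frozenset(taxa) & leaves(tree)
    splits = set()
    for clade in clades(tree):
        side = clade & taxa
        other = taxa - side
        # Keep only splits with at least two taxa on each side.
        if len(side) >= 2 and len(other) >= 2:
            splits.add(frozenset([side, other]))
    return splits


def same_terrace(tree_a, tree_b, partitions):
    """True if both trees induce the same topology on every partition."""
    return all(
        induced_splits(tree_a, p) == induced_splits(tree_b, p)
        for p in partitions
    )


# Two species trees that differ by swapping B and C.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))

# With missing data (no partition contains both B and C) the swap is invisible.
sparse = [{"A", "B", "D", "E"}, {"A", "C", "D", "E"}]
# A partition covering all taxa distinguishes the two trees.
complete = [{"A", "B", "C", "D", "E"}]

on_terrace_sparse = same_terrace(tree_a, tree_b, sparse)
on_terrace_complete = same_terrace(tree_a, tree_b, complete)
```

    In the sparse case the rearrangement leaves every induced partition tree unchanged, so a terrace-aware search would not need to recompute the score.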

    Complex evolution of light-dependent protochlorophyllide oxidoreductases in aerobic anoxygenic phototrophs: origin, phylogeny and function

    Light-dependent and dark-operative protochlorophyllide oxidoreductases (LPORs and DPORs) are evolutionarily and structurally distinct enzymes that are essential for the synthesis of (bacterio)chlorophyll, the primary pigment needed for both anoxygenic and oxygenic photosynthesis. In contrast to the long-held hypothesis that LPORs are only present in oxygenic phototrophs, we recently identified a functional LPOR in the aerobic anoxygenic phototrophic bacterium (AAPB) Dinoroseobacter shibae, and attributed its presence to a single horizontal gene transfer (HGT) event from cyanobacteria. Here, we provide evidence for the more widespread presence of genuine LPOR enzymes in AAPBs. An exhaustive bioinformatics search identified 36 putative LPORs outside of oxygenic phototrophic bacteria (cyanobacteria), with the majority being AAPBs. Using in vitro and in vivo assays, we show that the large majority of the tested AAPB enzymes are genuine LPORs. Solution structural analyses, performed for two of the AAPB LPORs, revealed a globally conserved structure when compared to a well-characterized cyanobacterial LPOR. Phylogenetic analyses suggest that LPORs were transferred not only from cyanobacteria, but also subsequently between proteobacteria and from proteobacteria to Gemmatimonadetes. Our study thus provides another interesting example of the complex evolutionary processes that govern the evolution of bacteria, involving multiple HGT events that likely occurred at different time points and involved different donors.

    Molecular Electrical Properties from Quantum Monte Carlo Calculations: Application to Ethyne

    We used Quantum Monte Carlo (QMC) methods to study the polarizability and the quadrupole moment of the ethyne molecule using the Jastrow-Antisymmetrised Geminal Power (JAGP) wave function, a compact and strongly correlated variational ansatz. The compactness of the functional form and the full optimization of all its variational parameters, including linear and exponential coefficients in atomic orbitals, allow us to observe a fast convergence of the electrical properties with the size of the atomic and Jastrow basis sets. Both variational results on isotropic polarizability and quadrupole moment based on Gaussian type and Slater type basis sets are very close to the Lattice Regularized Diffusion Monte Carlo values and in very good agreement with experimental data and with other quantum chemistry calculations. We also study the electronic density along the C≡C and C–H bonds by introducing a generalization for molecular systems of the small-variance improved estimator of the electronic density proposed by Assaraf et al. (Assaraf, R.; Caffarel, M.; Scemama, A. Phys. Rev. E, 2007, 75, 035701).