22 research outputs found

    MPI-PHYLIP: Parallelizing Computationally Intensive Phylogenetic Analysis Routines for the Analysis of Large Protein Families

    Background: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses. Methodology: Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets. Conclusions: Calculations that currently take a few days on a state of the art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to mor
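The bootstrap step described in this abstract can be illustrated with a minimal Python sketch. It is not the MPI-PHYLIP code: the function names, the toy alignment, and the per-replicate seeding scheme are all assumptions made for illustration. It shows only why bootstrap replicates parallelize well: each replicate is an independent resampling of alignment columns, so replicates could be handed to separate workers.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def make_replicates(alignment, n_replicates, seed=0):
    """Each replicate uses its own seeded generator, so replicates are
    independent of one another and could be computed on different workers."""
    return [
        bootstrap_alignment(alignment, random.Random(seed + i))
        for i in range(n_replicates)
    ]

# Hypothetical toy protein alignment (three taxa, eight sites).
alignment = {"taxonA": "MKTAYIAK", "taxonB": "MKTAHIAK", "taxonC": "MRTAYLAK"}
replicates = make_replicates(alignment, 4)
for rep in replicates:
    print(rep)
```

In a parallel setting, each worker would build a tree from its own replicates and the resulting trees would be combined into a consensus afterwards.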

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now
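The idea behind compressing a likelihood evaluation can be shown with a much simpler technique than the one analyzed in this abstract. The sketch below collapses identical alignment columns so that each distinct pattern is evaluated once and weighted by its count. This is ordinary whole-column pattern compression, not the paper's method, and `toy_site_log_lik` is a hypothetical stand-in for a real per-site likelihood.

```python
from collections import Counter

def compress_columns(sequences):
    """Map each distinct alignment column (a tuple of characters) to its count."""
    return Counter(zip(*sequences))

def compressed_log_likelihood(sequences, site_log_lik):
    """Sum per-site log-likelihoods, evaluating each distinct column once."""
    patterns = compress_columns(sequences)
    return sum(count * site_log_lik(pattern) for pattern, count in patterns.items())

calls = []

def toy_site_log_lik(pattern):
    """Hypothetical per-site score: -1.0 for a constant column, -2.0 otherwise."""
    calls.append(pattern)
    return -1.0 if len(set(pattern)) == 1 else -2.0

# Three aligned sequences with four sites; three of the columns are identical.
sequences = ["AACA", "AACA", "AAGA"]
total = compressed_log_likelihood(sequences, toy_site_log_lik)
print(total, len(calls))
```

Here four sites require only two evaluations of the per-site function, because three of the four columns share one pattern.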

    Evaluation of properties over phylogenetic trees using stochastic logics

    Background: Model checking has recently been introduced as an integrated framework for extracting information from phylogenetic trees, using temporal logics as a query language. Temporal logics are an extension of modal logics that impose restrictions on a boolean formula along a path of events. The phylogenetic tree is treated as a transition system that models evolution as a sequence of genomic mutations (by mutation we mean any of the ways in which DNA can change), and these logics are suitable for traversing it in a strict and exhaustive way. Given a biological property to be inspected over the phylogeny, the verifier returns true if the specification is satisfied, or a counterexample that falsifies it. However, this approach has so far been applied only to qualitative aspects of the phylogeny. Results: In this paper, we address the limitations of the previous framework so that it can include and handle quantitative information such as explicit time or probability. To this end, we apply current probabilistic continuous-time extensions of model checking to phylogenetics. We reinterpret a catalog of qualitative properties in a numerical way, and we also present new properties that could not be analyzed before. For instance, we obtain the likelihood of a tree topology according to a mutation model. As a case study, we analyze several phylogenies in order to obtain the maximum likelihood with the model checking tool PRISM. In addition, we have adapted the software to optimize the computation of maximum likelihoods. Conclusions: We have shown that probabilistic model checking is a competitive framework for describing and analyzing quantitative properties over phylogenetic trees. This formalism adds soundness and readability to the definition of models and specifications. Moreover, existing model checking tools hide the underlying technology, sparing biologists the extension, upgrade, debugging, and maintenance of a software tool. 
A set of benchmarks supports the feasibility of our approach
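The "likelihood of a tree topology according to a mutation model" can be made concrete with a small sketch. It uses Felsenstein's pruning recursion under the Jukes-Cantor (JC69) model, which is a standard way to compute this quantity and is not the PRISM-based approach of the abstract. The tree encoding (a leaf is a name, an internal node is a list of `(child, branch_length)` pairs), the taxon names, and the branch lengths are assumptions for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(a, b, t):
    """JC69 probability of base a becoming base b along a branch of length t
    (t measured in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def conditional(node, column):
    """Likelihood of the data below `node`, for each possible state at `node`."""
    if isinstance(node, str):  # leaf: only the observed base is possible
        return {x: 1.0 if column[node] == x else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = conditional(child, column)
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result

def site_likelihood(tree, column):
    """Average the root conditionals over the uniform JC69 base frequencies."""
    root = conditional(tree, column)
    return sum(0.25 * root[x] for x in BASES)

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, assuming independent sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {k: s[i] for k, s in alignment.items()}))
        for i in range(n_sites)
    )

# Hypothetical three-taxon tree and alignment.
tree = [("human", 0.1), ([("chimp", 0.05), ("gorilla", 0.07)], 0.02)]
alignment = {"human": "ACGT", "chimp": "ACGA", "gorilla": "ACTT"}
print(log_likelihood(tree, alignment))
```

Comparing this value across candidate topologies, with branch lengths optimized for each, is the basis of maximum likelihood tree selection.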

    Reverse dissimilatory sulfite reductase as phylogenetic marker for a subgroup of sulfur-oxidizing prokaryotes

    Sulfur-oxidizing prokaryotes (SOP) catalyse a central step in the global S-cycle and are of major functional importance for a variety of natural and engineered systems, but our knowledge on their actual diversity and environmental distribution patterns is still rather limited. In this study we developed a specific PCR assay for the detection of dsrAB that encode the reversely operating sirohaem dissimilatory sulfite reductase (rDSR) and are present in many but not all published genomes of SOP. The PCR assay was used to screen 42 strains of SOP (most without published genome sequence) representing the recognized diversity of this guild. For 13 of these strains dsrAB was detected and the respective PCR product was sequenced. Interestingly, most dsrAB-encoding SOP are capable of forming sulfur storage compounds. Phylogenetic analysis demonstrated largely congruent rDSR and 16S rRNA consensus tree topologies, indicating that lateral transfer events did not play an important role in the evolutionary history of known rDSR. Thus, this enzyme represents a suitable phylogenetic marker for diversity analyses of sulfur storage compound-exploiting SOP in the environment. The potential of this new functional gene approach was demonstrated by comparative sequence analyses of all dsrAB present in published metagenomes and by applying it for a SOP census in selected marine worms and an alkaline lake sediment

    Endomicrobia in termite guts: symbionts within a symbiont (Phylogeny, cospeciation with host flagellates, and preliminary genome analysis)

    "Endomicrobia" are intracellular symbionts of termite gut flagellates that represent a distinct lineage in the novel bacterial phylum Termite Group I (TG-1). The evolutionary history of "Endomicrobia" with respect to their symbiosis with host flagellates was investigated using phylogenetic analyses and in situ identification based on small-subunit ribosomal RNA (SSU rRNA) sequences. By analyzing SSU rRNA sequences extracted from manually separated flagellate cells, "Endomicrobia" were shown to be widely distributed among termite gut flagellates. Symbionts originating from the same genus of flagellates invariably formed a host-specific monophyletic cluster in the phylogenetic tree. Their intracellular location in the cytoplasm was confirmed by fluorescent in situ hybridization (FISH) using oligonucleotide probes designed specifically for each symbiont and for the host. The phylogeny of "Endomicrobia" and host flagellates belonging to the parabasalid genus Trichonympha was investigated further in detail. SSU rRNA trees of the symbiont and the host exhibited overall congruence, which suggested cospeciation. Pairwise distance analysis and FISH corroborated the phylogenetic evidence, and these results allowed the construction of evolutionary scenarios for the divergence of "Endomicrobia" and their acquisition by flagellate hosts. "Endomicrobia" share their intracellular habitat with other bacterial symbionts. Bacterial SSU rRNA sequences originating from Trichonympha flagellates of Incisitermes marginipennis and Zootermopsis nevadensis revealed the occurrence of several distinct phylogenetic groups, representing Treponema spp., "Endomicrobia", δ-Proteobacteria, Bacteroidetes, and Mycoplasmatales. The Proteobacteria symbionts were shown to densely colonize the surface and the cytoplasm of the flagellates in high abundance. 
Since no pure cultures of "Endomicrobia" or their host flagellates are available, a method for the physical enrichment of "Endomicrobia" was established to gain more insights into the nature of these symbionts. "Candidatus Endomicrobium trichonymphae" (CET), the symbiont of Trichonympha flagellates, was selected as representative and enriched from gut contents of Z. nevadensis. High-molecular-weight DNA extracted from the enrichment is currently used for genome sequencing at the DOE Joint Genome Institute. A recently assembled 80-kb contig of CET revealed first insights into its metabolism, including hexuronate metabolism and the possible formation of H2. "Endomicrobia" are also present in the gut of the wood-feeding cockroach Cryptocercus punctulatus, which is considered to share a common ancestor with termites. Analysis of SSU rRNA sequences obtained from whole-gut DNA of this cockroach revealed the phylogenetic positions of six lineages (morphotypes) of parabasalid flagellates. Sequences obtained from manually isolated flagellates, which have long been assigned to the genus Trichonympha, turned out to be a previously undescribed lineage of Parabasalia. Since this new lineage may represent one of the earliest branches of parabasalid flagellates, the recovery of "Endomicrobia" sequences also from this flagellate underscores the presence of these endosymbionts already in the flagellates of the hypothetical dictyopteran ancestor of termites and cockroaches. The results of this study collectively document that "Endomicrobia" are prevalent and persistent endosymbionts of termite gut flagellates. This study also provides a better understanding of the phylogenetic properties of their biotic environment, i.e., the host flagellates and the cohabiting bacteria, which may help to explain the functional roles of "Endomicrobia" and their symbiotic interactions

    Inference of Many-Taxon Phylogenies

    Phylogenetic trees are tree topologies that represent the evolutionary history of a set of organisms. In this thesis, we address computational challenges related to the analysis of large-scale datasets with Maximum Likelihood based phylogenetic inference. We have approached this using different strategies: reduction of memory requirements, reduction of running time, and reduction of man-hours

    Bacterial symbionts of termite gut flagellates: cospeciation and nitrogen fixation in the gut of dry-wood termites

    The subject of this thesis is the symbiosis between flagellates and bacteria in the gut of dry-wood termites (Kalotermitidae). In a series of studies, the evolution of devescovinid flagellates and their bacterial symbionts was elucidated, and the physiological basis of the symbiosis was investigated, with a focus on nitrogen fixation. Devescovinid flagellates are the dominant flagellates in the gut of Kalotermitidae. Species-pure suspensions of devescovinid flagellates (Devescovina and Metadevescovina species) from a wide range of termite species in the family Kalotermitidae were isolated with micropipettes. Ribosomal RNA gene sequences of the host flagellates and their bacterial symbionts were obtained using a full-cycle-rRNA approach. Phylogenetic analysis showed that Devescovina spp. present in many species of Kalotermitidae form a monophyletic group. They were consistently associated with a distinct lineage of ectosymbionts, which form a monophyletic group among the Bacteroidales. The well-supported congruence of their phylogenies documented strict cospeciation of flagellates and their ectosymbionts, which were temporarily classified as “Candidatus Armantifilum devescovinae”. Nevertheless, the complete incongruence between the phylogenies of devescovinid flagellates and Kalotermitidae (COII genes) demonstrated horizontal transfer of flagellates among several species of Kalotermitidae. The presence of filamentous “A. devescovinae” on the surface of Devescovina spp. was corroborated with scanning electron microscopy and fluorescent in situ hybridization. However, several Metadevescovina species, which form a sister group of Devescovina spp., did not possess Bacteroidales ectosymbionts. Moreover, a combination of molecular analysis and electron microscopy led to a correction of the previously overestimated diversity of Metadevescovina species in the gut of termite Incisitermes marginipennis. 
In contrast to the Bacteroidales ectosymbionts, the endosymbionts of Devescovina spp., which belong to the so-called “Endomicrobia” (TG-1 phylum) and consistently colonized the cytoplasm of all flagellates of this group, were clearly polyphyletic. This suggested that they were acquired independently by each host species. The same seems to be true for the Bacteroidales ectosymbionts of the Oxymonas flagellates present in several Kalotermitidae. These ectosymbionts form several distantly related novel lineages in the phylogenetic tree, underscoring the notion that evolutionary histories of flagellate–bacteria symbioses in the termite gut are complex. Kalotermitidae are known to fix large amounts of atmospheric nitrogen, and an acetylene reduction assay showed the presence of nitrogenase activity in the gut of these termites. Community fingerprinting of the nitrogenase genes (homologs of nifH) by T-RFLP analysis revealed that a gene encoding an alternative nitrogenase (anfH) of unknown origin was the most highly expressed homolog in mRNA-based profiles. Cloning of the nifH homologs from capillary-picked suspensions of Devescovina arta and Snyderella tabogae gave strong evidence that the “A. devescovinae” are the putative carriers of the anfH gene and therefore responsible for most of the nitrogen-fixing activity in the guts of Neotermes castaneus and Cryptotermes longicollis. Despite a high diversity of nifH homologs in gut homogenates, the only other homologs that were expressed belonged to Treponema, Bacteroidales (nifH), and the proteo-cyano group. The gene expression profiles were specific for the termites. The anfH genes were not expressed in termite species that accumulated large amounts of hydrogen (35–45 kPa, microsensor measurements), suggesting a repression of gene expression by high hydrogen partial pressure

    Formal methods applied to the analysis of phylogenies: Phylogenetic model checking

    Phylogenetic trees are useful abstractions for modeling and characterizing the evolution of a set of species or populations over time. Proposing, verifying, and generalizing hypotheses about an inferred phylogenetic tree play an important role in studying and understanding evolutionary relationships. Currently, one of the main scientific goals is to extract or discover the implicit biological messages and the underlying structural properties of the phylogeny. For example, integrating genetic information into a phylogeny helps in discovering genes conserved across all or part of the tree, identifying covarying positions in DNA, or estimating divergence dates between species. Consequently, trees help in understanding the mechanism that governs evolutionary drift. Today, the wide spectrum of heterogeneous methods and tools for analyzing phylogenies obscures and hinders their use, as does the strong coupling between the specification of properties and the algorithms used to evaluate them (mainly ad hoc scripts). This problem is the starting point of this thesis, which examines as a solution the possibility of introducing a formal hypothesis-verification framework that, automatically and modularly, checks the truth of such properties, defined in a generic and independent language (in an associated formal logic), on one of the many software tools prepared for that purpose. The main contribution of the thesis is the proposal of a formal framework for describing, verifying, and manipulating causal relationships between species independently of the code used to evaluate them. 
To this end, we explore the features of model checking techniques, a paradigm in which a specification expressed in temporal logic is verified against a model of the system that represents an implementation at a certain level of detail. Having emerged from computer science, it has been applied successfully in industry for system modeling and verification. The specific contributions of the thesis are: A) The identification and interpretation of phylogenetic trees as models of evolution, adapted to the setting of model checking techniques. B) The definition of a temporal logic that captures the usual phylogenetic properties, together with a method for constructing properties. C) The classification of phylogenetic properties, identifying categories according to whether they focus on the structure of the tree, on the sequences, or are hybrid. D) The extension of the logics and models to cover quantitative properties of time, probability, and distance. E) The development of a framework for verifying boolean, quantitative, and parametric properties. F) The establishment of the principles for the symbolic manipulation of phylogenetic objects, e.g., clades. G) The exploitation of existing model checking tools, detecting their problems and shortcomings in the field of phylogeny and proposing improvements. H) The development of ad hoc techniques to reduce complexity on two fronts: distribution of computations and data, and the use of information systems. Points A-F focus on the conceptual contributions of our approach, while points G-H emphasize tools and implementation. The contents of the thesis have been validated by the scientific community through the following publications in international conferences and journals. 
The introduction of model checking as a formal framework for analyzing biological properties (points A-C) led to the publication of our first conference paper [1]. In [2], we develop the verification of phylogenetic hypotheses on an example tree built from the relationships imposed by a set of proteins encoded by human mitochondrial DNA (mtDNA). In that example, we use an automatic, generic model checking tool (point G). The journal article [7] summarizes the essentials of the earlier conference papers and extends the application of temporal logics to phylogenetic properties not considered until now. The articles cited here cover the contents presented in Parts I-II of the thesis. The enormous size of the trees and the considerable amount of information associated with the states (e.g., the DNA string) require special adaptations of the model checking tools in order to maintain reasonable performance when verifying properties and to alleviate the state explosion problem (points G-H). The conference paper [3] presents the advantages of slicing the DNA associated with the states, partitioning the phylogeny into small subtrees, and distributing them among several machines. In addition, the original idea of sliced model checking is complemented by an external database for storing sequences. The journal article [4] brings together the notions introduced in [3] with the implementation and preliminary results presented in [5]. This topic corresponds to Part III of the thesis. Finally, the thesis reuses the extensions of temporal logics with explicit time and probabilities in order to manipulate and query the tree for quantitative information. 
The conference paper [6] illustrates the need to introduce probabilities and discrete time for the phylogenetic analysis of a real phenotype, in this case the distribution ratio of lactose intolerance among various populations placed at the leaves of the phylogeny. This corresponds to Chapter 13, which falls within Parts IV-V. Parts IV-V extend the concepts presented in that conference paper to other application domains, such as tree scoring, and to continuous time (points E-F). The introduction of parameters into phylogenetic hypotheses is proposed as future work. References
[1] Roberto Blanco, Gregorio de Miguel Casado, José Ignacio Requeno, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. In Proceedings IEEE International Workshop on Mining and Management of Biological and Health Data, pages 152-157. IEEE, 2010.
[2] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Phylogenetic analysis using an SMV tool. In Miguel P. Rocha, Juan M. Corchado Rodríguez, Florentino Fdez-Riverola, and Alfonso Valencia, editors, Proceedings 5th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 93 of Advances in Intelligent and Soft Computing, pages 167-174. Springer, Berlin, 2011.
[3] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Sliced model checking for phylogenetic analysis. In Miguel P. Rocha, Nicholas Luscombe, Florentino Fdez-Riverola, and Juan M. Corchado Rodríguez, editors, Proceedings 6th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 154 of Advances in Intelligent and Soft Computing, pages 95-103. Springer, Berlin, 2012.
[4] José Ignacio Requeno and José Manuel Colom. Model checking software for phylogenetic trees using distribution and database methods. Journal of Integrative Bioinformatics, 10(3):229-233, 2013.
[5] José Ignacio Requeno and José Manuel Colom. Speeding up phylogenetic model checking. In Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, and Florentino Fdez-Riverola, editors, Proceedings 7th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 222 of Advances in Intelligent Systems and Computing, pages 119-126. Springer, Berlin, 2013.
[6] José Ignacio Requeno and José Manuel Colom. Timed and probabilistic model checking over phylogenetic trees. In Miguel P. Rocha et al., editors, Proceedings 8th International Conference on Practical Applications of Computational Biology and Bioinformatics, Advances in Intelligent and Soft Computing. Springer, Berlin, 2014.
[7] José Ignacio Requeno, Gregorio de Miguel Casado, Roberto Blanco, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(4):1058-1070, 2013