138 research outputs found

    PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

    Background: To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. Results: Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other. Conclusion: PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.
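The first two tasks named in this abstract, node annotation and pattern matching, can be illustrated with a minimal sketch. This is not PhyloPattern's API or pattern syntax: the `Node` class, the `annotate` and `find` helpers, the `is_duplication` pattern, and the `species_gene` leaf-naming convention are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; leaves are named '<species>_<gene>' (an assumed convention)."""
    name: str
    children: list = field(default_factory=list)
    species: frozenset = frozenset()


def annotate(node):
    """Node annotation: store the set of species found below each node."""
    if not node.children:
        node.species = frozenset([node.name.split("_")[0]])
    else:
        node.species = frozenset().union(*(annotate(c) for c in node.children))
    return node.species


def find(node, pattern):
    """Pattern matching: yield every node for which the predicate holds."""
    if pattern(node):
        yield node
    for child in node.children:
        yield from find(child, pattern)


def is_duplication(node):
    """Hypothetical pattern: a binary node whose children share a species."""
    return (len(node.children) == 2
            and bool(node.children[0].species & node.children[1].species))


# Two paralogous clades, each containing a human and a mouse gene.
tree = Node("root", [
    Node("cladeA", [Node("human_a"), Node("mouse_a")]),
    Node("cladeB", [Node("human_b"), Node("mouse_b")]),
])
annotate(tree)
matches = [n.name for n in find(tree, is_duplication)]
```

Here patterns are plain Python predicates evaluated on annotated nodes; the regular-expression-like pattern language mentioned in the title is not reproduced.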

    Evidence, Content and Corroboration and the Tree of Life

    We examine three critical aspects of Popper’s formulation of the ‘Logic of Scientific Discovery’—evidence, content and degree of corroboration—and place these concepts in the context of the Tree of Life (ToL) problem with particular reference to molecular systematics. Content, in the sense discussed by Popper, refers to the breadth and scope of existence that a hypothesis purports to explain. Content, in conjunction with the amount of available and relevant evidence, determines the testability, or potential degree of corroboration, of a statement; content distinguishes scientific hypotheses from metaphysical assertions. Degree of corroboration refers to the relative and tentative confidence assigned to one hypothesis over another, based upon the performance of each under critical tests. Here we suggest that systematists attempt to maximize content and evidence to increase the potential degree of corroboration in all phylogenetic endeavors. Discussion of this “total evidence” approach leads to several interesting conclusions about generating ToL hypotheses

    A Universal Next-Generation Sequencing Protocol To Generate Noninfectious Barcoded cDNA Libraries from High-Containment RNA Viruses

    ABSTRACT Several biosafety level 3 and/or 4 (BSL-3/4) pathogens are high-consequence, single-stranded RNA viruses, and their genomes, when introduced into permissive cells, are infectious. Moreover, many of these viruses are select agents (SAs), and their genomes are also considered SAs. For this reason, cDNAs and/or their derivatives must be tested to ensure the absence of infectious virus and/or viral RNA before transfer out of the BSL-3/4 and/or SA laboratory. This tremendously limits the capacity to conduct viral genomic research, particularly the application of next-generation sequencing (NGS). Here, we present a sequence-independent method to rapidly amplify viral genomic RNA while simultaneously abolishing both viral and genomic RNA infectivity across multiple single-stranded positive-sense RNA (ssRNA+) virus families. The process generates barcoded DNA amplicons that range in length from 300 to 1,000 bp, which cannot be used to rescue a virus and are stable to transport at room temperature. Our barcoding approach allows for up to 288 barcoded samples to be pooled into a single library and run across various NGS platforms without potential reconstitution of the viral genome. Our data demonstrate that this approach provides full-length genomic sequence information not only from high-titer virion preparations but it can also recover specific viral sequence from samples with limited starting material in the background of cellular RNA, and it can be used to identify pathogens from unknown samples. In summary, we describe a rapid, universal standard operating procedure that generates high-quality NGS libraries free of infectious virus and infectious viral RNA. IMPORTANCE This report establishes and validates a standard operating procedure (SOP) for select agents (SAs) and other biosafety level 3 and/or 4 (BSL-3/4) RNA viruses to rapidly generate noninfectious, barcoded cDNA amenable for next-generation sequencing (NGS). This eliminates the burden of testing all processed samples derived from high-consequence pathogens prior to transfer from high-containment laboratories to lower-containment facilities for sequencing. Our established protocol can be scaled up for high-throughput sequencing of hundreds of samples simultaneously, which can dramatically reduce the cost and effort required for NGS library construction. NGS data from this SOP can provide complete genome coverage from viral stocks and can also detect virus-specific reads from limited starting material. Our data suggest that the procedure can be implemented and easily validated by institutional biosafety committees across research laboratories

    A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

    Background: The MatrixMatchMaker (MMM) algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. Results: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. Conclusions: MMMvII will thus allow for more extensive and intricate analyses of coevolution. Availability: An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php
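The reduction described in this abstract can be sketched as follows. This is not the MMMvII implementation: the relative tolerance test in `compatible`, the default `tol=0.1`, and the function names are assumptions, and the sketch solves a single maximum clique problem with a basic Bron–Kerbosch search rather than the multiple-clique formulation the authors mention. Each graph node is a protein pair (i, j), one protein from each distance matrix; two nodes are joined when the corresponding distances in the two matrices agree.

```python
from itertools import product


def compatible(d1, d2, tol):
    """Assumed criterion: distances agree within a relative tolerance."""
    return abs(d1 - d2) <= tol * max(d1, d2)


def build_graph(D1, D2, tol=0.1):
    """Nodes are pairs (i, j); edges join pairs with matching distances."""
    nodes = list(product(range(len(D1)), range(len(D2))))
    adj = {v: set() for v in nodes}
    for (i, j), (k, l) in product(nodes, nodes):
        if i != k and j != l and compatible(D1[i][k], D2[j][l], tol):
            adj[(i, j)].add((k, l))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique, sorted."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# D2 repeats D1's three proteins and adds a fourth, unrelated one.
D1 = [[0, 1, 2],
      [1, 0, 3],
      [2, 3, 0]]
D2 = [[0, 1, 2, 10],
      [1, 0, 3, 10],
      [2, 3, 0, 10],
      [10, 10, 10, 0]]
matching = max_clique(build_graph(D1, D2))
```

The clique in `matching` pairs each protein of the first matrix with a protein of the second, and the rows and columns it selects form the common submatrix.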

    Rapid Pathway Evolution Facilitated by Horizontal Gene Transfers across Prokaryotic Lineages

    The evolutionary history of biological pathways is of general interest, especially in this post-genomic era, because it may provide clues for understanding how complex systems encoded on genomes have been organized. To explain how pathways can evolve de novo, some noteworthy models have been proposed. However, direct reconstruction of pathway evolutionary history both on a genomic scale and at the depth of the tree of life has suffered from artificial effects in estimating the gene content of ancestral species. Recently, we developed an algorithm that effectively reconstructs gene-content evolution without these artificial effects, and we applied it to this problem. The carefully reconstructed history, which was based on the metabolic pathways of 160 prokaryotic species, confirmed that pathways have grown beyond the random acquisition of individual genes. Pathway acquisition took place quickly, probably eliminating the difficulty in holding genes during the course of the pathway evolution. This rapid evolution was due to massive horizontal gene transfers as gene groups, some of which were possibly operon transfers, which would convey existing pathways but not be able to generate novel pathways. To this end, we analyzed how these pathways originally appeared and found that the original acquisition of pathways occurred more contemporaneously than expected across different phylogenetic clades. As a possible model to explain this observation, we propose that novel pathway evolution may be facilitated by bidirectional horizontal gene transfers in prokaryotic communities. Such a model would complement existing pathway evolution models

    Phylogenomic Analysis of Marine Roseobacters

    Background: Members of the Roseobacter clade, which play a key role in the biogeochemical cycles of the ocean, are diverse and abundant, comprising 10–25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study, about 1,200 likely orthologous protein families were identified from 17 Roseobacter genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA) and the approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes with a vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we identified the 109 most likely HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in environmental metagenomic and metatranscriptomic data. The genes in the core set are spread out uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzymes fo

    Obscured phylogeny and possible recombinational dormancy in Escherichia coli

    Background: Escherichia coli is one of the best studied organisms in all of biology, but its phylogenetic structure has been difficult to resolve with current data and analytical techniques. We analyzed single nucleotide polymorphisms in chromosomes of representative strains to reconstruct the topology of its emergence. Results: The phylogeny of E. coli varies according to the segment of chromosome analyzed. Recombination between extant E. coli groups is largely limited to only three intergroup pairings. Conclusions: Segment-dependent phylogenies most likely are legacies of a complex recombination history. However, E. coli are now in an epoch in which they no longer broadly share DNA. Using the definition of species as organisms that freely exchange genetic material, this recombinational dormancy could reflect either the end of E. coli as a species, or herald the coalescence of E. coli groups into new species.

    GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

    Background: Position-specific priors (PSPs) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. Results: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. Conclusions: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.

    Comparative genomics of the class 4 histone deacetylase family indicates a complex evolutionary history

    BACKGROUND: Histone deacetylases are enzymes that modify core histones and play key roles in transcriptional regulation, chromatin assembly, DNA repair, and recombination in eukaryotes. Three types of related histone deacetylases (classes 1, 2, and 4) are widely found in eukaryotes, and structurally related proteins have also been found in some prokaryotes. Here we focus on the evolutionary history of the class 4 histone deacetylase family. RESULTS: Through sequence similarity searches against sequenced genomes and expressed sequence tag data, we identified members of the class 4 histone deacetylase family in 45 eukaryotic and 37 eubacterial species representative of very distant evolutionary lineages. Multiple phylogenetic analyses indicate that the phylogeny of these proteins is, in many respects, at odds with the phylogeny of the species in which they are found. In addition, the eukaryotic members of the class 4 histone deacetylase family clearly display an anomalous phyletic distribution. CONCLUSION: The unexpected phylogenetic relationships within the class 4 histone deacetylase family and the anomalous phyletic distribution of these proteins within eukaryotes might be explained by two mechanisms: ancient gene duplication followed by differential gene losses and/or horizontal gene transfer. We discuss both possibilities in this report, and suggest that the evolutionary history of the class 4 histone deacetylase family may have been shaped by horizontal gene transfers