156 research outputs found

    PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted.</p> <p>Results</p> <p>Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other.</p> <p>Conclusion</p> <p>PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However any workflow that relies on phylogenetic tree analysis, could be automated with PhyloPattern.</p

    PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as <it>reticulate evolutionary events</it>, result in <it>networks</it>, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX.</p> <p>Results</p> <p>In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or <it>evolutionary networks</it>, which are rooted, directed, acyclic graphs, leaf-labeled by a set of taxa. These tools can be classified into four categories: (1) evolutionary network representation: reading/writing evolutionary networks in a newly devised compact form; (2) evolutionary network characterization: analyzing evolutionary networks in terms of three basic building blocks – trees, clusters, and tripartitions; (3) evolutionary network comparison: comparing two evolutionary networks in terms of topological dissimilarities, as well as fitness to sequence evolution under a maximum parsimony criterion; and (4) evolutionary network reconstruction: reconstructing an evolutionary network from a species tree and a set of gene trees.</p> <p>Conclusion</p> <p>The software package, PhyloNet, offers an array of utilities to allow for efficient and accurate analysis of evolutionary networks. The software package will help significantly in analyzing large data sets, as well as in studying the performance of evolutionary network reconstruction methods. Further, the software package supports the proposed eNewick format for compact representation of evolutionary networks, a feature that allows for efficient interoperability of evolutionary network software tools. Currently, all utilities in PhyloNet are invoked on the command line.</p

    A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time.</p> <p>Results</p> <p>In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM.</p> <p>Conclusions</p> <p>MMMvII will thus allow for more more extensive and intricate analyses of coevolution.</p> <p>Availability</p> <p>An implementation of the MMMvII algorithm is available at: <url>http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php</url></p

    Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees

    Get PDF
    Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination

    Evidence, Content and Corroboration and the Tree of Life

    Get PDF
    We examine three critical aspects of Popper’s formulation of the ‘Logic of Scientific Discovery’—evidence, content and degree of corroboration—and place these concepts in the context of the Tree of Life (ToL) problem with particular reference to molecular systematics. Content, in the sense discussed by Popper, refers to the breadth and scope of existence that a hypothesis purports to explain. Content, in conjunction with the amount of available and relevant evidence, determines the testability, or potential degree of corroboration, of a statement; content distinguishes scientific hypotheses from metaphysical assertions. Degree of corroboration refers to the relative and tentative confidence assigned to one hypothesis over another, based upon the performance of each under critical tests. Here we suggest that systematists attempt to maximize content and evidence to increase the potential degree of corroboration in all phylogenetic endeavors. Discussion of this “total evidence” approach leads to several interesting conclusions about generating ToL hypotheses

    Identifying Currents in the Gene Pool for Bacterial Populations Using an Integrative Approach

    Get PDF
    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with an phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html

    Phylogenomic Analysis of Marine Roseobacters

    Get PDF
    Background: Members of the Roseobacter clade which play a key role in the biogeochemical cycles of the ocean are diverse and abundant, comprising 10–25 % of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter bacteria genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we also discovered the most likely 109 HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzynmes fo

    Bioinformatics and Structural Characterization of a Hypothetical Protein from Streptococcus mutans: Implication of Antibiotic Resistance

    Get PDF
    As an oral bacterial pathogen, Streptococcus mutans has been known as the aetiologic agent of human dental caries. Among a total of 1960 identified proteins within the genome of this organism, there are about 500 without any known functions. One of these proteins, SMU.440, has very few homologs in the current protein databases and it does not fall into any protein functional families. Phylogenetic studies showed that SMU.440 is related to a particular ecological niche and conserved specifically in some oral pathogens, due to lateral gene transfer. The co-occurrence of a MarR protein within the same operon among these oral pathogens suggests that SMU.440 may be associated with antibiotic resistance. The structure determination of SMU.440 revealed that it shares the same fold and a similar pocket as polyketide cyclases, which indicated that it is very likely to bind some polyketide-like molecules. From the interlinking structural and bioinformatics studies, we have concluded that SMU.440 could be involved in polyketide-like antibiotic resistance, providing a better understanding of this hypothetical protein. Besides, the combination of multiple methods in this study can be used as a general approach for functional studies of a protein with unknown function
    corecore