
    Prokaryote genome fluidity is dependent on effective population size

    Many prokaryote species are known to have fluid genomes, with different strains varying markedly in accessory gene content through the combined action of gene loss, gene gain via lateral transfer, and gene duplication. However, the evolutionary forces determining genome fluidity are not yet well understood. Here we systematically analyse, for the first time, the degree to which this distinctive genomic feature differs between bacterial species. We find that genome fluidity is positively correlated with synonymous nucleotide diversity of the core genome, a measure of effective population size (Ne). No effects of genome size, phylogeny or homologous recombination rate on genome fluidity were found. Our findings are consistent with a scenario in which accessory gene content turnover is largely dictated by neutral evolution.
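The abstract does not define genome fluidity. A minimal Python sketch of one commonly used pairwise definition, assumed here rather than taken from this work, averages over all genome pairs the fraction of gene families found in only one genome of the pair: φ = 2/(N(N−1)) Σ_{k<l} (U_k + U_l)/(M_k + M_l), where U_k is the number of families in genome k absent from genome l and M_k is the number of families in genome k. Representing each genome as a set of gene-family labels is a hypothetical input format, and the labels below are made up.

```python
from itertools import combinations


def genome_fluidity(genomes: list[set[str]]) -> float:
    """Mean over all genome pairs of (U_k + U_l) / (M_k + M_l).

    U_k: gene families in genome k that are absent from genome l.
    M_k: number of gene families in genome k.
    """
    if len(genomes) < 2:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in combinations(genomes, 2):
        unique = len(a - b) + len(b - a)  # U_k + U_l
        total = len(a) + len(b)           # M_k + M_l
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Made-up gene-family labels for three strains
strain1 = {"core1", "core2", "acc1"}
strain2 = {"core1", "core2", "acc2"}
strain3 = {"core1", "core2"}
print(genome_fluidity([strain1, strain2, strain3]))
```

Identical genomes give φ = 0 and genomes sharing no families give φ = 1, so higher values indicate more accessory-gene turnover between strains.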

    BranchClust: a phylogenetic algorithm for selecting gene families

    BACKGROUND: Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The former are easy to automate but usually either do not distinguish between paralogs and orthologs or are restricted in the number of taxa. Phylogenetic methods are often based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in the case of prokaryotes, is that the species tree is often unknown, and that analyses of single gene families frequently leave the branching order between related organisms unresolved. RESULTS: Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at . CONCLUSION: BranchClust outperforms the Reciprocal Best Blast hit method by selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.
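BranchClust is benchmarked against the Reciprocal Best Blast hit (RBH) method. A minimal Python sketch of the RBH criterion, assuming similarity scores (for example BLAST bit scores) have already been computed and stored in dictionaries keyed by (query, subject) gene pairs; the gene names and scores are hypothetical, and this is not BranchClust itself.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject gene (first one wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores between genes of genome A and genome B
a_vs_b = {("A1", "B1"): 200.0, ("A1", "B2"): 150.0, ("A2", "B2"): 90.0, ("A2", "B1"): 80.0}
b_vs_a = {("B1", "A1"): 195.0, ("B2", "A2"): 95.0, ("B2", "A1"): 140.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1')]
```

In the toy data, B2's best hit is A1 rather than A2, so only (A1, B1) is reported. A pairwise criterion of this kind says nothing about in- or outparalogs, which is what the tree-based approach is designed to recognize.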

    Reassessment of the Lineage Fusion Hypothesis for the Origin of Double Membrane Bacteria

    In 2009, James Lake introduced a new hypothesis in which reticulate phylogeny reconstruction is used to elucidate the origin of Gram-negative bacteria (Nature 460: 967–971). The data presented supported an origin of Gram-negative bacteria from an ancient endosymbiosis between Actinobacteria and Clostridia. His conclusion was based on a presence-absence analysis of protein families that divided all prokaryotes into five groups: Actinobacteria, Double Membrane bacteria (DM), Clostridia, Archaea and Bacilli. Of these five groups, the DM group is by far the largest and most diverse. While the fusion hypothesis for the origin of double membrane bacteria is enticing, we show that the signal supporting an ancient symbiosis is lost when the DM group is broken down into smaller subgroups. We conclude that the signal detected in James Lake's analysis results in part from a systematic artifact due to group size and diversity combined with low levels of horizontal gene transfer. Funding: Exobiology Program (U.S.) (Grant NNX08AQ10G); Assembling the Tree of Life (Program) (Grant DEB 0830024).

    Evidence for acquisition of virulence effectors in pathogenic chytrids

    Background: The decline in amphibian populations across the world is frequently linked to infection by the chytrid fungus Batrachochytrium dendrobatidis (Bd). This is particularly perplexing because Bd was discovered only in 1999 and no chytrid fungus had previously been identified as a vertebrate pathogen. Results: In this study, we show that two large families of known virulence effector genes, crinkler (CRN) proteins and serine peptidases, were acquired by Bd from oomycete pathogens and bacteria, respectively. Both families underwent duplication after their acquisition by Bd. Additional selection analyses indicate that both families evolved under strong positive selection, suggesting that they are involved in the adaptation of Bd to its hosts. Conclusions: We propose that the acquisition of virulence effectors, in combination with habitat disruption and climate change, may have driven the Bd epidemics and the decline in amphibian populations. This finding provides a starting point for biochemical investigations of chytridiomycosis.

    Molecular Evolution of Aminoacyl tRNA Synthetase Proteins in the Early History of Life

    Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, having been inherited from the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains and some possibly involving lineages that diverged before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time and the chimeric nature of the genomes within the cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl-RS provide strong empirical evidence that, at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed. Funding: National Science Foundation (U.S.) (Grant DEB 0830024); National Science Foundation (U.S.) (Grant DEB 0936234); United States. National Aeronautics and Space Administration (NASA Postdoctoral Fellowship).

    Bootstrap, Bayesian probability and maximum likelihood mapping: exploring new tools for comparative genome analyses

    BACKGROUND: Horizontal gene transfer (HGT) played an important role in shaping microbial genomes. In addition to genes under sporadic selection, HGT also affects housekeeping genes and genes involved in information processing, even those encoding ribosomal RNA. Here we describe tools that provide an assessment and graphic illustration of the mosaic nature of microbial genomes. RESULTS: We adapted Maximum Likelihood (ML) mapping to the analysis of all detected quartets of orthologous genes found in four genomes. We automated the assembly and analysis of these ortholog quartets for any selected set of four genomes. We compared the ML-mapping approach to the more rigorous Bayesian probability and Bootstrap mapping techniques. The latter two approaches appear to be more conservative than ML mapping, but qualitatively all three approaches give equivalent results. All three tools were tested on mitochondrial genomes, which were presumably inherited as a single linkage group. CONCLUSIONS: In some instances of interphylum relationships we find nearly equal numbers of quartets strongly supporting each of the three possible topologies. In contrast, our analyses of genome quartets containing the cyanobacterium Synechocystis sp. indicate that a large part of the cyanobacterial genome is related to that of low-GC Gram positives. Other groups that had been suggested as sister groups to the cyanobacteria contain many fewer genes that group with the Synechocystis orthologs. Interdomain comparisons of genome quartets containing the archaeon Halobacterium sp. revealed that Halobacterium sp. shares more genes with Bacteria that live in the same environment than with Bacteria that are more closely related based on rRNA phylogeny. Many of these genes encode proteins involved in substrate transport and metabolism and in information storage and processing.
These analyses demonstrate that relationships among prokaryotes cannot be accurately depicted by or inferred from the tree-like evolution of a core of rarely transferred genes; rather, prokaryotic genomes are mosaics in which different parts have different evolutionary histories. Probability mapping is a valuable tool to explore the mosaic nature of genomes.
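In ML mapping, each quartet of orthologs has three possible unrooted topologies, and their likelihoods are normalized into weights that sum to one, which places the quartet at a point in a triangle. A minimal Python sketch, assuming the three log-likelihoods per quartet have already been computed by some tree-inference program; the 0.95 cut-off for calling a quartet "strongly supported" and the taxon labels A to D are hypothetical choices, not the thresholds used in this work. The Bayesian and Bootstrap variants would feed posterior probabilities or bootstrap proportions into the same tally instead of likelihood weights.

```python
import math

TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")  # the three unrooted trees for taxa A, B, C, D


def quartet_weights(log_likelihoods: tuple[float, float, float]) -> list[float]:
    """Normalize three log-likelihoods into weights L_i / (L_1 + L_2 + L_3)."""
    top = max(log_likelihoods)
    scaled = [math.exp(x - top) for x in log_likelihoods]  # shift by max to avoid underflow
    total = sum(scaled)
    return [s / total for s in scaled]


def classify(log_likelihoods, threshold: float = 0.95):
    """Return the topology whose weight reaches the threshold, else None (unresolved)."""
    weights = quartet_weights(log_likelihoods)
    best = max(range(3), key=weights.__getitem__)
    return TOPOLOGIES[best] if weights[best] >= threshold else None


def tally(quartets, threshold: float = 0.95) -> dict[str, int]:
    """Count how many quartets strongly support each topology."""
    counts = {name: 0 for name in TOPOLOGIES}
    counts["unresolved"] = 0
    for lnls in quartets:
        counts[classify(lnls, threshold) or "unresolved"] += 1
    return counts


# Hypothetical log-likelihoods for three ortholog quartets
quartets = [(-1000.0, -1010.0, -1012.0), (-500.0, -500.5, -501.0), (-800.0, -790.0, -805.0)]
print(tally(quartets))
```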

    In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

    Background: In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Results: Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency between phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence patterns across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes using 417 published genome sequences. Both CGP methods achieved best areas under the receiver operating characteristic curve (AUC) of 0.911 in the Escherichia coli K-12 (EC-K12) genome and 0.978 in the Streptococcus agalactiae 2603 (SA-2603) genome, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 example genomes in each group in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Conclusion: Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and the discovery of bacterial gene-function relationships.
Our rediscovery experiments also provide a set of standard tasks against which future methods may be compared. 12 page(s).
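A minimal Python sketch of the idea behind statistical CGP and of the AUC used to evaluate it. It assumes each gene's phylogenetic profile is the set of genome identifiers in which the gene occurs, and scores a gene by its occurrence frequency in a phenotype-positive genome group minus its frequency in a phenotype-negative group; this scoring rule is a hypothetical stand-in, and the authors' statistic may differ. AUC is computed in its Mann–Whitney form, as the probability that a randomly chosen known-relevant gene outscores a randomly chosen other gene. All gene and genome names are made up.

```python
def frequency_score(profile: set[str], positive: set[str], negative: set[str]) -> float:
    """Occurrence frequency in the positive genome group minus that in the negative group."""
    return len(profile & positive) / len(positive) - len(profile & negative) / len(negative)


def score_genes(profiles: dict[str, set[str]], positive: set[str], negative: set[str]) -> dict[str, float]:
    """Score every gene from its phylogenetic profile."""
    return {gene: frequency_score(p, positive, negative) for gene, p in profiles.items()}


def auc(scores: dict[str, float], relevant: set[str]) -> float:
    """Probability that a relevant gene outscores an irrelevant one; ties count one half."""
    pos = [s for g, s in scores.items() if g in relevant]
    neg = [s for g, s in scores.items() if g not in relevant]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


positive = {"G1", "G2", "G3"}  # hypothetical genomes with the phenotype of interest
negative = {"G4", "G5", "G6"}  # hypothetical genomes without it
profiles = {
    "geneA": {"G1", "G2", "G3"},
    "geneB": {"G1", "G2", "G4"},
    "geneC": {"G1", "G2", "G3", "G4", "G5", "G6"},
    "geneD": {"G4", "G5"},
}
scores = score_genes(profiles, positive, negative)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
print(auc(scores, relevant={"geneA", "geneB"}))
```

A gene present in every genome (geneC) scores zero regardless of phenotype, which is why frequency differences rather than raw presence drive the ranking.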

    OrgConv: detection of gene conversion using consensus sequences and its application in plant mitochondrial and chloroplast homologs

    Background: The ancestry of mitochondria and chloroplasts traces back to separate endosymbioses of once free-living bacteria. The highly reduced genomes of these two organelles therefore contain very distant homologs that have only recently been shown to recombine inside the mitochondrial genome. Detection of gene conversion between mitochondrial and chloroplast homologs was previously impossible due to the lack of suitable computer programs. I recently developed a novel method and, for the first time, discovered recurrent gene conversion between chloroplast and mitochondrial genes. The method will further our understanding of plant organellar genome evolution and help identify and remove gene regions with incongruent phylogenetic signals for several genes widely used in plant systematics. Here, I present an implementation of this method that is available through a user-friendly web interface. Results: OrgConv (Organellar Conversion) is a computer package developed for detection of gene conversion between mitochondrial and chloroplast homologous genes. OrgConv is available in two forms: source code that can be installed and run on a Linux platform, and a web interface that can be used from multiple operating systems. The program takes as input two multiple sequence alignments, one from each organellar compartment, in FASTA format. It compares every examined sequence against the consensus sequence of each alignment rather than exhaustively examining every possible combination. Using consensus sequences significantly reduces the number of comparisons and therefore the overall computational time, which allows for analysis of very large datasets.
Most importantly, with the significantly reduced number of comparisons, the statistical power remains high in the face of correction for multiple tests. Conclusions: Both the source code and the web interface of OrgConv are available for free from the OrgConv website http://www.indiana.edu/~orgconv. Although OrgConv was developed with a main focus on detection of gene conversion between mitochondrial and chloroplast genes, it may also be used for detection of gene conversion between any two distinct groups of homologous sequences.
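A minimal Python sketch of the consensus idea, not OrgConv's own code: it builds a majority-rule consensus for each alignment and reports each sequence's identity to both consensus sequences, so n + m sequences need 2(n + m) comparisons, a count that grows linearly rather than as the n × m cross-compartment pairs. It assumes the two alignments have already been brought to a common set of homologous columns; the identity measure, gap handling and toy sequences are hypothetical, and OrgConv's statistical test for gene conversion is not reproduced.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Majority-rule consensus; on ties the residue seen first in the column wins."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def identity(seq: str, ref: str) -> float:
    """Fraction of positions, ignoring gaps in either sequence, where seq matches ref."""
    pairs = [(a, b) for a, b in zip(seq, ref) if a != "-" and b != "-"]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


def compare_to_consensus(mito: list[str], chloro: list[str]):
    """Return (compartment, index, identity to mt consensus, identity to cp consensus)."""
    mt_cons, cp_cons = consensus(mito), consensus(chloro)
    rows = []
    for label, alignment in (("mt", mito), ("cp", chloro)):
        for i, seq in enumerate(alignment):
            rows.append((label, i, identity(seq, mt_cons), identity(seq, cp_cons)))
    return rows


# Toy alignments over the same six homologous columns (made-up sequences)
mito = ["ACGTAC", "ACGTAC", "ACGTTC"]
chloro = ["TCGAAC", "TCGAAG", "TCGAAC"]
for row in compare_to_consensus(mito, chloro):
    print(row)
```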

    Conservation of intron and intein insertion sites: implications for life histories of parasitic genetic elements

    Background: Inteins and introns are genetic elements that are removed from proteins and RNA after translation or transcription, respectively. Previous studies have suggested that these genetic elements are found in conserved parts of the host protein. To our knowledge, this type of analysis has not been done for group II introns residing within a gene. Here we provide quantitative statistical support from an analysis of proteins that host inteins, group I introns, group II introns and spliceosomal introns across all three domains of life. Results: To determine whether inteins and group I, group II and spliceosomal introns are found preferentially in conserved regions of their respective host proteins, conservation profiles were generated and intein and intron positions were mapped onto the profiles. Fisher's combined probability test was used to determine the significance of the distribution of insertion sites across the conservation profile for each protein. For a subset of the studied proteins, the conservation profile and insertion positions were mapped onto protein structures to determine whether the insertion sites correlate with regions of functional activity. All inteins and most group I introns were found to be preferentially located within conserved regions; in contrast, a bacterial intein-like protein, group II introns and spliceosomal introns did not show a preference for conserved sites. Conclusions: These findings demonstrate that inteins and group I introns are found preferentially in conserved regions of their respective host proteins. Homing endonucleases are often located within inteins and group I introns, and these may facilitate mobility to conserved regions. Insertion at these conserved positions decreases the chance of elimination and slows deletion of the elements, since removal of the elements has to be precise so as not to disrupt the function of the protein.
Furthermore, functional constraints on the targeted site make it more difficult for hosts to evolve immunity to the homing endonuclease. Therefore, these elements will better survive and propagate as molecular parasites in conserved sites. In contrast, spliceosomal introns and group II introns do not show a significant preference for conserved sites and appear to have adopted a different strategy to evade loss.
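Fisher's combined probability test merges k independent p-values into the statistic X² = −2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the joint null hypothesis. A minimal, dependency-free Python sketch: because the degrees of freedom are always even, the chi-squared tail probability has the closed form exp(−x/2) Σ_{j=0}^{k−1} (x/2)^j / j!. How the per-protein p-values were derived from the conservation profiles is not described here, so the inputs in the example are hypothetical.

```python
import math


def fisher_combined(p_values: list[float]) -> tuple[float, float]:
    """Return Fisher's statistic X^2 = -2 * sum(ln p_i) and its combined p-value."""
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-squared survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


stat, p = fisher_combined([0.04, 0.10, 0.30])  # hypothetical per-test p-values
print(f"X^2 = {stat:.3f}, combined p = {p:.4f}")
```

A single p-value is returned unchanged, a quick sanity check on the closed-form tail; if SciPy is available, `scipy.stats.combine_pvalues(p_values, method="fisher")` performs the same computation.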

    Phylogenomic Analysis of Marine Roseobacters

    Background: Members of the Roseobacter clade, which play a key role in the biogeochemical cycles of the ocean, are diverse and abundant, comprising 10–25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study, about 1,200 likely orthologous protein families were identified from 17 Roseobacter genomes. Functional annotations for these genes were provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA) and the approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes that show a vertical-descent signal and are resistant to horizontal gene transfer (HGT) was used to reconstruct a robust organismal phylogeny. In addition, we identified 109 genes most likely to have undergone HGT. The core set contains genes encoding the ribosomal apparatus, ABC transporters and chaperones that are often found in environmental metagenomic and metatranscriptomic data. The core-set genes are spread uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzymes fo