91 research outputs found

    A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

    We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.

    A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem

    A nearly complete genome sequence of Candidatus ‘Acetothermum autotrophicum’, a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. ‘A. autotrophicum’ is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO₂ fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acetyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. ‘A. autotrophicum’ is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. ‘A. autotrophicum’ support the view that the first bacterial and archaeal lineages were H₂-dependent acetogens and methanogens living in hydrothermal environments.

    Reassessment of the Lineage Fusion Hypothesis for the Origin of Double Membrane Bacteria

    In 2009, James Lake introduced a new hypothesis in which reticulate phylogeny reconstruction is used to elucidate the origin of Gram-negative bacteria (Nature 460: 967–971). The presented data supported the Gram-negative bacteria originating from an ancient endosymbiosis between the Actinobacteria and Clostridia. His conclusion was based on a presence-absence analysis of protein families that divided all prokaryotes into five groups: Actinobacteria, Double Membrane bacteria (DM), Clostridia, Archaea and Bacilli. Of these five groups, the DM are by far the largest and most diverse. While the fusion hypothesis for the origin of double membrane bacteria is enticing, we show that the signal supporting an ancient symbiosis is lost when the DM group is broken down into smaller subgroups. We conclude that the signal detected in James Lake's analysis in part results from a systematic artifact due to group size and diversity combined with low levels of horizontal gene transfer. Exobiology Program (U.S.) (Grant NNX08AQ10G); Assembling the Tree of Life (Program) (Grant DEB 0830024)

    Fast and Robust Characterization of Time-Heterogeneous Sequence Evolutionary Processes Using Substitution Mapping

    Genes and genomes do not evolve similarly in all branches of the tree of life. Detecting and characterizing the heterogeneity in time, and between lineages, of the nucleotide (or amino acid) substitution process is an important goal of current molecular evolutionary research. This task is typically achieved through the use of non-homogeneous models of sequence evolution, which, being highly parametrized and computationally demanding, are not appropriate for large-scale analyses. Here we investigate an alternative methodological option based on probabilistic substitution mapping. The idea is to first reconstruct the substitutional history of each site of an alignment under a homogeneous model of sequence evolution, then to characterize variations in the substitution process across lineages based on substitution counts. Using simulated and published datasets, we demonstrate that probabilistic substitution mapping is robust in that it typically provides accurate reconstruction of sequence ancestry even when the true process is heterogeneous but a homogeneous model is adopted. Consequently, we show that the new approach is essentially as efficient as, and far faster than (up to 25 000 times), existing methods, thus paving the way for a systematic survey of substitution process heterogeneity across genes and lineages.

    Bio++: Efficient Extensible Libraries and Tools for Computational Molecular Evolution

    Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized yet extensible code, typically distributed as libraries or packages. This motivated the Bio++ project, which aims to develop a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attraction of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computational efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing read libraries. More complex models of sequence evolution, such as mixture models and generic n-tuple alphabets, are also included.

    Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case

    Background: In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lampreys) occupy crucial positions. Resolving molecular phylogenetic relationships of cyclostome genes with gnathostome (jawed vertebrate) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. Results: Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to the robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed a unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. Conclusions: Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.

    RecPhyloXML: a format for reconciled gene trees.

    A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, and loss), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different formats that vary in the types of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. http://phylariane.univ-lyon1.fr/recphyloxml/

    The Twin-Arginine Translocation Pathway in α-Proteobacteria Is Functionally Preserved Irrespective of Genomic and Regulatory Divergence

    The twin-arginine translocation (Tat) pathway exports fully folded proteins out of the cytoplasm of Gram-negative and Gram-positive bacteria. Although much progress has been made in unraveling the molecular mechanism and biochemical characterization of the Tat system, little is known concerning its functionality and its biological role in conferring adaptive skills, symbiosis or pathogenesis in the α-proteobacteria class. A comparative genomic analysis in the α-proteobacteria class confirmed the presence of tatA, tatB, and tatC genes in almost all genomes, but significant variations in gene synteny and rearrangements were found in the order Rickettsiales with respect to the typically described operon organization. Transcription of tat genes was confirmed for Anaplasma marginale str. St. Maries and Brucella abortus 2308, two α-proteobacteria with full and partial intracellular lifestyles, respectively. The tat genes of A. marginale are scattered throughout the genome, in contrast to the more generalized operon organization. In particular, tatA showed an approximately 20-fold increase in mRNA levels relative to tatB and tatC. We showed Tat functionality in B. abortus 2308 for the first time, and confirmed conservation of functionality in A. marginale. We present the first experimental description of the Tat system in the Anaplasmataceae and Brucellaceae families. In particular, in A. marginale Tat functionality is conserved despite operon splitting as a consequence of genome rearrangements. Further studies will be required to understand how the proper stoichiometry of the Tat protein complex and its biological role are achieved. In addition, the predicted substrates might be evidence of a role for the Tat translocation system in the transition from a free-living to a parasitic lifestyle in these α-proteobacteria.

    A Phylometagenomic Exploration of Oceanic Alphaproteobacteria Reveals Mitochondrial Relatives Unrelated to the SAR11 Clade

    BACKGROUND: According to the endosymbiont hypothesis, the mitochondrial system for aerobic respiration was derived from an ancestral Alphaproteobacterium. Phylogenetic studies indicate that the mitochondrial ancestor is most closely related to the Rickettsiales. Recently, it was suggested that Candidatus Pelagibacter ubique, a member of the SAR11 clade that is highly abundant in the oceans, is a sister taxon to the mitochondrial-Rickettsiales clade. The availability of ocean metagenome data substantially increases the sampling of Alphaproteobacteria inhabiting the oxygen-containing waters of the oceans that likely resemble the originating environment of mitochondria. METHODOLOGY/PRINCIPAL FINDINGS: We present a phylogenetic study of the origin of mitochondria that incorporates metagenome data from the Global Ocean Sampling (GOS) expedition. We identify mitochondrially related sequences in the GOS dataset that represent a rare group of Alphaproteobacteria, designated OMAC (Oceanic Mitochondria Affiliated Clade), as the closest free-living relatives to mitochondria in the oceans. In addition, our analyses reject the hypothesis that the mitochondrial system for aerobic respiration is affiliated with that of the SAR11 clade. CONCLUSIONS/SIGNIFICANCE: Our results allude to the existence of an alphaproteobacterial clade in the oxygen-rich surface waters of the oceans that represents the closest free-living relative to mitochondria identified thus far. In addition, our findings underscore the importance of expanding the taxonomic diversity in phylogenetic analyses beyond that represented by cultivated bacteria to study the origin of mitochondria.