    Was the last bacterial common ancestor a monoderm after all?

    The very nature of the last bacterial common ancestor (LBCA), in particular the characteristics of its cell wall, is a critical issue for understanding the evolution of life on Earth. Although knowledge of the relationships between bacterial phyla has progressed with the advent of phylogenomics, many questions remain, including the appearance or disappearance of the outer membrane of diderm bacteria (also called Gram-negative bacteria). The phylogenetic transition between monoderm (Gram-positive) and diderm bacteria, and the associated expansion or reduction of the peptidoglycan layer, requires clarification. Herein, using a phylogenomic tree of cultivated and characterized Bacteria as an evolutionary framework and a literature review of their cell-wall characteristics, we used Bayesian ancestral state reconstruction to infer the cell-wall architecture of the LBCA. With the same phylogenomic tree, we further revisited the evolution of the division and cell-wall synthesis (dcw) gene cluster using homology- and model-based methods. Finally, extensive similarity searches were carried out to determine the phylogenetic distribution of the genes involved in the biosynthesis of the outer membrane in diderm bacteria. Quite unexpectedly, our analyses suggest that all cultivated and characterized bacteria might have evolved from a common ancestor with a monoderm cell-wall architecture. If true, this would indicate that the appearance of the outer membrane was not a unique event and that selective forces have led to the repeated adoption of such an architecture. Owing to the lack of phenotypic information, our methodology cannot be applied to all extant bacteria. Consequently, our conclusion might change once enough information becomes available to allow an even more diverse selection of organisms.
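A minimal sketch of the idea behind ancestral state reconstruction for a binary cell-wall trait, under stated assumptions: a symmetric two-state Markov model with a fixed rate, a uniform root prior, and Felsenstein's pruning algorithm to compute the posterior probability of each root state. This is a simplified stand-in, not the study's Bayesian procedure, which would also integrate over model parameters rather than fix them. The toy tree, branch lengths and tip states are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; tips carry an observed state (0 = monoderm, 1 = diderm)."""
    name: str = ""
    branch_length: float = 0.0      # length of the branch leading to this node
    children: list[Node] = field(default_factory=list)
    state: int | None = None        # only set on tips


def transition_prob(i: int, j: int, t: float, rate: float) -> float:
    """P(state j at the end of a branch | state i at its start), symmetric 2-state model."""
    p_same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return p_same if i == j else 1.0 - p_same


def partial_likelihoods(node: Node, rate: float) -> list[float]:
    """Felsenstein pruning: entry s = P(tip states below node | node in state s)."""
    if not node.children:
        return [1.0 if node.state == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in node.children:
        child_lik = partial_likelihoods(child, rate)
        for s in (0, 1):
            # Sum over the unknown state c at the child end of the branch.
            result[s] *= sum(
                transition_prob(s, c, child.branch_length, rate) * child_lik[c]
                for c in (0, 1)
            )
    return result


def root_posterior(root: Node, rate: float = 1.0, prior_monoderm: float = 0.5) -> list[float]:
    """Posterior [P(monoderm), P(diderm)] at the root for a fixed rate and root prior."""
    lik = partial_likelihoods(root, rate)
    joint = [prior_monoderm * lik[0], (1.0 - prior_monoderm) * lik[1]]
    total = sum(joint)
    return [x / total for x in joint]


# Hypothetical toy tree: ((A:0.1, B:0.1):0.2, C:0.3), A and B monoderm, C diderm.
tips_ab = [Node("A", 0.1, state=0), Node("B", 0.1, state=0)]
root = Node("root", children=[Node("AB", 0.2, children=tips_ab), Node("C", 0.3, state=1)])
print(root_posterior(root))
```

A full Bayesian treatment would sample the rate (and possibly the tree) by MCMC and average the root posterior over those samples; fixing them here keeps the sketch short.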

    The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics

    Microbial culture collections play a key role in taxonomy by studying the diversity of their strains and providing well-characterized biological material to the scientific community for fundamental and applied research. These microbial resource centers therefore need to implement new standards in species delineation, including whole-genome sequencing and phylogenomics. In this context, the genomic needs of the Belgian Coordinated Collections of Microorganisms (BCCM) were studied, resulting in the GEN-ERA toolbox, a unified cluster of bioinformatic workflows dedicated to both bacteria and small eukaryotes (e.g., yeasts). This public toolbox is designed for researchers without specific training in bioinformatics, as each workflow is launched by a single command line. It facilitates all steps from genome downloading and quality assessment, including genomic contamination estimation, to tree reconstruction. It also offers workflows for average nucleotide identity comparisons and metabolic modeling. All the workflows are based on Nextflow and Singularity containers to increase reproducibility. The GEN-ERA toolbox can be used to perform fully reproducible comparative genomic and metabolic analyses on prokaryotes and small eukaryotes. Although designed for routine bioinformatics in culture collections, it can also be used by any researcher interested in microbial taxonomy, as exemplified by our case study on Gloeobacterales (Cyanobacteria). This study is published at https://doi.org/10.1093/gigascience/giad022.

    The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics.

    BACKGROUND: Microbial culture collections play a key role in taxonomy by studying the diversity of their strains and providing well-characterized biological material to the scientific community for fundamental and applied research. These microbial resource centers therefore need to implement new standards in species delineation, including whole-genome sequencing and phylogenomics. In this context, the genomic needs of the Belgian Coordinated Collections of Microorganisms were studied, resulting in the GEN-ERA toolbox, a unified cluster of bioinformatic workflows dedicated to both bacteria and small eukaryotes (e.g., yeasts). FINDINGS: This public toolbox allows researchers without specific training in bioinformatics to perform robust phylogenomic analyses. It facilitates all steps from genome downloading and quality assessment, including genomic contamination estimation, to tree reconstruction. It also offers workflows for average nucleotide identity comparisons and metabolic modeling. TECHNICAL DETAILS: Nextflow workflows are launched by a single command and are available on the GEN-ERA GitHub repository (https://github.com/Lcornet/GENERA). All the workflows are based on Singularity containers to increase reproducibility. TESTING: The toolbox was developed for a diversity of microorganisms, including bacteria and fungi. It was further tested on an empirical dataset of 18 (meta)genomes of early-branching Cyanobacteria, providing the most up-to-date phylogenomic analysis of the Gloeobacterales order, the first group to diverge in the evolutionary tree of Cyanobacteria. CONCLUSION: The GEN-ERA toolbox can be used to perform fully reproducible comparative genomic and metabolic analyses on prokaryotes and small eukaryotes. Although designed for routine bioinformatics in culture collections, it can also be used by any researcher interested in microbial taxonomy, as exemplified by our case study on Gloeobacterales.
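The average nucleotide identity (ANI) workflows mentioned above can be illustrated with a minimal sketch of the classic BLAST-based definition (ANIb): the query genome is cut into consecutive 1,020-nt fragments, each fragment is searched against the reference genome, and ANI is the mean identity of the fragments whose best hit exceeds 30% identity over at least 70% of the fragment length. The search step is assumed to have been run separately (its best hits are passed in as tuples), this is not necessarily how GEN-ERA computes ANI, and all names below are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple

FRAGMENT_LENGTH = 1020  # fragment size used by the classic ANIb procedure


class FragmentHit(NamedTuple):
    identity: float       # percent identity of the fragment's best hit (0-100)
    aligned_length: int   # alignment length of that hit, in nucleotides
    fragment_length: int  # length of the query fragment


def fragment_genome(sequence: str, size: int = FRAGMENT_LENGTH) -> list[str]:
    """Cut a query genome into consecutive fragments (the last one may be shorter)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def average_nucleotide_identity(hits: list[FragmentHit],
                                min_identity: float = 30.0,
                                min_coverage: float = 0.70) -> float | None:
    """Mean identity over fragments whose best hit passes both thresholds."""
    kept = [h.identity for h in hits
            if h.identity > min_identity
            and h.aligned_length / h.fragment_length >= min_coverage]
    if not kept:
        return None  # no fragment passed the filters
    return sum(kept) / len(kept)


# Toy best hits: the third fragment aligns over < 70% of its length and is ignored.
hits = [FragmentHit(98.0, 1020, 1020),
        FragmentHit(96.0, 900, 1020),
        FragmentHit(99.0, 500, 1020)]
print(average_nucleotide_identity(hits))  # 97.0
```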

    A journey into prokaryotic cell-wall evolution: the two half-brothers Murein and Pseudomurein

    Prokaryotes (i.e., single-celled organisms without a nucleus) are currently divided into two domains: Bacteria and Archaea. One of the major differences between the two domains lies in their cell wall. Whereas bacterial cell walls consist mostly of peptidoglycan (also known as murein), most archaea have a cell wall composed of a protein layer assembled into a crystalline network called the S-layer (surface layer). However, two orders of Euryarchaeota, the Methanopyrales and Methanobacteriales, possess a cell-wall polymer structurally analogous to peptidoglycan, which was therefore named pseudomurein. The objective of this thesis was to study the evolution of different gene families involved in the biosynthesis of peptidoglycan and pseudomurein, in order to determine whether these two polymers share common genetic determinants. To conduct our analyses, we exploited more than 80,000 bacterial genomes and more than 800 archaeal genomes, all collected from the NCBI RefSeq database. However, at the beginning of our work, there were indications that RefSeq, despite its extensive curation, suffers from genomic contamination that could bias the interpretation of phylogenetic results. As a first step, we therefore developed a contamination-detection program called Physeter, which we then used to detect potential contamination in prokaryotic genomes from RefSeq. Through this study, we showed that about 0.9% of the bacterial genomes in RefSeq have a contamination rate of at least 5%. Although RefSeq provides good coverage of prokaryotic diversity, it suffers from sampling biases. To design and test bioinformatic strategies that improve the informativeness of phylogenies by reducing the redundancy caused by the inclusion of many closely related strains, we chose to prototype our methods on the class D beta-lactamase protein family. These enzymes are produced by bacteria to resist beta-lactam antibiotics, a family of antibiotics that target peptidoglycan synthesis and lead to cell lysis. We conducted a comprehensive phylogenetic and bioinformatic study of this protein family. Building on these results, we expressed ten newly identified proteins in Escherichia coli and showed that environmental bacteria (including those never exposed to human-made antibiotics) constitute a large reservoir of genes conferring resistance to antimicrobial agents. Finally, using a decontaminated version of RefSeq and bioinformatic methods to optimize its exploitation, we identified different gene families potentially involved in archaeal pseudomurein biosynthesis, to which we applied a bioinformatic pipeline similar to the one implemented for class D beta-lactamases. Some of the identified genes are homologous to those involved in peptidoglycan biosynthesis, such as the Mur ligases or the transmembrane protein MraY. We showed that these genes are clustered in two syntenic regions in the genomes of Methanopyrales and Methanobacteriales. Furthermore, our phylogenetic analyses suggest that the archaeal Mur ligases result from horizontal gene transfers from one or more ancient bacterial lineages. Based on all these results, we proposed the hypothesis, which remains to be tested, that the acquisition of bacterial genes by a common ancestor of the Methanopyrales and Methanobacteriales led to the emergence of archaeal pseudomurein.
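The thesis abstract does not detail how redundancy among closely related strains was reduced. As a hypothetical illustration only, a common generic approach is greedy identity-threshold clustering (the idea popularized by CD-HIT): sequences are visited from longest to shortest, and each either joins the first representative it resembles at or above a threshold or founds a new cluster. The `similarity` function below is a crude stand-in based on Python's `difflib`, not an alignment score, and the threshold is arbitrary.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1]; real pipelines would use an alignment.

    autojunk=False matters: otherwise frequent residues in sequences longer
    than 200 characters are treated as junk and the ratio becomes meaningless.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_dereplicate(seqs: dict[str, str],
                       threshold: float = 0.95) -> dict[str, list[str]]:
    """Greedy clustering, longest sequences first.

    Each sequence joins the first representative it matches at >= threshold,
    otherwise it becomes a new representative.
    Returns {representative_id: [member_ids]}.
    """
    clusters: dict[str, list[str]] = {}
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep_id in clusters:
            if similarity(seqs[seq_id], seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:  # no representative was similar enough
            clusters[seq_id] = [seq_id]
    return clusters


# Hypothetical toy proteins: "a" and "b" differ at one position out of ten.
toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGGGGGGG"}
print(greedy_dereplicate(toy, threshold=0.85))  # {'a': ['a', 'b'], 'c': ['c']}
```

Keeping only the representatives then yields a smaller, less redundant set for tree building; the cost is one comparison per existing representative, which is why production tools add k-mer prefilters.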

    Datasets for Lupo et al. (2022) Origin and Evolution of Pseudomurein Biosynthetic Gene Clusters

    Lupo et al. 2022 Pseudomurein: archive content for v1
    Overview ... 61 directories, 769 files
    - README.md: this file.
    - command-line.sh: examples of bash commands to use or generate the files stored in this archive.
    - 10-archaeal-proteomes: an archive (.tar.gz) of FASTA (.faa) files for the 10 organisms used by OrthoFinder.
    - alphafold: ZIP (.zip) files containing the raw results of AlphaFold predictions and ConSurf results.
    - archaeal-genomes: an archive (.tar.gz) of 819 FASTA (.faa) files corresponding to the archaeal database.
    - bacterial-genomes: an archive (.tar.gz) of 598 FASTA (.faa) files corresponding to the bacterial database.
    - config: two configuration files (.yaml) used by the classify-ali.pl Perl script from the Bio::MUST modules:
        - classifier.yaml: used to filter the 6,321 orthologous groups (OGs);
        - five_org_classifier.yaml: used to filter the OGs after the round of Forty-Two TBLASTN.
    - BLASTP: configuration files to run Forty-Two BLASTP (filtering of candidate proteins; see Figure S1 in the main manuscript).
    - TBLASTN: configuration files to run Forty-Two TBLASTN (filtering of candidate proteins; see Figure S1 in the main manuscript).
    - NCBI_CCD: HMM (.hmm) profile files downloaded from the NCBI CDD database and the corresponding HMM search (.hmms) files.
    - ompapa-results: raw Ompa-Pa results associated with the NCBI CDD HMM profiles.
    - ompapa: list of the OGs that passed the taxonomic filter (see config).
    - alignments: FASTA alignments of the OGs listed in the retained_OGs.lis file (see ompapa).
    - bacterial_db: results of ompapa analyses against the bacterial database, in two sub-directories:
        - hmms: HMM search (.hmms) files;
        - ompapa-results: raw ompapa results.
    - hmm_profiles: HMM (.hmm) profiles of the OGs listed in the retained_OGs.lis file (see ompapa).
    - prokaryotic_db: results of ompapa analyses against the prokaryotic database, in two sub-directories:
        - hmms: HMM search (.hmms) files;
        - ompapa-results: raw ompapa results.
    - orthologous-groups: an archive (.tar.gz) of 6,321 FASTA (.fasta) files corresponding to the OGs generated by OrthoFinder.
    - predictions: three sub-directories:
        - interproscan: raw InterProScan results;
        - signalp5: raw SignalP5 results;
        - tmhmm: raw TMHMM results.
      For the consolidated results, see Table S1 in the main manuscript.
    - prokaryotic-genomes: list of assembly accessions of the 80,490 organisms of the prokaryotic database.
    - regulon: all the data needed to reproduce the 'regulon pipeline' (see the supplementary data of the main manuscript); all command lines are included in the command_line_regulon.sh file.
    - scripts: various Perl scripts used to generate some of the files stored in this archive (see command-line.sh).
    - taxdump: mirror of the NCBI Taxonomy used in this study (downloaded on 4 May 2020).
    - trees: three sub-directories (atp-grasp, mray-family and mur-family), all containing the tree (.tre) files presented in the main manuscript and the corresponding raw IQ-TREE results (.ckp.gz), along with various files needed to reproduce the phylogenetic analyses (see command-line.sh).
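Several directories in the listing above ship as .tar.gz archives of FASTA files. A minimal sketch, assuming standard gzip-compressed tar archives, counts the sequences in each FASTA member with Python's `tarfile` module without extracting anything to disk. The function name is hypothetical, and so is the archive name in the usage comment, since the listing does not give file names.

```python
from __future__ import annotations

import io
import tarfile
from pathlib import Path


def count_fasta_records(archive_path: str | Path,
                        suffixes: tuple[str, ...] = (".faa", ".fasta")) -> dict[str, int]:
    """Count FASTA records (header lines starting with '>') per archive member."""
    counts: dict[str, int] = {}
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Skip directories, links and non-FASTA files.
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Stream the member as text instead of extracting it to disk.
            with io.TextIOWrapper(handle, encoding="utf-8") as text:
                counts[member.name] = sum(1 for line in text if line.startswith(">"))
    return counts


# Usage (hypothetical file name):
# counts = count_fasta_records("archaeal-genomes.tar.gz")
# print(len(counts), "proteomes;", sum(counts.values()), "proteins")
```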

    Contamination in Reference Sequence Databases: Time for Divide-and-Rule Tactics.

    Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature, and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, most genomic studies rely on a single detection method, which carries a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among the bacterial genomes of the National Center for Biotechnology Information (NCBI) Reference Sequence Database (RefSeq). First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple detection methods, and we provide an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.
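The abstract does not describe how Physeter's k-folds algorithm works internally. One plausible reading, sketched here as a hypothetical illustration rather than Physeter's documented algorithm, is to split the reference database into k folds, estimate the contamination of a query genome against each fold separately, and report the median, so that a single contaminated reference confined to one fold cannot dominate the result. The per-fold `estimate` callback is a hypothetical placeholder for whatever genome-wide estimator is used.

```python
import random
import statistics
from typing import Callable, Sequence


def split_into_folds(references: Sequence[str], k: int, seed: int = 0) -> list[list[str]]:
    """Randomly partition reference genome IDs into k roughly equal folds."""
    if not 1 <= k <= len(references):
        raise ValueError("k must be between 1 and the number of references")
    shuffled = list(references)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return [shuffled[i::k] for i in range(k)]


def kfold_contamination(query: str,
                        references: Sequence[str],
                        estimate: Callable[[str, list[str]], float],
                        k: int = 10) -> float:
    """Median of per-fold contamination estimates (hypothetical aggregation rule)."""
    folds = split_into_folds(references, k)
    return statistics.median(estimate(query, fold) for fold in folds)


# Toy example: reference "r3" is itself contaminated and inflates any fold containing it.
refs = [f"r{i}" for i in range(10)]
toy_estimate = lambda query, fold: 0.9 if "r3" in fold else 0.02
print(kfold_contamination("genome_X", refs, toy_estimate, k=5))  # 0.02
```

The median is what makes the aggregate robust here: one outlying fold out of five leaves it unchanged, whereas a mean would be pulled towards the contaminated reference.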

    An Extended Reservoir of Class-D Beta-Lactamases in Non-Clinical Bacterial Strains.

    Bacterial genes coding for antibiotic resistance represent a major issue in the fight against bacterial pathogens. Among those, genes encoding beta-lactamases are of particular concern, because these enzymes target penicillin and related compounds such as carbapenems, which are critical for human health. Beta-lactamases are classified into classes A, B, C, and D based on their amino acid sequence. Class D enzymes are also known as OXA beta-lactamases, because the first enzymes described in this class were able to hydrolyze oxacillin. While hundreds of class D beta-lactamases with different activity profiles have been isolated from clinical strains, their nomenclature remains largely uninformative. In this work, we carried out a comprehensive survey of a reference database of 80,490 genomes and identified 24,916 OXA-domain-containing proteins. These were deduplicated, and their representative sequences were clustered into 45 non-singleton groups derived from a phylogenetic tree of 1,413 OXA-domain sequences, including five clusters containing the C-terminal domain of the BlaR membrane receptors. Interestingly, 801 known class D beta-lactamases fell into only 18 clusters. To probe the unknown diversity of the class, we selected 10 protein sequences from 10 uncharacterized clusters and studied the activity profiles of the corresponding enzymes. Beta-lactamase activity was detected for seven of them. Three enzymes (OXA-1089, OXA-1090 and OXA-1091) were active against oxacillin and two against imipenem. These results indicate that, as already reported, environmental bacteria constitute a large reservoir of resistance genes that can be transferred to clinical strains, whether through plasmid exchange or by hitchhiking with transposase genes.
    IMPORTANCE: The transmission of genes coding for resistance factors from environmental to nosocomial strains is a major component in the development of bacterial resistance to antibiotics. Our survey of class D beta-lactamase genes in genomic databases highlighted the high sequence diversity of the enzymes that are able to recognize and/or hydrolyze beta-lactam antibiotics. Among them, we also identified new beta-lactamases able to hydrolyze carbapenems, one of the last-resort antibiotic families used in human antimicrobial chemotherapy. Therefore, the use of this antibiotic family can be expected to fuel the emergence of new beta-lactamases in clinically relevant strains.
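The abstract states that the 24,916 OXA-domain proteins were deduplicated before clustering, without specifying the criterion. A minimal sketch, assuming deduplication means collapsing identical sequences (after upper-casing and removing '*' stop symbols), groups identifiers by a SHA-256 digest of the normalized sequence and keeps the first identifier as the representative; a second helper keeps only non-singleton groups. Both function names are hypothetical.

```python
import hashlib


def deduplicate(proteins: dict[str, str]) -> dict[str, list[str]]:
    """Collapse identical sequences; the first ID seen becomes the representative.

    Returns {representative_id: [all_ids_with_that_sequence]}.
    """
    by_digest: dict[str, list[str]] = {}
    for seq_id, seq in proteins.items():
        normalized = seq.upper().replace("*", "")  # ignore case and stop symbols
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        by_digest.setdefault(digest, []).append(seq_id)
    return {ids[0]: ids for ids in by_digest.values()}


def non_singleton(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only groups with more than one member."""
    return {rep: members for rep, members in groups.items() if len(members) > 1}


# Hypothetical toy input: p1 and p2 are the same protein written differently.
proteins = {"p1": "MKT", "p2": "mkt*", "p3": "MKV"}
groups = deduplicate(proteins)
print(groups)                 # {'p1': ['p1', 'p2'], 'p3': ['p3']}
print(non_singleton(groups))  # {'p1': ['p1', 'p2']}
```

Hashing keeps dictionary keys short when sequences are long; keying directly on the normalized sequence would give the same groups.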

    De Novo Transcriptome Meta-Assembly of the Mixotrophic Freshwater Microalga Euglena gracilis.

    Euglena gracilis is a well-known photosynthetic microeukaryote considered to be the product of a secondary endosymbiosis between a green alga and a phagotrophic unicellular eukaryote belonging to the same eukaryotic phylum as the parasitic trypanosomatids. Because its nuclear genome has proven difficult to sequence, reliable transcriptomes are important for functional studies. In this work, we assembled a new consensus transcriptome by combining sequencing reads from five independent studies. Based on a detailed comparison with two previously released transcriptomes, our consensus transcriptome appears to be the most complete so far. Remapping the reads onto it allowed us to compare the expression of the transcripts across multiple culture conditions at once and to infer a functionally annotated network of co-expressed genes. Although the emergence of meaningful gene clusters indicates that some biological signal lies in gene expression levels, our analyses confirm that gene regulation in euglenozoans is not primarily controlled at the transcriptional level. Regarding the origin of E. gracilis, we observe a heavily mixed gene ancestry, as previously reported, and rule out sequence contamination as an explanation for these observations. Instead, they indicate that this complex alga evolved through a convoluted process involving many more than two partners.
