97 research outputs found

    Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.

    INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study
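The abstract above names relative entropy as one of its tools but does not say how it was computed. As a minimal sketch only, assuming the standard Kullback-Leibler definition and a uniform reference distribution (both assumptions, not the authors' stated method), the bias of a genome's amino acid usage could be quantified as follows. The function names are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(proteins):
    """Return relative frequencies of the 20 standard amino acids
    across an iterable of protein sequences (non-standard letters are skipped)."""
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.
    Terms with p = 0 contribute nothing; q must be positive wherever p is."""
    total = 0.0
    for aa, p_i in p.items():
        if p_i > 0:
            if q[aa] <= 0:
                raise ValueError(f"reference frequency for {aa} must be positive")
            total += p_i * math.log2(p_i / q[aa])
    return total


# Assumed reference: every amino acid equally likely.
UNIFORM = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    freqs = amino_acid_frequencies(["MKTAYIAKQR", "MSLLNVPAGK"])
    print(f"bias vs. uniform: {relative_entropy(freqs, UNIFORM):.3f} bits")
```

Under this definition a value of zero means usage identical to the reference, and larger values mean stronger preference for particular amino acids.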

    Genomic variation landscape of the human gut microbiome

    While large-scale efforts have rapidly advanced the understanding and practical impact of human genomic variation, the latter is largely unexplored in the human microbiome. We therefore developed a framework for metagenomic variation analysis and applied it to 252 fecal metagenomes of 207 individuals from Europe and North America. Using 7.4 billion reads aligned to 101 reference species, we detected 10.3 million single nucleotide polymorphisms (SNPs), 107,991 short indels, and 1,051 structural variants. The average ratio of non-synonymous to synonymous polymorphism rates of 0.11 was more variable between gut microbial species than across human hosts. Subjects sampled at varying time intervals exhibited individuality and temporal stability of SNP variation patterns, despite considerable composition changes of their gut microbiota. This implies that individual-specific strains are not easily replaced and that an individual might have a unique metagenomic genotype, which may be exploitable for personalized diet or drug intake

    SalmoNet, an integrated network of ten Salmonella enterica strains reveals common and distinct pathways to host adaptation

    Salmonella enterica is a prominent bacterial pathogen with implications for human and animal health. Salmonella serovars can be classified as gastro-intestinal or extra-intestinal. Genome-wide comparisons revealed that extra-intestinal strains are more closely related to gastro-intestinal strains than to each other, indicating a parallel evolution of this trait. Given the complexity of the differences, a systems-level comparison could reveal key mechanisms enabling extra-intestinal serovars to cause systemic infections. Accordingly, in this work, we introduce a unique resource, SalmoNet, which combines manual curation, high-throughput data and computational predictions to provide an integrated network for Salmonella at the metabolic, transcriptional regulatory and protein-protein interaction levels. SalmoNet provides the networks separately for five gastro-intestinal and five extra-intestinal strains. As a multi-layered, multi-strain database containing experimental data, SalmoNet is the first dedicated network resource for Salmonella. It comprehensively contains interactions between proteins encoded in Salmonella pathogenicity islands, as well as regulatory mechanisms of metabolic processes, with the option to zoom in and analyze the interactions at specific loci in more detail. Application of SalmoNet is not limited to strain comparisons, as it also provides a Salmonella resource for biochemical network modeling, host-pathogen interaction studies, drug discovery, experimental validation of novel interactions, uncovering new pathological mechanisms from emergent properties and epidemiological studies. SalmoNet is available at http://salmonet.org

    Towards the reconstruction of integrated genome-scale models of metabolism and gene expression

    The reconstruction of integrated genome-scale models of metabolism and gene expression has been a challenge for a while now. Various methods have been proposed for integrating reconstructions of Transcriptional Regulatory Networks, gene expression data or both into Genome-Scale Metabolic Models. Several of these methods are surveyed in this article, which made it possible to identify their strengths and weaknesses concerning the reconstruction of integrated models for multiple prokaryotic organisms. Additionally, the main resources of regulatory information were also surveyed, as the existence of novel sources of regulatory information and gene expression data may contribute to the improvement of the methodologies referred to herein. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2019 unit and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020-Programa Operacional Regional do Norte. Fernando Cruz holds a doctoral fellowship (SFRH/BD/139198/2018) funded by the FCT. The authors thank project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (814408) funded by the European Commission.

    Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria

    BACKGROUND: In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels. An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. RESULTS: A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. CONCLUSIONS: The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome-wide collection of reference RNA motif regulons is available in the RegPrecise database (http://regprecise.lbl.gov/)

    FITBAR: a web tool for the robust prediction of prokaryotic regulons

    BACKGROUND: The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been described to discriminate against false positives in the prediction of new binding targets; however, none of them has so far been implemented to assist the detection of binding sites at the genomic scale. RESULTS: FITBAR (Fast Investigation Tool for Bacterial and Archaeal Regulons) is a web service designed to identify new protein binding sites on fully sequenced prokaryotic genomes. The tool consists of a workbench where the significance of the predictions can be compared using different statistical methods, a feature not found in existing resources. The Local Markov Model and the Compound Importance Sampling algorithms have been implemented to compute the P-value of newly discovered binding sites. In addition, FITBAR provides two optimized genomic scanning algorithms using either log-odds or entropy-weighted position-specific scoring matrices. Other significant features include the production of a detailed genomic context map for each detected binding site and the export of the search results in spreadsheet and portable document formats. FITBAR discovery of a high affinity Escherichia coli NagC binding site was validated experimentally in vitro as well as in vivo and published. CONCLUSIONS: FITBAR was developed in order to allow fast, accurate and statistically robust predictions of prokaryotic regulons. This feature constitutes the main advantage of this web tool over other matrix search programs and does not impair its performance. The web service is available at http://archaea.u-psud.fr/fitbar.
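To illustrate what scanning a genome with a log-odds position-specific scoring matrix involves, here is a minimal generic sketch. It is not FITBAR's code: the function names, the uniform background, and the pseudocount of 1 are all assumptions made for illustration, and the P-value step (Local Markov Model or Compound Importance Sampling) is not attempted.

```python
import math

BASES = "ACGT"


def build_log_odds_pssm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per position) from aligned binding sites.
    Assumes a uniform background unless one is supplied."""
    if not sites:
        raise ValueError("at least one site is required")
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        # log2 of (smoothed observed frequency / background frequency)
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (start, score) pairs.
    Windows containing letters other than A, C, G, T are skipped."""
    width = len(pssm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in BASES for ch in window):
            continue
        score = sum(pssm[i][ch] for i, ch in enumerate(window))
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy aligned sites, invented for this example.
    pssm = build_log_odds_pssm(["TTGACA", "TTGACA", "TTGACT"])
    best = max(scan("GGTTGACAGG", pssm), key=lambda hit: hit[1])
    print(f"best window starts at {best[0]} with score {best[1]:.2f}")
```

A raw score like this only ranks windows; judging whether a hit is significant needs a statistical model of the background sequence, which is the part the abstract attributes to FITBAR's P-value algorithms.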

    Repeated, Selection-Driven Genome Reduction of Accessory Genes in Experimental Populations

    Genome reduction has been observed in many bacterial lineages that have adapted to specialized environments. The extreme genome degradation seen for obligate pathogens and symbionts appears to be dominated by genetic drift. In contrast, for free-living organisms with reduced genomes, the dominant force is proposed to be direct selection for smaller, streamlined genomes. Most variation in gene content for these free-living species is of “accessory” genes, which are commonly gained as large chromosomal islands that are adaptive for specialized traits such as pathogenicity. It is generally unclear, however, whether the process of accessory gene loss is largely driven by drift or selection. Here we demonstrate that selection for gene loss, and not a shortened genome, per se, drove massive, rapid reduction of accessory genes. In just 1,500 generations of experimental evolution, 80% of populations of Methylobacterium extorquens AM1 experienced nearly parallel deletions removing up to 10% of the genome from a megaplasmid present in this strain. The absence of these deletion events in a mutation accumulation experiment suggested that selection, rather than drift, has dominated the process. Reconstructing these deletions confirmed that they were beneficial in their selective regimes, but led to decreased performance in alternative environments. These results indicate that selection can be crucial in eliminating unnecessary genes during the early stages of adaptation to a specialized environment

    Selection in Coastal Synechococcus (Cyanobacteria) Populations Evaluated from Environmental Metagenomes

    Environmental metagenomics provides snippets of genomic sequences from all organisms in an environmental sample and is an unprecedented resource of information for investigating microbial population genetics. Current analytical methods, however, are poorly equipped to handle metagenomic data, particularly short, unlinked sequences. A custom analytical pipeline was developed to calculate dN/dS ratios, a common metric to evaluate the role of selection in the evolution of a gene, from environmental metagenomes sequenced using 454 technology of flow-sorted populations of marine Synechococcus, the dominant cyanobacteria in coastal environments. The large majority of genes (98%) have evolved under purifying selection (dN/dS<1). The metagenome sequence coverage of the reference genomes was not uniform, and genes that were highly represented in the environment (i.e. high read coverage) tended to be more evolutionarily conserved. Of the genes that may have evolved under positive selection (dN/dS>1), 77 out of 83 (93%) were hypothetical. Notable among annotated genes, ribosomal protein L35 appears to be under positive selection in one Synechococcus population. Other annotated genes, in particular a possible porin, a large-conductance mechanosensitive channel, an ATP binding component of an ABC transporter, and a homologue of a pilus retraction protein, had regions of the gene with elevated dN/dS. With the increasing use of next-generation sequencing in metagenomic investigations of microbial diversity and ecology, analytical methods need to accommodate the peculiarities of these data streams. By developing a means to analyze population diversity data from these environmental metagenomes, we have provided the first insight into the role of selection in the evolution of Synechococcus, a globally significant primary producer.

    Comparative genomics of prevaccination and modern Bordetella pertussis strains

    BACKGROUND: Despite vaccination since the 1950s, pertussis has persisted and resurged. It remains a major cause of infant death worldwide and is the most prevalent vaccine-preventable disease in developed countries. The resurgence of pertussis has been associated with the expansion of Bordetella pertussis strains with a novel allele for the pertussis toxin (Ptx) promoter, ptxP3, which have replaced resident ptxP1 strains. Compared to ptxP1 strains, ptxP3 strains produce more Ptx, resulting in increased virulence and immune suppression. To elucidate how B. pertussis has adapted to vaccination, we compared genome sequences of two ptxP3 strains with four strains isolated before and after the introduction of vaccination. RESULTS: The distribution of SNPs in regions involved in transcription and translation suggested that changes in gene regulation play an important role in adaptation. No evidence was found for acquisition of novel genes. Modern strains differed significantly from prevaccination strains, both phylogenetically and with respect to particular alleles. The ptxP3 strains were found to have diverged recently from modern ptxP1 strains. Differences between ptxP3 and modern ptxP1 strains included SNPs in a number of pathogenicity-associated genes. Further, both gene inactivation and reactivation were observed in ptxP3 strains relative to modern ptxP1 strains. CONCLUSIONS: Our work suggests that B. pertussis adapted by successive accumulation of SNPs and by gene (in)activation. In particular, changes in gene regulation may have played a role in adaptation.