
    Standard operating procedure for computing pangenome trees

    We present the pan-genome tree as a tool for visualizing similarities and differences between closely related microbial genomes within a species or genus. The distance between genomes is computed as a weighted relative Manhattan distance based on gene family presence/absence. The weights can be chosen to emphasize groups of gene families conserved to various degrees within the pan-genome. The software is freely available as an R package.
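    To make the distance described above concrete, here is a minimal Python sketch. It assumes one plausible reading of "weighted relative Manhattan distance": the weighted sum of presence/absence differences divided by the total weight. The function name, the normalization, and the example weights are illustrative assumptions, not the R package's actual interface or formula.

```python
def weighted_relative_manhattan(a, b, weights=None):
    """Distance between two genomes given as 0/1 presence/absence vectors.

    Assumption: 'relative' is read as dividing the weighted Manhattan
    distance by the sum of the weights, so the result lies in [0, 1].
    """
    if len(a) != len(b):
        raise ValueError("genome vectors must have the same length")
    if weights is None:
        # Default: every gene family counts equally.
        weights = [1.0] * len(a)
    if len(weights) != len(a):
        raise ValueError("one weight per gene family is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    diff = sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    return diff / total


# Two genomes scored over four gene families (hypothetical data).
genome_a = [1, 1, 0, 1]
genome_b = [1, 0, 1, 1]
unweighted = weighted_relative_manhattan(genome_a, genome_b)
# Hypothetical weights that emphasize the second gene family.
weighted = weighted_relative_manhattan(genome_a, genome_b, [1, 2, 1, 0])
```

    A matrix of such pairwise distances could then be passed to any hierarchical clustering routine to draw a tree; the choice of weights determines which degree of gene-family conservation dominates the picture.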

    Global transcriptome response in Lactobacillus sakei during growth on ribose

    Background: Lactobacillus sakei is valuable in the fermentation of meat products and exhibits properties that allow for better preservation of meat and fish. On these substrates, glucose and ribose are the main carbon sources available for growth. We used a whole-genome microarray based on the genome sequence of L. sakei strain 23K to investigate the global transcriptome response of three L. sakei strains when grown on ribose compared with glucose. Results: The function of the common regulated genes was mostly related to carbohydrate metabolism and transport. Decreased transcription of genes encoding enzymes involved in glucose metabolism and the L-lactate dehydrogenase was observed, but most of the genes showing differential expression were up-regulated. Transcription increased especially for genes directly involved in ribose catabolism, the phosphoketolase pathway, and alternative fates of pyruvate. Interestingly, the methylglyoxal synthase gene, which encodes an enzyme unique to L. sakei among lactobacilli, was up-regulated. Ribose catabolism seems closely linked with catabolism of nucleosides. The deoxyribonucleoside synthesis operon transcriptional regulator gene was strongly up-regulated, as well as two gene clusters involved in nucleoside catabolism. One of the clusters included a ribokinase gene. Moreover, hprK, encoding the HPr kinase/phosphatase, which plays a major role in the regulation of carbon metabolism and sugar transport, was up-regulated, as were genes encoding the general PTS enzyme I and the mannose-specific enzyme II complex (EII^man). Putative catabolite-responsive element (cre) sites were found in proximity to the promoter of several genes and operons affected by the change of carbon source. This could indicate regulation by a catabolite control protein A (CcpA)-mediated carbon catabolite repression (CCR) mechanism, possibly with the EII^man being indirectly involved. Conclusions: Our data show that the ribose uptake and catabolic machinery in L. sakei is highly regulated at the transcription level. A global regulation mechanism seems to permit a fine tuning of the expression of enzymes that control efficient exploitation of available carbon sources.

    Microbial comparative pan-genomics using binomial mixture models

    Background: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. Results: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. Conclusion: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
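    The capture-recapture idea behind this approach can be illustrated with a short Python sketch. It assumes that the number of genomes in which a gene family is detected follows a mixture of binomials, and that gene families detected in zero genomes are never observed, so the observed count must be scaled up by the probability of being seen at least once. The mixture parameters below are hypothetical and fixed by hand; the sketch does not fit the model or reproduce the paper's estimation procedure.

```python
from math import comb


def binom_pmf(y, g, p):
    """Probability of detecting a gene family in y out of g genomes."""
    return comb(g, y) * p**y * (1 - p) ** (g - y)


def mixture_pmf(y, g, props, probs):
    """Binomial mixture: component k has mixing proportion props[k]
    and detection probability probs[k]."""
    return sum(pi * binom_pmf(y, g, p) for pi, p in zip(props, probs))


def estimate_pan_size(n_observed, g, props, probs):
    """Scale the observed gene-family count by the probability that a
    family is seen in at least one of the g genomes."""
    p_unseen = mixture_pmf(0, g, props, probs)
    return n_observed / (1 - p_unseen)


# Hypothetical example: 2 genomes, 875 observed gene families, and two
# components: a 'core' component (always detected) and a rarer one.
props = [0.5, 0.5]
probs = [1.0, 0.5]
pan_size = estimate_pan_size(875, 2, props, probs)
# Under this reading, the core size is the share of the pan-genome
# belonging to the component with detection probability 1.
core_size = pan_size * props[0]
```

    With these made-up parameters, one eighth of all gene families would go undetected in two genomes, so the 875 observed families correspond to an estimated pan-genome of 1000 families, of which 500 are core.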

    Questioning the Quality of 16S rRNA Gene Sequences Derived From Human Gut Metagenome-Assembled Genomes

    The recent introduction of metagenome-assembled genomes (MAGs) has marked a major milestone in the human gut microbiome field (Almeida et al., 2019; Nayfach et al., 2019; Pasolli et al., 2019). Such reference-free, de novo-assembled genomes (Hugerth et al., 2015) have revealed a wide range of hitherto uncultured microbial species in human gut samples. The significance of MAGs in unravelling human gut microbial diversity was supported by their overwhelming representation in a comprehensive human gut prokaryotic collection filtered by metagenome data dereplicated at 97.5% average nucleotide identity (ANI) (Hiseni et al., 2021). More than 90% of the collection consists of MAGs, while the rest of the collection mainly comprises RefSeq genomes (Figure 1A).

    Detection of divergent genes in microbial aCGH experiments

    BACKGROUND: Array-based comparative genome hybridization (aCGH) is a tool for rapid comparison of genomes from different bacterial strains. The purpose of such analysis is to detect highly divergent or absent genes in a sample strain compared to an index strain. Development of methods for analyzing aCGH data has primarily focused on copy number aberrations in cancer research. In microbial aCGH analyses, genes are typically ranked by log-ratios, and classification into divergent or present is done by choosing a cutoff log-ratio, either manually or by statistics calculated from the log-ratio distribution. As experimental settings vary considerably, it is not possible to develop a classical discriminant or statistical learning approach. METHODS: We introduce a more efficient method for analyzing microbial aCGH data using a finite mixture model and a data rotation scheme. Using the average posterior probabilities from the model fitted to log-ratios before and after rotation, we get a score for each gene, and demonstrate its advantages for ranking and detecting divergent genes with increased specificity and sensitivity. RESULTS: The procedure is tested and compared to other approaches on simulated data sets, as well as on four experimental validation data sets for aCGH analysis on fully sequenced strains of Staphylococcus aureus and Streptococcus pneumoniae. CONCLUSION: When tested on simulated data as well as on four different experimental validation data sets from experiments with only fully sequenced strains, our procedure out-competes the standard procedure of using a simple log-ratio cutoff for classification into present and divergent genes.
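    The contrast between a hard log-ratio cutoff and a mixture-model score can be sketched in Python. The example assumes a two-component Gaussian mixture with hand-picked, hypothetical parameters (one component for divergent genes, one for present genes) and computes the posterior probability that a gene is divergent. It does not fit the model and does not include the data rotation scheme; it only illustrates the kind of per-gene posterior score that such a model yields.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


def posterior_divergent(log_ratio, p_div=0.1, mu_div=-2.0, sd_div=1.0,
                        mu_pres=0.0, sd_pres=0.5):
    """Posterior probability that a gene with this log-ratio belongs to
    the 'divergent' component. All default parameters are illustrative
    assumptions, not values taken from the paper."""
    w_div = p_div * normal_pdf(log_ratio, mu_div, sd_div)
    w_pres = (1 - p_div) * normal_pdf(log_ratio, mu_pres, sd_pres)
    return w_div / (w_div + w_pres)


# Score a few hypothetical log-ratios; low values suggest divergence.
log_ratios = [-3.0, -1.0, 0.0, 0.5]
scores = [posterior_divergent(x) for x in log_ratios]
# Rank genes from most to least likely divergent.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

    Unlike a fixed cutoff, the score is a continuous probability, so genes can be ranked and the classification threshold chosen afterwards.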

    HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data

    Background: A major bottleneck in the use of metagenome sequencing for human gut microbiome studies has been the lack of a comprehensive genome collection to be used as a reference database. Several recent efforts have been made to reconstruct genomes from human gut metagenome data, resulting in a huge increase in the number of relevant genomes. In this work, we aimed to create a collection of the most prevalent healthy human gut prokaryotic genomes, to be used as a reference database, including both MAGs from the human gut and ordinary RefSeq genomes. Results: We screened > 5,700 healthy human gut metagenomes for the containment of > 490,000 publicly available prokaryotic genomes sourced from RefSeq and the recently announced UHGG collection. This resulted in a pool of > 381,000 genomes that were subsequently scored and ranked based on their prevalence in the healthy human metagenomes. The genomes were then clustered at a 97.5% sequence identity resolution, and cluster representatives (30,691 in total) were retained to comprise the HumGut collection. Using the Kraken2 software for classification, we find superior performance in the assignment of metagenomic reads, classifying on average 94.5% of the reads in a metagenome, as opposed to 86% with UHGG and 44% when using the standard Kraken2 database. A coarser HumGut collection, consisting of genomes dereplicated at 95% sequence identity (similar to UHGG), classified 88.25% of the reads. HumGut, half the size of the standard Kraken2 database and directly comparable to the UHGG size, outperforms them both. Conclusions: The HumGut collection contains > 30,000 genomes clustered at a 97.5% sequence identity resolution and ranked by human gut prevalence. We demonstrate how metagenomes from IBD patients map equally well to this collection, indicating this reference is relevant also for studies well outside the metagenome reference set used to obtain HumGut. All data and metadata, as well as helpful code, are available at http://arken.nmbu.no/~larssn/humgut/.

    Assessing time dependent changes in microbial composition of biological crime scene traces using microbial RNA markers

    Current body fluid identification methods do not reveal any information about the time since deposition (TsD) of biological traces, even though determining the age of traces could be crucial for the investigative process. To determine the utility of microbial RNA markers for TsD estimation, we examined RNA sequencing data from five forensically relevant body fluids (blood, menstrual blood, saliva, semen, and vaginal secretion) over seven time points, ranging from fresh to 1.5 years. One set of samples was stored indoors while another was exposed to outdoor conditions. In outdoor samples, we observed a consistent compositional shift occurring after 4 weeks: this shift was characterized by an overall increase in non-human eukaryotic RNA and an overall decrease in prokaryotic RNA. In-depth analyses showed a high fraction of tree, grass and fungal signatures, which are characteristic of the environment the samples were exposed to. When examining the prokaryotic fraction in more detail, three bacterial phyla were found to exhibit the largest changes in abundance, namely Actinobacteria, Proteobacteria and Firmicutes. More detailed analyses at the order level were done using a Lasso regression analysis to find a predictive subset of bacterial taxa. We found 26 bacterial orders to be indicative of sample age. Indoor samples did not reveal such a clear compositional change at the domain level: eukaryotic and prokaryotic abundance remained relatively stable across the assessed time period. Nonetheless, a Lasso regression analysis identified 32 bacterial orders exhibiting clear changes over time, enabling the prediction of TsD. For both indoor and outdoor samples, a majority (around 60%) of the bacterial orders identified as indicative of TsD belong to the Actinobacteria, Proteobacteria and Firmicutes. In summary, we found that the observed changes across time are not primarily due to changes associated with body fluid specific bacteria but mostly due to accumulation of bacteria from the environment. Orders of these environmental bacteria could be evaluated for TsD prediction, considering the location and environment of the crime scene. However, further studies are needed to verify these findings, determine the applicability across samples, replicates, donors, and other variables, and also to further assess the effect of different seasons and locations on the samples.

    Rapid Succession of Actively Transcribing Denitrifier Populations in Agricultural Soil During an Anoxic Spell

    Denitrification allows sustained respiratory metabolism during periods of anoxia, an advantage in soils with frequent anoxic spells. However, the gains may be more than evened out by the energy cost of producing the denitrification machinery, particularly if the anoxic spell is short. This dilemma could explain the evolution of different regulatory phenotypes observed in model strains, such as sequential expression of the four denitrification genes needed for a complete reduction of nitrate to N2, or a “bet hedging” strategy where all four genes are expressed only in a fraction of the cells. In complex environments such strategies would translate into progressive onset of transcription by the members of the denitrifying community. We exposed soil microcosms to anoxia, sampled for amplicon sequencing of napA/narG, nirK/nirS, and nosZ genes and transcripts after 1, 2 and 4 h, and monitored the kinetics of NO, N2O, and N2. The cDNA libraries revealed a succession of transcribed genes from active denitrifier populations, which probably reflects various regulatory phenotypes in combination with cross-talk via intermediates (NO2−, NO) produced by the “early onset” denitrifying populations. This suggests that the regulatory strategies observed in individual isolates are also displayed in complex communities, and pinpoints the importance of successive sampling when identifying active key player organisms.

    Comparative genomics of Enterococcus faecalis from healthy Norwegian infants

    Background: Enterococcus faecalis, traditionally considered a harmless commensal of the intestinal tract, is now ranked among the leading causes of nosocomial infections. In an attempt to gain insight into the genetic make-up of commensal E. faecalis, we have studied genomic variation in a collection of community-derived E. faecalis isolated from the feces of Norwegian infants. Results: The E. faecalis isolates were first sequence typed by multilocus sequence typing (MLST) and characterized with respect to antibiotic resistance and properties associated with virulence. A subset of the isolates was compared to the vancomycin resistant strain E. faecalis V583 (V583) by whole genome microarray comparison (comparative genomic hybridization (CGH)). Several of the putative enterococcal virulence factors were found to be highly prevalent among the commensal baby isolates. The genomic variation as observed by CGH was smaller between isolates displaying the same MLST sequence type than between isolates belonging to different evolutionary lineages. Conclusion: The variation in gene content observed among the investigated commensal E. faecalis is comparable to the genetic variation previously reported among strains of various origins thought to be representative of the major E. faecalis lineages. Previous MLST analyses of E. faecalis have identified so-called high-risk enterococcal clonal complexes (HiRECC), defined as genetically distinct subpopulations, epidemiologically associated with enterococcal infections. The observed correlation between CGH and MLST presented here may offer a method for the identification of lineage-specific genes, and may therefore add clues on how to distinguish pathogenic from commensal E. faecalis. In this work, information on the core genome of E. faecalis is also substantially extended.