20 research outputs found

    De novo extraction of microbial strains from metagenomes reveals intra-species niche partitioning

    Background: We introduce DESMAN for De novo Extraction of Strains from MetAgeNomes. Metagenome sequencing generates short reads from throughout the genomes of a microbial community. Increasingly large, multi-sample metagenomes, stratified in space and time, are being generated from communities with thousands of species. Repeats result in fragmentary co-assemblies with potentially millions of contigs. Contigs can be binned into metagenome assembled genomes (MAGs) but strain level variation will remain. DESMAN identifies variants on core genes, then uses co-occurrence across samples to link variants into strain sequences and abundance profiles. These strain profiles are then searched for on non-core genes to determine the accessory genes present in each strain. Results: We validated DESMAN on a synthetic twenty genome community with 64 samples. We could resolve the five E. coli strains present with 99.58% accuracy across core gene variable sites and their gene complement with 95.7% accuracy. Similarly, on real fecal metagenomes from the 2011 E. coli (STEC) O104:H4 outbreak, the outbreak strain was reconstructed with 99.8% core sequence accuracy. Application to an anaerobic digester metagenome time series reveals that strain level variation is endemic, with 16 out of 26 MAGs (61.5%) examined exhibiting two strains. In almost all cases the strain proportions were not statistically different between replicate reactors, suggesting intra-species niche partitioning. The only exception was when the two strains had almost identical gene complement and, hence, functional capability. Conclusions: DESMAN will provide a powerful tool for de novo resolution of fine-scale variation in microbial communities. It is available as open source software from https://github.com/chrisquince/DESMAN

    A constrained NMF approach to analyze quantitative metagenomic data

    In this paper, we propose a new method for inferring the metabolic potential of microbial ecosystems based on gene frequencies generated from shotgun metagenomic data. Our approach is based on Non-Negative Matrix Factorization with constraints accounting for prior biological knowledge of bacterial metabolism. The problem is solved using efficient accelerated projected gradient methods. The approach is illustrated on a toy model and on real data on fiber metabolism by the gut microbiota in humans. We show how this approach leads to the inference of biologically relevant gene clusters.
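To make the idea of a projected gradient solver for NMF concrete, here is a minimal sketch. It is not the authors' method: it uses plain (unaccelerated) alternating projected gradient, includes only the non-negativity constraint and none of the biological constraints, and the function names and the toy matrix `A` are invented for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def residual(a, w, h):
    """Return R = W H - A."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(matmul(w, h), a)]


def sq_norm(m):
    """Squared Frobenius norm of a matrix."""
    return sum(x * x for row in m for x in row)


def projected_step(m, grad, step):
    """Take a gradient step, then clip negative entries to zero (projection)."""
    return [[max(0.0, x - step * g) for x, g in zip(rm, rg)]
            for rm, rg in zip(m, grad)]


def nmf_projected_gradient(a, rank, iters=500, seed=0):
    """Approximate A (n x m) by W (n x rank) times H (rank x m), both >= 0."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(rank)] for _ in a]
    h = [[rng.random() for _ in a[0]] for _ in range(rank)]
    errors = [sq_norm(residual(a, w, h))]  # error before any update
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        # Gradient of 0.5 * ||A - W H||^2 with respect to W is (W H - A) H^T.
        # ||H||_F^2 bounds the Lipschitz constant, so 1 / ||H||_F^2 is a safe step.
        grad_w = matmul(residual(a, w, h), transpose(h))
        w = projected_step(w, grad_w, 1.0 / (sq_norm(h) + eps))
        # Symmetric update for H, with gradient W^T (W H - A).
        grad_h = matmul(transpose(w), residual(a, w, h))
        h = projected_step(h, grad_h, 1.0 / (sq_norm(w) + eps))
        errors.append(sq_norm(residual(a, w, h)))
    return w, h, errors


# Hypothetical toy matrix: 4 samples (rows) by 3 gene families (columns).
A = [[5.0, 1.0, 0.0],
     [4.0, 2.0, 1.0],
     [0.0, 3.0, 6.0],
     [1.0, 2.0, 4.0]]
W, H, errors = nmf_projected_gradient(A, rank=2)
```

The step sizes are deliberately conservative so that the reconstruction error never increases from one iteration to the next; an accelerated scheme of the kind the abstract mentions would typically converge faster.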

    Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota

    Whole Genome Shotgun (WGS) metagenomics is increasingly used to study the structure and functions of complex microbial ecosystems, from both the taxonomic and functional points of view. Gene inventories of otherwise uncultured microbial communities make the direct functional profiling of microbial communities possible. The concept of community aggregated trait has been adapted from environmental and plant functional ecology to the framework of microbial ecology. Community aggregated traits are quantified from WGS data by computing the abundance of relevant marker genes. They can be used to study key processes at the ecosystem level and correlate environmental factors and ecosystem functions. In this paper we propose a novel model-based approach to infer combinations of aggregated traits characterizing specific ecosystemic metabolic processes. We formulate a model of these Combined Aggregated Functional Traits (CAFTs) accounting for a hierarchical structure of genes, which are associated on microbial genomes, further linked at the ecosystem level by complex co-occurrences or interactions. The model is completed with constraints specifically designed to exploit available genomic information, in order to favor biologically relevant CAFTs. The CAFTs structure, as well as their intensity in the ecosystem, is obtained by solving a constrained Non-negative Matrix Factorization (NMF) problem. We developed a multicriteria selection procedure for the number of CAFTs. We illustrated our method on the modelling of ecosystemic functional traits of fiber degradation by the human gut microbiota. We used 1408 samples of gene abundances from several high-throughput sequencing projects and found that only four CAFTs were needed to represent the fiber degradation potential. This data reduction highlighted biologically consistent functional patterns while providing a high-quality preservation of the original data. Our method is generic and can be applied to other metabolic processes in the gut or in other ecosystems.

    Hierarchical structure underlying metagenomic data.

    We consider a metabolic process in a microbial ecosystem involving a substrate U, metabolites V, X, Y, Z, T and the set of reactions U → V, V → X, X → Y, V → Z and U → T + X, respectively catalyzed by proteins synthesized by genes in KEGG Orthology (KO) groups a, b, c, d and e. Gene counts stem from an underlying hierarchical organization. Genes are associated within bacterial genomes (solid black lines) and through ecosystem level association patterns (green, red and blue ticked lines). In this example, the green and blue boxes can be interpreted as trophic chains corresponding to two distinct pathways for substrate degradation. The red box can be interpreted as an alternative to the green one, involving different bacterial groups, depending on the host diet or life history. Note that the red box involves (possibly several) species harbouring two copies of gene a in their genomes.

    An example of CAFT.

    The first 61 coordinates of the functional marker frequency vector given by the first line of H*, associated with simple sugar fermentation, are represented on the reaction graph. The color scale represents percentages of the maximum coordinate among the 61 (reaction 7). The reactions form coherent pathways. The 25 coordinates associated with hydrolysis are presented in the table on the right. The numbers indicate GH families; the colors are scaled as percentages of the maximum coordinate among the 25 (GH13).

    Four functional profiles for fibre and mucin metabolism in the human gut microbiome

    Background: With the emergence of metagenomic data, multiple links between the gut microbiome and the host health have been shown. Deciphering these complex interactions requires advanced analysis methods focusing on the microbial ecosystem functions. Despite the fact that host or diet-derived fibres are the most abundant nutrients available in the gut, the presence of distinct functional traits regarding fibre and mucin hydrolysis, fermentation and hydrogenotrophic processes has never been investigated. Results: After manually selecting 91 KEGG orthologies and 33 glycoside hydrolases further aggregated in 101 functional descriptors representative of fibre and mucin degradation pathways in the gut microbiome, we used non-negative matrix factorization to mine metagenomic datasets. Four distinct metabolic profiles were further identified on a training set of 1153 samples and thoroughly validated on a large database of 2571 unseen samples from 5 external metagenomic cohorts. Profiles 1 and 2 are the main contributors to the fibre-degradation-related metagenome: they present contrasted involvement in fibre degradation and sugar metabolism and are differentially linked to dysbiosis, metabolic disease and inflammation. Profile 1 takes over Profile 2 in healthy samples, and an imbalance of these profiles characterizes dysbiotic samples. Furthermore, a high fibre diet favours a healthy balance between Profiles 1 and 2. Profile 3 takes over Profile 2 during Crohn’s disease, inducing functional reorientations towards unusual metabolism such as fucose and H2S degradation or propionate, acetone and butanediol production. Profile 4 gathers under-represented functions, like methanogenesis. Two taxonomic make-ups of the profiles were investigated, using either the covariation of 203 prevalent genomes or metagenomic species, both providing consistent results in line with their functional characteristics. This taxonomic characterization showed that Profiles 1 and 2 were respectively mainly composed of bacteria from the phyla Bacteroidetes and Firmicutes, while Profile 3 is representative of Proteobacteria and Profile 4 of methanogens. Conclusions: Integrating anaerobic microbiology knowledge with statistical learning can narrow down the metagenomic analysis to investigate functional profiles. Applying this approach to fibre degradation in the gut yielded 4 distinct functional profiles that can be easily monitored as markers of diet, dysbiosis, inflammation and disease.

    Constraints on the functional markers associated to the production and consumption of intracellular metabolites in the catabolic pathway from simple sugars to SCFA and methane.

    Constraints on the functional markers associated to the production and consumption of intracellular metabolites in the catabolic pathway from simple sugars to SCFA and methane.

    Combined Aggregated Functional Traits.

    The gene association patterns in Fig 1 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005252#pcbi.1005252.g001) correspond to stable structures, representing different ways to fulfill the metabolic function of interest at the ecosystem level. At the bottom of each box, vectors h_1, h_2 and h_3 represent functional marker abundances. We call these vectors Combined Aggregated Functional Traits (CAFTs). They should be found in all samples, possibly in varying proportions. Sample 1 is decomposed as A_1 = 3h_1 + 2h_2 + h_3 and sample n as A_n = h_1 + 2h_2 + 3h_3.
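The decomposition in this caption can be checked with a few lines of arithmetic. The sketch below is purely illustrative: the three CAFT vectors are hypothetical numbers over four made-up functional markers, and only the mixing weights (3, 2, 1) and (1, 2, 3) come from the caption.

```python
# Hypothetical CAFT vectors h_1, h_2, h_3: marker abundances for 4 invented markers.
cafts = [
    [4, 1, 0, 0],  # h_1
    [0, 2, 3, 0],  # h_2
    [1, 0, 0, 5],  # h_3
]

# Mixing weights from the caption: sample 1 and sample n.
weights = {
    "sample_1": [3, 2, 1],
    "sample_n": [1, 2, 3],
}


def mix(sample_weights, traits):
    """Return the weighted sum of trait vectors, one value per marker."""
    n_markers = len(traits[0])
    return [sum(w * trait[j] for w, trait in zip(sample_weights, traits))
            for j in range(n_markers)]


# Each sample's marker abundance vector is one row of the product W H,
# where W holds the weights and H stacks the CAFT vectors.
samples = {name: mix(w, cafts) for name, w in weights.items()}
```

Both samples are built from the same three vectors and differ only in their weights, which is why stacking the samples gives a matrix that factorizes as a non-negative weight matrix times a non-negative CAFT matrix.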