10 research outputs found

    Abundance-based reconstitution of microbial pan-genomes from whole-metagenome shotgun sequencing data: Application to the study of the human gut microbiota

    The advent of shotgun metagenomic sequencing has revolutionized microbiology by allowing culture-independent characterization of complex microbial communities such as the human gut microbiota. Recently developed bioinformatics tools achieve strain-level resolution by taking a census of accessory genes or by capturing nucleotide variants (SNPs). Yet these tools are limited by the set of available reference genomes, which is far from covering all microbial variability: many species have not yet been sequenced or are represented by only a few genomes. Building non-redundant gene catalogs by de novo assembly and then binning co-abundant genes reveals part of the microbial dark matter by reconstituting the gene repertoires of potentially unknown species. While existing methods accurately identify the core genes present in all strains of a species, they miss many accessory genes or split them into small gene groups that remain unassociated with core genomes. Capturing these accessory genes is nevertheless essential in clinical research and epidemiology, because they provide functions specific to certain strains, such as pathogenicity or antibiotic resistance. In this thesis, we developed MSPminer, a computationally efficient software tool that reconstitutes and structures Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across metagenomic samples. MSPminer relies on a new robust measure of proportionality coupled with an empirical classifier to group and distinguish not only species core genes but also accessory genes. With MSPminer, we structured a catalog of 9.9 million genes of the human gut microbiota into 1,661 MSPs. The homogeneity of the taxonomic annotation and of the nucleotide composition, as well as the presence of essential genes, indicate that the MSPs are not chimeras but biologically consistent objects grouping genes from the same species. Among these MSPs, 1,301 (78%) could not be annotated at species level, showing that many microorganisms colonizing the human intestinal tract remain unknown despite substantial improvements in microbial culture techniques. Remarkably, MSPs capture more genes than the clusters generated by existing tools while maintaining high specificity. This set of MSPs can readily be used for taxonomic profiling and biomarker discovery in human gut metagenomic samples. We thus use the MSPs to compare the impact on the gut microbiota of the two main types of bariatric surgery, laparoscopic sleeve gastrectomy (LSG) and Roux-en-Y gastric bypass (LRYGB). Finally, the MSPs open the way to strain-level analyses: in another cohort, we identified subspecies associated with the host's geographical origin by studying presence/absence patterns of the accessory genes grouped in the MSPs.

    Reconstitution de pan-génomes microbiens par séquençage métagénomique aléatoire : Application à l’étude du microbiote intestinal humain

    The advent of shotgun metagenomic sequencing has revolutionized microbiology by allowing culture-independent characterization of complex microbial communities such as the human gut microbiota. Recently developed bioinformatics tools achieve strain-level resolution by taking a census of accessory genes or by capturing nucleotide variants (SNPs). Yet these tools are limited by the set of available reference genomes, which is far from covering all microbial variability: many species have not yet been sequenced or are represented by only a few genomes. Building non-redundant gene catalogs by de novo assembly and then binning co-abundant genes reveals part of the microbial dark matter by reconstituting the gene repertoires of potentially unknown species. While existing methods accurately identify the core genes present in all strains of a species, they miss many accessory genes or split them into small gene groups that remain unassociated with core genomes. Capturing these accessory genes is nevertheless essential in clinical research and epidemiology, because they provide functions specific to certain strains, such as pathogenicity or antibiotic resistance. In this thesis, we developed MSPminer, a computationally efficient software tool that reconstitutes and structures Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across metagenomic samples. MSPminer relies on a new robust measure of proportionality coupled with an empirical classifier to group and distinguish not only species core genes but also accessory genes. With MSPminer, we structured a catalog of 9.9 million genes of the human gut microbiota into 1,661 MSPs. The homogeneity of the taxonomic annotation and of the nucleotide composition, as well as the presence of essential genes, indicate that the MSPs are not chimeras but biologically consistent objects grouping genes from the same species. Among these MSPs, 1,301 (78%) could not be annotated at species level, showing that many microorganisms colonizing the human intestinal tract remain unknown despite substantial improvements in microbial culture techniques. Remarkably, MSPs capture more genes than the clusters generated by existing tools while maintaining high specificity. This set of MSPs can readily be used for taxonomic profiling and biomarker discovery in human gut metagenomic samples. We thus use the MSPs to compare the impact on the gut microbiota of the two main types of bariatric surgery, laparoscopic sleeve gastrectomy (LSG) and Roux-en-Y gastric bypass (LRYGB). Finally, the MSPs open the way to strain-level analyses: in another cohort, we identified subspecies associated with the host's geographical origin by studying presence/absence patterns of the accessory genes grouped in the MSPs.

    Four functional profiles for fibre and mucin metabolism in the human gut microbiome

    Background: With the emergence of metagenomic data, multiple links between the gut microbiome and host health have been shown. Deciphering these complex interactions requires advanced analysis methods that focus on the functions of the microbial ecosystem. Although host- or diet-derived fibres are the most abundant nutrients available in the gut, the presence of distinct functional traits for fibre and mucin hydrolysis, fermentation and hydrogenotrophic processes has never been investigated. Results: After manually selecting 91 KEGG orthologies and 33 glycoside hydrolases, further aggregated into 101 functional descriptors representative of fibre and mucin degradation pathways in the gut microbiome, we used non-negative matrix factorization to mine metagenomic datasets. Four distinct metabolic profiles were identified on a training set of 1153 samples and thoroughly validated on a large database of 2571 unseen samples from 5 external metagenomic cohorts. Profiles 1 and 2 are the main contributors to the fibre-degradation-related metagenome: they show contrasting involvement in fibre degradation and sugar metabolism and are differentially linked to dysbiosis, metabolic disease and inflammation. Profile 1 predominates over Profile 2 in healthy samples, and an imbalance between these profiles characterizes dysbiotic samples. Furthermore, a high-fibre diet favours a healthy balance between Profiles 1 and 2. Profile 3 predominates over Profile 2 in Crohn’s disease, inducing functional reorientations towards unusual metabolism such as fucose and H2S degradation or propionate, acetone and butanediol production. Profile 4 gathers under-represented functions, such as methanogenesis. Two taxonomic make-ups of the profiles were investigated, using the covariation of either 203 prevalent genomes or metagenomic species; both gave consistent results in line with the profiles' functional characteristics. This taxonomic characterization showed that Profiles 1 and 2 were mainly composed of bacteria from the phyla Bacteroidetes and Firmicutes, respectively, while Profile 3 is representative of Proteobacteria and Profile 4 of methanogens. Conclusions: Integrating anaerobic microbiology knowledge with statistical learning can narrow metagenomic analysis down to the investigation of functional profiles. Applying this approach to fibre degradation in the gut yielded four distinct functional profiles that can be easily monitored as markers of diet, dysbiosis, inflammation and disease.
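
The abstract above names non-negative matrix factorization as the mining step but does not say which algorithm or software was used. As a rough illustration only, the sketch below factors a small samples-by-descriptors matrix with the classic multiplicative-update rules; the update scheme, the function names (`nmf`, `frobenius_error`) and the toy matrix are all assumptions made for this example, not details from the paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (samples x descriptors) as W times H.

    Assumption: multiplicative updates; the paper does not state its solver.
    W holds per-sample profile weights, H holds per-profile descriptor loadings.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: 4 samples x 3 functional descriptors.
toy = [[5, 1, 0], [4, 1, 0], [0, 2, 6], [1, 2, 5]]
weights, profiles = nmf(toy, rank=2)
```

Each row of `profiles` plays the role of one functional profile, and each row of `weights` says how strongly a sample expresses each profile. In practice a library implementation would normally be preferred over this hand-written loop.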

    MSPminer: abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data

    Motivation: Analysis toolkits for shotgun metagenomic data achieve strain-level characterization of complex microbial communities by capturing intra-species gene content variation. Yet these tools are limited by the set of reference genomes, which is far from covering all microbial variability, as many species have not yet been sequenced or have only a few strains available. Binning co-abundant genes obtained from de novo assembly is a powerful reference-free technique for discovering and reconstituting the gene repertoires of microbial species. While current methods accurately identify species core parts, they miss many accessory genes or split them into small gene groups that remain unassociated with core clusters. Results: We introduce MSPminer, a computationally efficient software tool that reconstitutes Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across metagenomic samples. MSPminer relies on a new robust measure of proportionality coupled with an empirical classifier to group and distinguish not only species core genes but also accessory genes. Applied to a large-scale metagenomic dataset, MSPminer delineates the gene repertoires of 1661 microbial species in a few hours, with similar specificity and higher sensitivity than existing tools. The taxonomic annotation of MSPs reveals hitherto unknown microorganisms and brings coherence to the nomenclature of the species of the human gut microbiota. The provided MSPs can readily be used for taxonomic profiling and biomarker discovery in human gut metagenomic samples. In addition, MSPminer can be applied to gene count tables from other ecosystems to perform similar analyses. Availability and implementation: The binary is freely available for non-commercial users at www.enterome.com/downloads. Supplementary information: Supplementary data are available at Bioinformatics online.
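
The abstract does not define MSPminer's proportionality measure or its classifier. To make the idea of co-abundance concrete, here is a deliberately simplified sketch: two genes from the same species should have counts that stay roughly proportional across samples. The median-of-ratios estimator, the `tolerance` parameter, the function name and the toy count vectors are assumptions for illustration and are not MSPminer's actual method.

```python
from statistics import median

def proportionality(seed_counts, gene_counts, tolerance=0.25):
    """Return (coefficient, support) for the relation gene ~ coefficient * seed.

    coefficient: median of per-sample ratios over samples where both genes
                 are detected (the median resists a few outlier samples).
    support:     fraction of those samples whose ratio lies within a relative
                 `tolerance` of the coefficient.
    Hypothetical helper; not the measure implemented in MSPminer.
    """
    ratios = [g / s for s, g in zip(seed_counts, gene_counts) if s > 0 and g > 0]
    if not ratios:
        return None, 0.0
    coefficient = median(ratios)
    close = sum(abs(r - coefficient) <= tolerance * coefficient for r in ratios)
    return coefficient, close / len(ratios)

# Toy counts across five samples (invented for this example).
core_seed = [10, 20, 0, 40, 80]
co_abundant = [21, 39, 0, 82, 400]   # roughly 2x the seed, one outlier sample
unrelated = [50, 3, 7, 1, 120]       # no stable ratio to the seed

coef, support = proportionality(core_seed, co_abundant)
_, unrelated_support = proportionality(core_seed, unrelated)
```

Restricting the ratios to samples where both genes are detected is one simple way to let a gene that is absent from some strains still match its species in the samples where it does occur, which is the behaviour the abstract attributes to accessory genes.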

    Quality control of microbiota metagenomics by k-mer analysis

    Background: The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next-generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample number and size). Here we demonstrate that the distribution of overlapping k-mers in metagenome sequence data predicts sequence quality as defined by gene distribution and the efficiency of sequence mapping to a reference gene catalogue. Results: We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined metagenomes ranging from high to low quality. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contamination, such as sequences derived from "empty" ligation products. Of note, k-mer distributions also predicted the frequency of sequences mapping to a reference gene catalogue, not only for the well-defined serial dilution datasets but also for the 52 human gut microbiota-derived metagenomic datasets. Conclusions: We propose that k-mer analysis of raw metagenome sequence reads be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota, it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to faster turnaround times, improved analytical quality (including sample quality metrics) and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.
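
The abstract refers to the distribution of overlapping k-mers in raw reads without giving the exact statistic used. The sketch below shows only the basic counting step, assuming reads are plain strings: it counts every overlapping k-mer and then tabulates how many distinct k-mers occur once, twice, and so on. The function names, the choice to skip k-mers containing `N`, and the toy reads are assumptions for this example.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # assumption: ignore k-mers with ambiguous bases
                counts[kmer] += 1
    return counts

def abundance_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())

# Toy reads (invented); real input would come from a FASTQ file.
reads = ["ACGTACGT", "ACGTTTTT", "NNACGT"]
counts = kmer_counts(reads, k=4)
hist = abundance_histogram(counts)
```

A histogram dominated by a handful of very frequent k-mers is the kind of signal one might expect from a contaminant such as an adapter-derived sequence, although how the paper turns the distribution into a quality score is not stated in the abstract.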