21 research outputs found

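    « GeOMICS », nouveaux concepts de bioinformatique pour un nouvel outil de diagnostic environnemental basé sur l'alliance de la géochimie et des omiques [GeOMICS: new bioinformatics concepts for a new environmental diagnostic tool combining geochemistry and omics]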

    Soil is a complex habitat and a difficult matrix for omics analysis. Its microbial communities are adapted to ecosystems whose physico-chemical characteristics vary with soil type, location, and depth. Floodplain soils, formed from sediments deposited during each flood, drain the alluvium of the watershed, for which contaminants have a strong affinity. These soils bear witness to anthropogenic activities and archive past contamination. Choosing a sampling site where sedimentation is regular makes it possible to date the successive layers of these archives. The objective of this thesis is to study microbial communities under anthropogenic pressure from a temporal point of view, by linking the changes in anthropogenic pressure measured at a site whose history is well known to the structure of the communities sampled from these archives. Metaproteomics approaches based on large volumes of ultrafast mass spectrometry data allow the identification of organisms present in the sample and provide unique functional data. A particular effort was made to improve the data interpretation pipeline, which is based on cascading queries and achieves an identification rate well above the current standard. Combining public databases with more sample-specific metagenomic data is a winning strategy. In a second step, a Seine River sediment core, subdivided into 35 dated layers, was exhaustively analyzed by metallomics, metagenomics, metataxonomics, and metaproteomics. Microorganisms identified and quantified by metaproteomics were correlated with trace metal concentrations. The strategies and computational tools for data interpretation, developed on a core that is well characterized geochemically, open the way to rapid diagnosis of microbial communities in soils.

    Proposal of a decoy-free FDR approach suitable for metaproteomics.

    Accurate and fast evaluation of the False Discovery Rate (FDR) of spectrum-to-peptide sequence inference is a difficult task in metaproteomics, because the very large databases used are often largely incomplete and include numerous non-sample-specific sequences, particularly when using microbiota gene catalogs or generalist databases. The traditional approach, which relies on combined target-decoy databases, doubles search time, decreases sensitivity because of the larger search space, and is often biased by a poor match between dataset and database. We propose a target-only FDR estimate based on a mixture model of four beta distributions. We verified its efficiency on a set of 94 result datasets, including 26 metaproteomics searches, and on a specific search with a controlled mismatching metaproteomics database. Based on these extensive results, we found this method to be adequate for FDR estimation at the Peptide-Spectrum Match level for proteomics, proteogenomics, and metaproteomics searches.
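For context, the conventional target-decoy estimate that this abstract contrasts with can be sketched as follows. This is only an illustration of the traditional baseline, not the proposed beta-mixture method; the function name `target_decoy_qvalues` and the toy scores are assumptions made for the example.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    Hypothetical helper: each PSM is a (score, is_decoy) pair, and a higher
    score is assumed to mean a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each score threshold, estimated as decoys / targets so far.
    targets = 0
    decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR reachable at this or any lower threshold.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


# Toy example: five PSMs, one of which matched a decoy sequence.
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
toy_result = target_decoy_qvalues(toy_psms)
```

The sketch makes the cost named in the abstract visible: every decoy entry must be searched alongside the targets, which is the overhead a target-only estimate avoids.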

    A large-scale quality assessment of taxonomy and assembly databases to select genomes and proteomes suitable for meta-omics

    Background: High-throughput shotgun metaproteomics approaches on environmental or medical microbiomes are generating huge amounts of tandem mass spectrometry data. These can be interpreted either with a general protein sequence database based on more than one hundred thousand sequenced genomes, or with a more customized database such as those obtained by sequencing the DNA or mRNA extracted from the same sample. However, genomes are not of equal quality in terms of purity, assembly, and annotation, which can critically affect metaproteomic data interpretation. Methods: In this presentation, we draw a large-scale picture of key issues detected in a 2018 version of the NCBI nr database, using in-house pipelines. Results: Metaproteogenomic strategies offer a useful vantage point for assessing issues related to the databases used to process both metagenomic and metaproteomic datasets. For example, sample handling or the materials used can lead to cross-contamination of genome sequences with foreign biological material. The results indicate that this problem has lessened in recent years. We characterized a second source of problems related to the taxonomy of organisms, which leads to taxonomic or functional misattributions of metaproteomic results. Finally, we identified assembly or annotation anomalies in genomes, resulting in functional annotation issues. These findings call for filtering out lower-quality genomes and proteomes to enhance the quality of metagenomic and metaproteomic pipelines. Conclusions: We propose several methodologies devoted to the quality assessment of genome assemblies and taxonomy databases to improve the outcome of metaproteomics.

    Increasing the power of interpretation for soil metaproteomics data

    Background: Soil and sediment microorganisms are phylogenetically highly diverse but are currently largely under-represented in public molecular databases. Their functional characterization by means of metaproteomics is usually performed using metagenomic sequences acquired from the same sample. However, such hugely diverse metagenomic datasets are difficult to assemble, whereas theoretical proteomes from isolates available in generic databases are of high quality. Both factors argue for the use of theoretical proteomes in metaproteomics interpretation pipelines. Here, we examined a number of database construction strategies with a view to increasing the outputs of metaproteomics studies performed on soil samples. Results: The number of peptide-spectrum matches was of comparable magnitude when using public or sample-specific metagenomics-derived databases. However, numbers increased significantly when both types of information were combined in a two-step cascaded search. Our data also indicate that the functional annotation of the metaproteomics dataset can be maximized by combining both types of databases. Conclusions: A two-step strategy combining a sample-specific metagenome database with public databases, such as the non-redundant NCBI database and a massive soil gene catalog, maximizes metaproteomic interpretation in terms of both the ratio of assigned spectra and the retrieval of function-derived information.

    Estimating relative biomasses of organisms in microbiota using “phylopeptidomics”

    Background: There is an important need for the development of fast and robust methods to quantify the diversity and temporal dynamics of microbial communities in complex environmental samples. Because tandem mass spectrometry allows rapid inspection of protein content, metaproteomics is increasingly used for the phenotypic analysis of microbiota across many fields, including biotechnology, environmental ecology, and medicine. Results: Here, we present a new method for identifying the biomass contribution of any given organism based on a signature describing the number of peptide sequences shared with all other organisms, calculated by mathematical modeling and phylogenetic relationships. This so-called “phylopeptidomics” principle allows the relative ratios of peptide-specified taxa to be calculated by linear combination of such signatures applied to an experimental metaproteomic dataset. We illustrate its efficiency using artificial mixtures of two closely related pathogens of clinical interest, and with more complex microbiota models. Conclusions: This approach paves the way to a new vision of taxonomic changes and accurate label-free quantitative metaproteomics for fine-tuned functional characterization.
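The idea of recovering relative ratios by a linear combination of signatures can be illustrated with a small least-squares sketch. It is a toy under stated assumptions, not the published method: the signature values are invented, the names `solve_linear` and `relative_biomass` are hypothetical, and negative weights are simply clipped to zero rather than handled by a proper non-negative solver.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def relative_biomass(
    signatures: List[List[float]], observed: List[float]
) -> List[float]:
    """Estimate relative contributions of organisms from peptide counts.

    signatures[j][i] is the (assumed) number of peptides attributed to
    taxon i per unit of organism j; observed[i] is the measured count for
    taxon i. The weights are the least-squares solution of the normal
    equations, clipped at zero and normalized to sum to 1.
    """
    k = len(signatures)
    gram = [
        [sum(x * y for x, y in zip(signatures[p], signatures[q])) for q in range(k)]
        for p in range(k)
    ]
    rhs = [sum(x * y for x, y in zip(signatures[p], observed)) for p in range(k)]
    weights = [max(w, 0.0) for w in solve_linear(gram, rhs)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy mixture: two closely related organisms sharing many peptides,
# mixed at a 3:1 ratio (all numbers invented for illustration).
toy_signatures = [[100.0, 60.0, 10.0], [60.0, 100.0, 10.0]]
toy_observed = [3 * a + 1 * b for a, b in zip(*toy_signatures)]
toy_ratios = relative_biomass(toy_signatures, toy_observed)
```

Because closely related organisms share many peptides, their signatures overlap strongly; solving for all weights jointly is what separates their contributions instead of counting shared peptides twice.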