18 research outputs found

    De novo transcriptomes of 14 gammarid individuals for proteogenomic analysis of seven taxonomic groups

    Gammarids are amphipods distributed worldwide in fresh and marine waters. They play an important role in aquatic ecosystems and are well-established sentinel species in ecotoxicology. In this study, we sequenced the transcriptomes of a male individual and a female individual for seven different taxonomic groups belonging to the two genera Gammarus and Echinogammarus: Gammarus fossarum A, G. fossarum B, G. fossarum C, Gammarus wautieri, Gammarus pulex, Echinogammarus berilloni, and Echinogammarus marinus. These taxa were chosen to explore the molecular diversity of transcribed genes of genotyped individuals from these groups. Transcriptomes were assembled de novo and annotated. High-quality assembly was confirmed by BUSCO comparison against the Arthropod dataset. The 14 RNA-Seq-derived protein sequence databases proposed here will be a significant resource for proteogenomics studies of these ecotoxicologically relevant non-model organisms. These transcriptomes represent reliable reference sequences for whole-transcriptome and proteome studies on other gammarids, for primer design to clone specific genes or monitor their specific expression, and for analyses of molecular differences between gammarid species.

    Branched-chain amino acid database integrated in MEDIPAD software as a tool for nutritional investigation of mediterranean populations

    Branched-chain amino acids (BCAA) are essential dietary components for humans and can act as potential biomarkers for diabetes development. To efficiently estimate dietary intake, we developed a BCAA database for 1331 food items found in the French Centre d'Information sur la Qualité des Aliments (CIQUAL) food table by compiling BCAA content from international tables, published measurements, or by food similarity, as well as by calculating 267 items from Greek, Turkish, Romanian, and Moroccan mixed dishes. The database, embedded in the MEDIPAD software capable of registering 24-h dietary recalls (24HDR) with clinical and genetic data, was evaluated on archived 24HDR of the Saint Pierre Institute (France) from 2957 subjects, which indicated a BCAA content of up to 4.2 g/100 g of food and differences between normal-weight and obese subjects across BCAA quartiles. We also evaluated the database on 119 interviews of Romanians, Turkish and Albanians in Greece (27–65 years) during the MEDIGENE program, which indicated a mean BCAA intake of 13.84 and 12.91 g/day in males and females, respectively, comparable to other studies. MEDIPAD is user-friendly, multilingual, and secure software and, with the BCAA database, is suitable for conducting nutritional assessment in the Mediterranean area with particular facilities for food administration.
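
The intake estimate described in this abstract amounts to a weighted sum over the items of a 24-h dietary recall. The following is only a minimal sketch of that arithmetic: the food names, the per-100 g values, and the function name are hypothetical placeholders, not entries from the CIQUAL-based database or the MEDIPAD software.

```python
from typing import Dict, List, Tuple

# Hypothetical BCAA content in g per 100 g of food (illustrative values only,
# not taken from the database described in the abstract).
BCAA_G_PER_100G: Dict[str, float] = {
    "food_a": 4.0,
    "food_b": 1.5,
    "food_c": 0.2,
}


def bcaa_intake(recall: List[Tuple[str, float]], table: Dict[str, float]) -> float:
    """Sum BCAA intake (g) over the (food item, grams eaten) pairs of one recall."""
    return sum(grams * table[item] / 100.0 for item, grams in recall)


# One hypothetical 24-h recall: (food item, grams eaten).
recall_example = [("food_a", 150.0), ("food_b", 200.0), ("food_c", 300.0)]
total = bcaa_intake(recall_example, BCAA_G_PER_100G)
print(f"{total:.2f} g BCAA/day")
```

A lookup table keyed by food item keeps the calculation independent of where the per-100 g values come from (measured, compiled, or assigned by food similarity).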

    Bioinformatique pour l’exploration de la diversité inter-espèces et inter-populations : hétérogénéité & données multi-omiques

    The exploitation of omics data combining transcriptomics and proteomics enables the detailed study of the molecular mechanisms of non-model organisms exposed to environmental stress. Assembling RNA-seq data from a non-model organism makes it possible to produce the protein database used to interpret spectra generated in shotgun proteomics. In this context, the aim of the PhD work was to optimize the interpretation and analysis of proteomic data through the development of innovative concepts for the construction of protein databases and the exploration of biodiversity. The first step focused on the development of a pretreatment method for RNA-seq data based on proteomic attribution results. The second step was to reduce the size of the databases by optimizing the parameters of the automated coding region search. The optimized method enabled the analysis of 7 taxonomic groups of gammarids representative of the diversity found in natura. The proteomic databases thus produced enabled the inter-population analysis of 40 individual Gammarus pulex proteomes from two sampling sites (polluted vs reference). Statistical analysis based on an "individual" approach showed heterogeneity of the biological response within a population of organisms exposed to environmental stress. Different subclusters of molecular response mechanisms were identified. Finally, the study of the transversality of the peptide biomarkers identified in Gammarus fossarum revealed which ones are common, using both proteomic and transcriptomic data. For this purpose, software for the exploration of peptide sequences was developed that suggests potential substitute biomarkers when the defined peptides are not applicable to some gammarid species. All these concepts aim to improve the interpretation of data by proteogenomics. This work opens the door to the multi-omic analysis of individuals collected in natura by considering inter-species and intra-population biodiversity.

    Bioinformatics for exploring inter-species and inter-population diversity: heterogeneity & multi-omics data

    The exploitation of omics data combining transcriptomics and proteomics enables the detailed study of the molecular mechanisms of non-model organisms exposed to environmental stress. Assembling RNA-seq data from a non-model organism makes it possible to produce the protein database used to interpret spectra generated in shotgun proteomics. In this context, the aim of the PhD work was to optimize the interpretation and analysis of proteomic data through the development of innovative concepts for the construction of protein databases and the exploration of biodiversity. The first step focused on the development of a pretreatment method for RNA-seq data based on proteomic attribution results. The second step was to reduce the size of the databases by optimizing the parameters of the automated coding region search. The optimized method enabled the analysis of 7 taxonomic groups of gammarids representative of the diversity found in natura. The proteomic databases thus produced enabled the inter-population analysis of 40 individual Gammarus pulex proteomes from two sampling sites (polluted vs reference). Statistical analysis based on an "individual" approach showed heterogeneity of the biological response within a population of organisms exposed to environmental stress. Different subclusters of molecular response mechanisms were identified. Finally, the study of the transversality of the peptide biomarkers identified in Gammarus fossarum revealed which ones are common, using both proteomic and transcriptomic data. For this purpose, software for the exploration of peptide sequences was developed that suggests potential substitute biomarkers when the defined peptides are not applicable to some gammarid species. All these concepts aim to improve the interpretation of data by proteogenomics. This work opens the door to the multi-omic analysis of individuals collected in natura by considering inter-species and intra-population biodiversity.

    Discriminating sub-population responses of a mixture of human cell lines by proteogenomics

    Monitoring proteome dynamics from different human cell types present concomitantly in a given sample is of great interest and could be applied to ultra-precise molecular characterization of complex tissues. Here, we propose a proteogenomics-based strategy to pinpoint cell line molecular signatures. For this, the proteome is analyzed by high-throughput shotgun mass spectrometry and specific bioinformatics searches are performed. First, mRNA from chondrosarcoma cells (SW1353 cell line) and immortalized chondrocytes (T/C28A2 cell line) was sequenced by RNAseq to establish the most appropriate protein sequence database. For this, an innovative cascade search reconciles de novo and mapping RNAseq assemblies with the human Swiss-Prot database (Cogne et al., 2018). A set of 2 million peptide sequences discriminating the two cell lines is then identified. Of these, 480 peptide sequences were detected and monitored based on extracted ion chromatogram (XIC) signals recorded by tandem mass spectrometry. A list of 55 peptides, selected with cell lines mixed at 2:1, 1:1, and 1:2 ratios, was used to quantify with high precision the ratio of each cell type in a given co-culture sample. This new methodology was used to analyze the bystander effect generated by irradiated chondrosarcoma cells (SW1353 cell line) on immortalized chondrocytes (T/C28A2 cell line) in co-culture conditions. Such a strategy could be applied to investigate intercellular interactions between different cell types, paving the way to new insights into the molecular mechanisms of crosstalk between human cells.

    Molecular omics resources and tools for amphipod investigation

    Objectives. Gammarids are key animal sentinels for in situ ecotoxicological biomonitoring of fresh water. Molecular biomarkers representative of key physiological parameters may be defined for gaining insights into the response of organisms to toxicants and measuring the anthropogenic impact in the environment. Recently, proteogenomics, a novel approach intimately combining next-generation sequencing and proteomic methodologies, has emerged as a straightforward strategy for discovering relevant proteins in non-model organisms. This opens the possibility of analyzing the molecular players from any amphipod, and even of investigating its microbiota and parasites. Methods. We sequenced the transcriptomes of a male and a female for seven different taxonomic groups: Gammarus fossarum A, G. fossarum B, G. fossarum C, Gammarus wautieri, Gammarus pulex, Echinogammarus berilloni and Echinogammarus marinus. These taxa were chosen to explore the molecular diversity of transcribed genes of genotyped individuals from these groups. Transcriptomes were assembled de novo and annotated. We optimized the de novo assembly strategy and constructed an extensive collection of protein sequences for these fourteen gammarids that can be used for interpreting proteomics data. In parallel, we recorded shotgun proteomics data on more than a hundred gammarid individuals to explore several key questions. We also developed several pipelines to investigate their proteogenomes and their microbiota. Results. For example, we analysed two regional Gammarus pulex populations to characterize the potential proteome divergence induced at one site by natural bioavailable cadmium contamination compared to a non-contaminated site. We showed that the intra-population proteome variability of long-term exposed G. pulex was inflated relative to the non-contaminated population. While remaining a challenge for such organisms whose genomes are not yet sequenced, taking intra-population variability into account is important to better define the molecular players induced by toxic stress in a comparative field proteomics approach. Conclusion. The fourteen RNA-seq-derived protein sequence databases proposed here are an important resource for proteogenomics on these non-model organisms. This work illustrates the relevance of omics for the development of multiplexed biomarkers. The tools and strategies developed in this project are transposable to any amphipod.

    From proteogenomics to systems biology in the freshwater amphipod G. fossarum

    Objectives. Next-generation sequencing and mass spectrometry technologies have recently expanded the availability of whole transcriptomes and proteomes beyond classical model organisms in molecular biology, even in the absence of an annotated genome. These advancements are paving the way to explore the molecular physiology and the mechanisms of toxicity in environmentally relevant species, such as the amphipod G. fossarum. Here we present the results obtained by combining systems biology and proteogenomics approaches to gain functional insights into the reproductive system of G. fossarum and to identify different mechanistic effects of toxicants in the amphipod’s testes. Methods. We performed co-expression network analyses (based on the R package Weighted Gene Co-expression Network Analysis, WGCNA) on shotgun proteomics data obtained from i) male and female gonads at different maturation stages and ii) testes of males exposed during their spermatogenesis to different concentrations of cadmium (Cd), methoxyfenozide (MET) or pyriproxyfen (PYR). Results. We identified groups (modules) of co-expressed proteins (i.e. whose expression is highly correlated) significantly associated with biological processes, such as the secondary vitellogenesis in the female embryos, and spermatozoa morphogenesis and energy metabolism in the testes. In particular, we identified taxonomically restricted proteins that may play a central role in oocyte maturation. Moreover, we found that different modules of testicular proteins were significantly correlated with the different contaminants we studied. These results show the value of systems biology approaches for identifying distinct modes of action even in the presence of similar toxicological responses. Conclusion. We showed that co-expression network analysis is a powerful integrative tool to investigate the omics data obtained from a proteogenomic approach in G. fossarum, and that it provides functional information about many unknown proteins and the modes of action of environmental contaminants.
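
The co-expression step mentioned in this abstract rests on turning pairwise correlations between abundance profiles into a weighted network. As a rough illustration only, the sketch below computes the unsigned soft-threshold adjacency used by WGCNA, a_ij = |cor(x_i, x_j)|^beta, in plain Python. It is not the authors' pipeline and does not call the R package: the protein names, the abundance values, and the choice of beta = 6 are assumptions made for the example, and module detection (clustering on the resulting network) is left out.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_adjacency(
    profiles: Dict[str, List[float]], beta: int = 6
) -> Dict[Tuple[str, str], float]:
    """Unsigned soft-threshold adjacency |cor|^beta for every protein pair."""
    names = sorted(profiles)
    adjacency: Dict[Tuple[str, str], float] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            adjacency[(first, second)] = abs(r) ** beta
    return adjacency


# Hypothetical abundance profiles across four samples.
profiles = {
    "protein_1": [1.0, 2.0, 3.0, 4.0],
    "protein_2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with protein_1
    "protein_3": [4.0, 1.0, 3.0, 2.0],  # weakly anti-correlated with protein_1
}
adjacency = soft_adjacency(profiles)
for pair, weight in adjacency.items():
    print(pair, round(weight, 6))
```

Raising the absolute correlation to a power keeps strong correlations close to 1 while pushing weak ones towards 0, which is why highly correlated proteins end up grouped in the same module.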

    Proteogenomics‐Guided Evaluation of RNA‐Seq Assembly and Protein Database Construction for Emergent Model Organisms

    Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA‐seq‐informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species whose genome has not yet been sequenced. It relies on high‐performance de novo RNA‐seq assembly and optimized translation strategies. Here, several pre‐treatments for Illumina RNA‐seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used; the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150‐bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
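
The read pre-treatment described in this abstract can be pictured as a filter on the mean quality score of each read. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes Phred+33 quality encoding, takes the mean as the arithmetic mean of per-base Phred scores, and uses hypothetical read names and function names. Only the threshold of 17 comes from the abstract.

```python
from typing import Iterable, Iterator, Tuple

# A read as (header, sequence, quality string), i.e. the informative
# lines of one FASTQ record.
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(reads: Iterable[Read], min_mean_q: float = 17.0) -> Iterator[Read]:
    """Yield only reads whose mean quality is at least min_mean_q."""
    for header, sequence, quality in reads:
        # Empty quality strings are skipped to avoid dividing by zero.
        if quality and mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality


# Hypothetical reads for illustration.
reads = [
    ("@read_1", "ACGT", "IIII"),  # 'I' encodes Q40 in Phred+33
    ("@read_2", "ACGT", "++++"),  # '+' encodes Q10, below the threshold
    ("@read_3", "ACGT", "2222"),  # '2' encodes Q17, exactly at the threshold
]
kept = [header for header, _, _ in filter_reads(reads)]
print(kept)
```

Writing the filter as a generator lets it stream over large read files without holding them in memory; in practice a dedicated trimming tool would normally do this step.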