
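    Développement de nouveaux outils bioinformatiques pour l'exploitation des données de spectrométrie de masse en protéomique haut-débit (Development of new bioinformatics tools for exploiting mass spectrometry data in high-throughput proteomics)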

    In biology, mass spectrometry has become an indispensable tool for protein identification. Combined with separation techniques, it can also be used to measure the variation of protein abundance between different samples. However, due to the huge quantity and complexity of the data produced by this kind of analysis, sophisticated and suitable computer programs are needed. My PhD work addressed the different issues related to the processing of nanoLC-MS/MS data, namely the validation of identification results and the relative quantification of proteins using approaches with or without isotopic labeling. The MFPaQ program, two versions of which are presented here, is the main result of this work. Version 3 includes features such as Mascot data validation, generation of non-redundant protein lists and quantification of ICAT analyses. Version 4, which represents a major upgrade of the software, incorporates additional algorithms for quantitative analysis of label-free MS data, as well as for the handling of the 14N/15N and SILAC labeling strategies. This bioinformatic tool has been used for various proteomic studies, some of which are discussed here. To meet future IT challenges in proteomics, I later undertook the development of the Prosper software, which is based on an optimized architecture for organizing data and allows cross-queries to be performed on all analysed samples. It also constitutes a prototype tool for the development and evaluation of new algorithms.

    Proceedings of the EuBIC Winter School 2019

    The 2019 European Bioinformatics Community (EuBIC) Winter School was held from January 15th to January 18th 2019 in Zakopane, Poland. This year’s meeting was the third of its kind and gathered international researchers in the field of (computational) proteomics to discuss (mainly) challenges in proteomics quantification and data independent acquisition (DIA). Here, we present an overview of the scientific program of the 2019 EuBIC Winter School. We also give a brief outlook on the upcoming EuBIC 2020 Developer’s Meeting.

    Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides

    Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.

    A community proposal to integrate proteomics activities in ELIXIR

    Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on ‘The Future of Proteomics in ELIXIR’ that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR’s existing platforms, and how they could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper.

    A proteomics sample metadata representation for multiomics integration and big data analysis

    The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

    Potential Plasticity of the Mannoprotein Repertoire Associated to Mycobacterium tuberculosis Virulence Unveiled by Mass Spectrometry-Based Glycoproteomics

    To date, Mycobacterium tuberculosis (Mtb) remains the world’s greatest infectious killer. The rise of multidrug-resistant strains stresses the need to identify new therapeutic targets to fight the epidemic. We previously demonstrated that bacterial protein-O-mannosylation is crucial for Mtb infectiousness, renewing interest in bacterial secreted mannoproteins as potential drug-targetable virulence factors. The difficulty of inventorying the mannoprotein repertoire expressed by Mtb led us to design a stringent multi-step workflow for the reliable identification of glycosylated peptides by large-scale mass spectrometry-based proteomics. Applied to the differential analyses of glycoproteins secreted by the wild-type Mtb strain and by its derived mutant deficient in the protein-O-mannosylating enzyme PMTub, this approach led to the identification not only of most already known mannoproteins, but also of previously unknown mannosylated proteins. In addition, analysis of the glycoproteome expressed by the isogenic recombinant Mtb strain overexpressing the PMTub gene revealed an unexpected mannosylation of proteins with predicted or demonstrated functions in Mtb growth and interaction with the host cell. Since, in parallel, a transient increase in expression of the PMTub gene has been observed in the wild-type bacilli when infecting macrophages, our results strongly suggest that the Mtb mannoproteome may undergo adaptive regulation during infection of the host cells. Overall, our results provide deeper insights into the complexity of the repertoire of mannosylated proteins expressed by Mtb, and open the way to novel opportunities to search for still-unexploited potential therapeutic targets.

    Comparison of label-free quantification methods for the determination of protein complexes subunits stoichiometry

    Protein complexes are the main molecular machines that support all major cellular pathways, and their in-depth characterization is essential to understand their functions. Determining the stoichiometry of the different subunits of a protein complex remains challenging. Recently, many label-free quantitative proteomic approaches have been developed to study the composition of protein complexes. It is therefore of great interest to evaluate these different methods with a stoichiometry-oriented objective. Here we compare the ability of four absolute quantitative label-free methods currently used in proteomic studies to determine the stoichiometry of a well-characterized protein complex, the 26S proteasome.