339 research outputs found

    Computational quality control tools for mass spectrometry proteomics

    As mass spectrometry-based proteomics has matured during the past decade, a growing emphasis has been placed on quality control. For this purpose, multiple computational quality control tools have been introduced. These tools generate a set of metrics that can be used to assess the quality of a mass spectrometry experiment. Here we review the types of quality control metrics that can be generated, and how they can be used to monitor both intra- and inter-experiment performance. We discuss the principal computational tools for quality control and list their main characteristics and applicability. As most of these tools have specific use cases, it is not straightforward to compare their performance. For this survey we used different sets of quality control metrics derived from information at various stages in a mass spectrometry process and evaluated their effectiveness at capturing qualitative information about an experiment using a supervised learning approach. Furthermore, we discuss currently available algorithmic solutions that enable the use of these quality control metrics for decision-making. This is the peer reviewed version of the following article: "Bittremieux, W., Valkenborg, D., Martens, L. & Laukens, K. Computational quality control tools for mass spectrometry proteomics. PROTEOMICS 17, 1600159 (2017)", which has been published in final form at https://doi.org/10.1002/pmic.201600159. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

    A manually curated network of the PML nuclear body interactome reveals an important role for PML-NBs in SUMOylation dynamics

    Promyelocytic Leukaemia Protein nuclear bodies (PML-NBs) are dynamic nuclear protein aggregates. To gain insight into PML-NB function, reductionist and high-throughput techniques have been employed to identify PML-NB proteins. Here we present a manually curated network of the PML-NB interactome based on an extensive literature review, including database information. By compiling 'the PML-ome', we highlighted the presence of interactors in the Small Ubiquitin Like Modifier (SUMO) conjugation pathway. Additionally, we show an enrichment of SUMOylatable proteins in the PML-NBs through an in-house prediction algorithm. Therefore, based on the PML network, we hypothesize that PML-NBs may function as a nuclear SUMOylation hotspot.

    Pattern mining of mass spectrometry quality control data

    Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to large variability. As a means of quality control, several qualitative metrics have recently been defined. [1] Initially, these quality control metrics were evaluated independently in order to separately assess particular stages of a mass spectrometry experiment. However, this method is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed a multivariate statistics approach to assess the correlation structure of the different quality control metrics. [2] However, more advanced data mining techniques can extract additional useful information from these quality control metrics. Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions. [3] This can be leveraged, for example, to detect aberrant experiments where only a few of the quality control metrics are outliers, but the experiment still behaved correctly in general. In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics. Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually, but are only observed after a critical juncture. As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can then be evaluated over time using sequential pattern mining techniques. Awareness has grown that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as yet untapped information, which specific data mining techniques can leverage to harness its full power.
    [1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 9, 225–241 (2010).
    [2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497–2509 (2014).
    [3] Aksehirli, E., Goethals, B., Müller, E. & Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. in Thirteenth IEEE International Conference on Data Mining - ICDM ’13 937–942 (IEEE, 2013). doi:10.1109/ICDM.2013.146
    [4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics (2014). doi:10.1074/mcp.M113.03590
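The frequent itemset idea mentioned in this abstract can be illustrated with a toy sketch. It assumes each experiment has already been reduced to a set of discretized quality control flags; the flag names, the data, and the brute-force enumeration are all hypothetical illustrations and are not the authors' method or a specialized mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for every itemset (up to max_size items)
    that occurs in at least a fraction min_support of the transactions.

    Brute-force enumeration: fine for a handful of flags, not for real data.
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Hypothetical QC flags raised per experiment (one set per run).
runs = [
    {"low_ms2_count", "wide_peaks"},
    {"low_ms2_count", "wide_peaks", "low_id_rate"},
    {"low_id_rate"},
    {"low_ms2_count", "wide_peaks"},
    set(),
]

patterns = frequent_itemsets(runs, min_support=0.5)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

In this made-up data the two flags that co-occur in three of five runs surface as a frequent pair, which is the kind of cross-stage relationship the abstract describes.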

    Proteomic Assessment of C57BL/6 Hippocampi after Non-Selective Pharmacological Inhibition of Nitric Oxide Synthase Activity: Implications of Seizure-like Neuronal Hyperexcitability Followed by Tauopathy

    Nitric oxide (NO) is a small gaseous signaling molecule responsible for maintaining homeostasis in a myriad of tissues and molecular pathways in neurology and the cardiovasculature. In recent years, there has been increasing interest in the potential interaction between arterial stiffness (AS), an independent cardiovascular risk factor, and neurodegenerative syndromes, given the increasing number of epidemiological study reports. For this reason, we previously investigated the mechanistic convergence between AS and neurodegeneration via the progressive non-selective inhibition of all nitric oxide synthase (NOS) isoforms with N(G)-nitro-L-arginine methyl ester (L-NAME) in C57BL/6 mice. Our previous results showed progressively increased AS in vivo and impaired visuospatial learning and memory in L-NAME-treated C57BL/6 mice. In the current study, we sought to further investigate the progressive molecular signatures in hippocampal tissue via LC–MS/MS proteomic analysis. Our data implicate mitochondrial dysfunction due to progressive L-NAME treatment. Two weeks of L-NAME treatment implicates altered G-protein-coupled-receptor signaling in the nerve synapse and associated presence of seizures and altered emotional behavior. Furthermore, molecular signatures implicate the cerebral presence of seizure-related hyperexcitability after short-term (8 weeks) treatment followed by ribosomal dysfunction and tauopathy after long-term (16 weeks) treatment.

    COLOMBOS v2.0 : an ever expanding collection of bacterial expression compendia

    The COLOMBOS database (http://www.colombos.net) features comprehensive organism-specific cross-platform gene expression compendia of several bacterial model organisms and is supported by a fully interactive web portal and an extensive web API. COLOMBOS was originally published in PLoS One, and COLOMBOS v2.0 includes both an update of the expression data, by expanding the previously available compendia and by adding compendia for several new species, and an update of the surrounding functionality, with improved search and visualization options and novel tools for programmatic access to the database. The scope of the database has also been extended to incorporate RNA-seq data in our compendia by a dedicated analysis pipeline. We demonstrate the validity and robustness of this approach by comparing the same RNA samples measured in parallel using both microarrays and RNA-seq. As far as we know, COLOMBOS currently hosts the largest homogenized gene expression compendia available for seven bacterial model organisms.

    COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses

    COLOMBOS is a database that integrates publicly available transcriptomics data for several prokaryotic model organisms. Compared to the previous version it has more than doubled in size, both in terms of species and data available. The manually curated condition annotation has been overhauled as well, giving more complete information about samples' experimental conditions and their differences. Functionality-wise, cross-species analyses now enable users to analyse expression data for all species simultaneously, and identify candidate genes with evolutionarily conserved expression behaviour. All the expression-based query tools have undergone a substantial improvement, overcoming the limit of enforced co-expression data retrieval and instead enabling the return of more complex patterns of expression behaviour. COLOMBOS is freely available through a web application at http://colombos.net/. The complete database is also accessible via REST API or downloadable as tab-delimited text files.
    Moretto, Marco; Sonego, Paolo; Dierckxsens, Nicolas; Brilli, Matteo; Bianco, Luca; Ledezma-Tejeida, Daniela; Gama-Castro, Socorro; Galardini, Marco; Romualdi, Chiara; Laukens, Kris; Collado-Vides, Julio; Meysman, Pieter; Engelen, Kristof

    Machine learning applications in proteomics research: How the past can boost the future

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.

    Mining the human proteome for conserved mechanisms

    The Absence of C-5 DNA Methylation in Leishmania donovani Allows DNA Enrichment from Complex Samples.

    Cytosine C5 methylation is an important epigenetic control mechanism in a wide array of eukaryotic organisms and generally carried out by proteins of the C-5 DNA methyltransferase family (DNMTs). In several protozoans, the status of this mechanism remains elusive, such as in Leishmania, the causative agent of the disease leishmaniasis in humans and a wide array of vertebrate animals. In this work, we showed that the Leishmania donovani genome contains a C-5 DNA methyltransferase (DNMT) from the DNMT6 subfamily, whose function is still unclear, and verified its expression at the RNA level. We created viable overexpressor and knock-out lines of this enzyme and characterized their genome-wide methylation patterns using whole-genome bisulfite sequencing, together with promastigote and amastigote control lines. Interestingly, despite the DNMT6 presence, we found that methylation levels were equal to or lower than 0.0003% at CpG sites, 0.0005% at CHG sites, and 0.0126% at CHH sites at the genomic scale. As none of the methylated sites were retained after manual verification, we conclude that there is no evidence for DNA methylation in this species. We demonstrated that this difference in DNA methylation between the parasite (no detectable DNA methylation) and the vertebrate host (DNA methylation) allowed enrichment of parasite vs. host DNA using methyl-CpG-binding domain columns, readily available in commercial kits. As such, we depleted methylated DNA from mixes of Leishmania promastigote and amastigote DNA with human DNA, resulting in average Leishmania:human enrichments from 62× up to 263×. These results open a promising avenue for unmethylated DNA enrichment as a pre-enrichment step before sequencing Leishmania clinical samples.

    MIND: A Double-Linear Model To Accurately Determine Monoisotopic Precursor Mass in High-Resolution Top-Down Proteomics

    Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the “true” (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind
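To make the accuracy figures in this abstract concrete, the sketch below shows how a parts-per-million error is computed and why an off-by-one-Da prediction stands out. It is not the MIND model itself: the function names, the 0.1 Da tolerance, the nominal 1 Da offset, and the example masses are all hypothetical choices for illustration.

```python
def ppm_error(predicted_mass, reference_mass):
    """Signed mass error in parts per million relative to the reference."""
    return (predicted_mass - reference_mass) / reference_mass * 1e6


def is_off_by_one_da(predicted_mass, reference_mass, tolerance_da=0.1):
    """True when the absolute error is close to a nominal one-dalton offset.

    The 0.1 Da tolerance is an arbitrary illustrative value.
    """
    return abs(abs(predicted_mass - reference_mass) - 1.0) <= tolerance_da


# Hypothetical masses in Da for a protein of about 20 kDa.
reference = 20000.0
close_prediction = 20000.04    # error of roughly 2 ppm
shifted_prediction = 20001.04  # same error plus a one-isotope shift

for predicted in (close_prediction, shifted_prediction):
    print(round(ppm_error(predicted, reference), 2),
          is_off_by_one_da(predicted, reference))
```

At this mass a single-isotope shift corresponds to roughly 50 ppm, far larger than a few-ppm prediction error, which suggests why a confidence measure for that case is useful.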