
    A Bayesian model averaging approach to the quantification of overlapping peptides in a MALDI-TOF mass spectrum.

    In a high-resolution MALDI-TOF mass spectrum, a peptide produces multiple peaks, corresponding to the isotopic variants of its molecules. An overlap occurs when the peaks of two peptides appear in the same region of the mass axis, making it difficult to quantify the relative abundances and the exact masses of these peptides. To address the problem, two factors need to be considered: (1) the variability pertaining to the abundances of the isotopic variants, and (2) the extra information needed to supplement the information contained in the data. We propose a Bayesian model for the incorporation of prior information. Such information exists, for example, for the distribution of peptide masses and the abundances of the isotopic variants. The model we develop allows for the correct estimation of the parameters of interest. The validity of the modeling approach is verified by a real-life case study from a controlled mass spectrometry experiment and by a simulation study.
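
    As a rough illustration of this setting (a minimal sketch, not the authors' actual model): two peptides whose isotope patterns interleave can be treated as a two-component mixture, with a prior on the mixing weight and a grid approximation to its posterior. The isotope patterns, noise level, and Beta(2, 2) prior below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder isotope patterns for two overlapping peptides (B ~2 Da heavier).
iso_a = np.array([1.0, 0.8, 0.4, 0.15, 0.05, 0.0, 0.0, 0.0])
iso_b = np.array([0.0, 0.0, 1.0, 0.85, 0.45, 0.18, 0.06, 0.0])
observed = 0.7 * iso_a + 0.3 * iso_b + rng.normal(0.0, 0.02, 8)

# Grid posterior over the mixing weight w, with a Beta(2, 2) prior and a
# Gaussian likelihood -- stand-ins for the informative priors the abstract
# describes (peptide masses, isotope-variant abundances).
w_grid = np.linspace(0.001, 0.999, 499)
log_post = np.empty_like(w_grid)
for i, w in enumerate(w_grid):
    resid = observed - (w * iso_a + (1.0 - w) * iso_b)
    log_lik = -0.5 * np.sum(resid ** 2) / 0.02 ** 2
    log_prior = np.log(w) + np.log(1.0 - w)  # Beta(2, 2), up to a constant
    log_post[i] = log_lik + log_prior

post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mean of mixing weight:", np.sum(w_grid * post))
```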

    Computational quality control tools for mass spectrometry proteomics

    As mass spectrometry-based proteomics has matured during the past decade, a growing emphasis has been placed on quality control. For this purpose, multiple computational quality control tools have been introduced. These tools generate a set of metrics that can be used to assess the quality of a mass spectrometry experiment. Here we review the different types of quality control metrics that can be generated, and how they can be used to monitor both intra- and inter-experiment performance. We discuss the principal computational tools for quality control and list their main characteristics and applicability. As most of these tools have specific use cases, it is not straightforward to compare their performance. For this survey, we used different sets of quality control metrics derived from information at various stages of the mass spectrometry process and evaluated their effectiveness at capturing qualitative information about an experiment using a supervised learning approach. Furthermore, we discuss currently available algorithmic solutions that enable the use of these quality control metrics for decision-making. This is the peer-reviewed version of the following article: "Bittremieux, W., Valkenborg, D., Martens, L. & Laukens, K. Computational quality control tools for mass spectrometry proteomics. PROTEOMICS 17, 1600159 (2017)", which has been published in final form at https://doi.org/10.1002/pmic.201600159. This article may be used for non-commercial purposes in accordance with the Wiley Terms and Conditions for Self-Archiving.
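
    The supervised-learning evaluation described above can be sketched as follows (a minimal illustration on synthetic data, not one of the reviewed tools): each MS run is represented by a vector of QC metrics and labeled as acceptable or problematic, and a classifier's cross-validated AUC indicates how much quality information the metrics capture.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_runs = 200
# Each row is one MS run; columns are QC metrics (e.g. peak width, MS1 TIC,
# identification rate) -- all synthetic here.
X = rng.normal(size=(n_runs, 5))
# Synthetic "acceptable vs. problematic" labels that depend on two metrics.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n_runs) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```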

    Thermodynamic framework to assess low abundance DNA mutation detection by hybridization

    The knowledge of genomic DNA variations in patient samples has a high and increasing value for human diagnostics in its broadest sense. Although many methods and sensors to detect or quantify these variations are available or under development, the number of underlying physico-chemical detection principles is limited. One of these principles is the hybridization of sample target DNA against nucleic acid probes. We introduce a novel thermodynamic approach and develop a framework to exploit the specific detection capabilities of nucleic acid hybridization, using generic principles applicable to any platform. As a case study, we detect point mutations in the KRAS oncogene on a microarray platform. For the given platform and hybridization conditions, we demonstrate the multiplex detection capability of hybridization and assess the detection limit using thermodynamic considerations; DNA containing point mutations in a background of wild-type sequences can be identified down to at least 1% relative concentration. To show the clinical relevance, the detection capabilities are confirmed on challenging formalin-fixed paraffin-embedded clinical tumor samples. This enzyme-free detection framework provides the accuracy and efficiency to screen for hundreds of mutations in a single run, with many potential applications in molecular diagnostics and the field of personalised medicine.
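
    For context, frameworks of this kind build on standard two-state hybridization thermodynamics (the paper's specific formulation may differ): duplex stability sets an equilibrium constant, which in turn sets the fraction of probes hybridized.

```latex
\Delta G^{\circ} = \Delta H^{\circ} - T\,\Delta S^{\circ}, \qquad
K = \exp\!\left(-\frac{\Delta G^{\circ}}{RT}\right), \qquad
\theta = \frac{K\,c}{1 + K\,c}
```

    Here c is the target concentration and θ the fraction of hybridized probe; a single mismatch destabilizes the duplex by ΔΔG°, so the discrimination between matched and mismatched targets scales as exp(−ΔΔG°/RT) in the dilute limit.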

    Pattern mining of mass spectrometry quality control data

    Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to a large variability. As a means of quality control, several quality control metrics have recently been defined [1]. Initially, these quality control metrics were evaluated independently in order to separately assess particular stages of a mass spectrometry experiment. However, this method is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed a multivariate statistics approach to assess the correlation structure of the different quality control metrics [2]. However, by making use of more advanced data mining techniques, additional useful information can be extracted from these quality control metrics.

    Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions [3]. This can be leveraged, for example, to detect aberrant experiments where only a few of the quality control metrics are outliers, but the experiment still behaved correctly in general. In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics (see the sketch below). Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually, but are only observed after a critical juncture. As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can be evaluated over time by making use of sequential pattern mining techniques.

    The awareness has risen that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as yet untapped information, which can be leveraged by making use of specific data mining techniques in order to harness the full power of this new information.

    [1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 9, 225–241 (2010).
    [2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497–2509 (2014).
    [3] Aksehirli, E., Goethals, B., Müller, E. & Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. In Thirteenth IEEE International Conference on Data Mining - ICDM '13, 937–942 (IEEE, 2013). doi:10.1109/ICDM.2013.146
    [4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics (2014). doi:10.1074/mcp.M113.03590
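
    The frequent itemset idea referenced above can be illustrated with a toy sketch (the abstract's concept, not its implementation): each run is described by discretized QC metric values such as "peak_width=high", and itemsets recurring across runs are counted.

```python
from collections import Counter
from itertools import combinations

# Synthetic, discretized QC metrics for five MS runs.
runs = [
    {"peak_width=high", "id_rate=low", "tic=normal"},
    {"peak_width=high", "id_rate=low", "tic=low"},
    {"peak_width=normal", "id_rate=normal", "tic=normal"},
    {"peak_width=high", "id_rate=low", "tic=normal"},
    {"peak_width=normal", "id_rate=normal", "tic=low"},
]

min_support = 3  # an itemset must occur in at least 3 of the 5 runs
counts = Counter()
for run in runs:
    for size in (1, 2):
        for itemset in combinations(sorted(run), size):
            counts[itemset] += 1

frequent = {s: c for s, c in counts.items() if c >= min_support}
for itemset, c in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(c, itemset)
# A frequently co-occurring pair such as ("id_rate=low", "peak_width=high")
# would support an association rule linking chromatography problems to lower
# identification rates.
```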

    Machine learning applications in proteomics research: How the past can boost the future

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since data are becoming publicly available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.

    Deciphering the morphology of motor evoked potentials

    Motor Evoked Potentials (MEPs) are used to monitor disability progression in multiple sclerosis (MS). Their morphology plays an important role in this process. Currently, however, there is no clear definition of what constitutes a normal or abnormal morphology. To address this, five experts independently labeled the morphology (normal or abnormal) of the same set of 1,000 MEPs. The intra- and inter-rater agreement between the experts indicates that they agree on the concept of morphology, but differ in their choice of threshold between normal and abnormal morphology. We subsequently performed an automated extraction of 5,943 time series features from the MEPs to identify a valid proxy for morphology, based on the provided labels. To do this, we compared the cross-validation performances of one-dimensional logistic regression models fitted to each of the features individually. We find that the approximate entropy (ApEn) feature can accurately reproduce the majority-vote labels. The performance of this feature is evaluated on an independent test set by comparing to the majority vote of the neurologists, obtaining an AUC score of 0.92. The model slightly outperforms the average neurologist at reproducing the neurologists' consensus-vote labels. We conclude that MEP morphology can be consistently defined by pooling the interpretations from multiple neurologists and that ApEn is a valid continuous score for this. Having an objective and reproducible MEP morphological abnormality score will allow researchers to include this feature in their models without manual annotation becoming a bottleneck. This is crucial for large-scale, multi-center datasets. An exploratory analysis on a large single-center dataset shows that ApEn is potentially clinically useful. Introducing an automated, objective, and reproducible definition of morphology could help overcome some of the barriers that currently obstruct broad adoption of evoked potentials in daily care and patient follow-up, such as the standardization of measurements between different centers and the formulation of guidelines for clinical use.
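
    Approximate entropy has a compact definition that can be implemented directly (a minimal NumPy sketch; the parameter choices m = 2 and r = 0.2 × SD are common defaults, not necessarily those used in the study):

```python
import numpy as np

def apen(x, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D signal, per Pincus' definition."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def phi(m):
        n = len(x) - m + 1
        # All length-m templates, compared with the Chebyshev (max-abs) distance.
        templates = np.array([x[i:i + m] for i in range(n)])
        dists = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        c = (dists <= r).mean(axis=1)  # self-matches included
        return np.log(c).mean()

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
smooth_mep = np.sin(2 * np.pi * 5 * t)                       # regular waveform: low ApEn
noisy_mep = smooth_mep + rng.normal(scale=0.5, size=t.size)  # irregular waveform: higher ApEn
print(apen(smooth_mep), apen(noisy_mep))
```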

    MIND: A Double-Linear Model To Accurately Determine Monoisotopic Precursor Mass in High-Resolution Top-Down Proteomics

    Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the “true” (i.e., theoretical) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R Shiny app: https://valkenborg-lab.shinyapps.io/mind
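
    The underlying idea can be sketched as follows (all coefficients below are illustrative placeholders, not the fitted values from the paper): the gap between the most-abundant isotope peak and the monoisotopic peak grows approximately linearly with mass, so a linear prediction of that gap, rounded to an integer number of isotope spacings, recovers the monoisotopic mass and exposes where off-by-one-Da errors can occur.

```python
# Placeholder slope/intercept for the mono-to-most-abundant mass gap, and an
# approximate isotope peak spacing in Da. NOT the paper's fitted values.
A, B = 0.0, 5.5e-4
SPACING = 1.00235

def predict_monoisotopic(m_abundant):
    """Toy double-linear-style prediction from the most-abundant peak mass."""
    gap = A + B * m_abundant       # predicted gap in Da, roughly linear in mass
    k = round(gap / SPACING)       # integer peak count; rounding is where
                                   # off-by-one-Da errors would arise
    return m_abundant - k * SPACING

print(predict_monoisotopic(20000.0))  # toy 20 kDa input
```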

    An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data

    Background: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline.

    Results: We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point, the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data.

    Conclusions: The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the straightforward quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
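
    The BW-ratio described above is simple enough to sketch directly (synthetic data; speaq itself is an R package, so this Python version is an illustration of the statistic, not the package's code):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two groups of 10 aligned spectra with 500 data points each.
group1 = rng.normal(loc=1.0, scale=0.1, size=(10, 500))
group2 = rng.normal(loc=1.0, scale=0.1, size=(10, 500))
group2[:, 200:210] += 0.5  # a spectral region that differs between groups

def bw_ratio(g1, g2):
    """Per-point between-group / within-group sum of squares."""
    grand = np.vstack([g1, g2]).mean(axis=0)
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    between = len(g1) * (m1 - grand) ** 2 + len(g2) * (m2 - grand) ** 2
    within = ((g1 - m1) ** 2).sum(axis=0) + ((g2 - m2) ** 2).sum(axis=0)
    return between / within

bw = bw_ratio(group1, group2)
print("top differential points:", np.argsort(bw)[-10:])  # should flag 200-209
# Significance would then be assessed by bootstrapping the null distribution,
# as the abstract describes.
```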

    A community proposal to integrate proteomics activities in ELIXIR

    Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on ‘The Future of Proteomics in ELIXIR’ that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR’s existing platforms, and how they could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months, we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper.

    qcML: an exchange format for quality control metrics from mass spectrometry experiments.

    Quality control is increasingly recognized as a crucial aspect of mass spectrometry-based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from the instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics, as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema. Here we describe the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml
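
    As a schematic illustration only (the element names below are simplified stand-ins and do not reproduce the normative qcML schema), computed QC metrics could be serialized to an XML exchange document along these lines:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical element and attribute names -- consult the qcML
# specification for the real schema.
root = ET.Element("qcML")
run = ET.SubElement(root, "runQuality", ID="run_1")
ET.SubElement(run, "qualityParameter",
              name="number of MS2 spectra",  # hypothetical metric name
              value="34212")
ET.ElementTree(root).write("example.qcml", xml_declaration=True, encoding="utf-8")
```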