Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear
time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein
profiles from biological samples with the aim of discovering biomarkers for
disease. However, the raw protein profiles suffer from several sources of bias
or systematic variation which need to be removed via pre-processing before
meaningful downstream analysis of the data can be undertaken. Baseline
subtraction, an early pre-processing step that removes the non-peptide signal
from the spectra, is complicated by two factors: (i) within each spectrum,
peaks are on average wider for peptides with higher mass-to-charge ratios
(m/z), and (ii) optimising the baseline subtraction input arguments is a
time-consuming and error-prone trial-and-error process. To address these
complications, we present an automated pipeline that includes (i) a novel
‘continuous’ line segment algorithm that operates efficiently over data whose
m/z-axis has been transformed to remove the relationship between peptide mass
and peak width, and (ii) an input-free algorithm to estimate peak widths on the
transformed m/z scale. The automated baseline subtraction method was deployed
on six publicly available proteomic MS datasets using six different m/z-axis
transformations. Optimality of the automated baseline subtraction pipeline was
assessed quantitatively using the mean absolute scaled error (MASE) when
compared to a gold-standard baseline subtracted signal. Near-optimal baseline
subtraction was achieved using the automated pipeline. The advantages of the
proposed pipeline include informed and data-specific input arguments for
baseline subtraction methods, the avoidance of time-intensive and subjective
piecewise baseline subtraction, and the ability to automate baseline
subtraction completely. Moreover, individual steps can be adopted as
stand-alone routines.

Comment: 50 pages, 19 figures
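The abstract names the mean absolute scaled error (MASE) as the quantitative criterion but does not spell out its form. The following is a minimal sketch that assumes the common definition, in which the mean absolute error is divided by the mean absolute first difference of the reference signal. The scaling actually used in the paper may differ, and the function name `mase` and its arguments are hypothetical.

```python
def mase(estimate, reference):
    """Mean absolute scaled error of `estimate` against `reference`.

    Assumption: the scale is the mean absolute first difference of the
    reference (gold-standard) signal; the paper may scale differently.
    """
    n = len(reference)
    if len(estimate) != n or n < 2:
        raise ValueError("signals must have equal length of at least 2")
    # Mean absolute error between the two baseline-subtracted signals.
    mae = sum(abs(e - r) for e, r in zip(estimate, reference)) / n
    # Scale: average absolute change between neighbouring reference points.
    scale = sum(abs(reference[i] - reference[i - 1]) for i in range(1, n)) / (n - 1)
    if scale == 0:
        raise ValueError("reference signal is constant; MASE is undefined")
    return mae / scale
```

Under this definition a value of 0 means the automated result matches the gold standard exactly, and smaller values indicate closer agreement.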

Bagley, Christopher J.

Solomon, Patty J.

Stanford, Tyman E.

Proteome Science

English

arXiv

BACKGROUND Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by two factors: (i) within each spectrum, peaks are on average wider for peptides with higher mass-to-charge ratios (m/z), and (ii) optimising the baseline subtraction input arguments is a time-consuming and error-prone trial-and-error process. To address these complications, we present an automated pipeline that includes (i) a novel ‘continuous’ line segment algorithm that operates efficiently over data whose m/z-axis has been transformed to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.

RESULTS The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated reduced, if not entirely removed, the relationship between peak width and peak location, resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel ‘continuous’ line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.

CONCLUSIONS The advantages of the proposed pipeline include informed and data-specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
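To illustrate why a naive sliding window is slow on spectra with unevenly spaced (transformed) m/z values, the sketch below compares a naive windowed minimum with a standard monotonic-deque version. This is not the paper's ‘continuous’ line segment algorithm, which the abstract does not describe in detail; it is a well-known alternative swapped in for illustration. The function names, the half-width parameter `h`, and the choice of a windowed minimum as the baseline building block are all assumptions.

```python
from collections import deque


def naive_window_min(x, y, h):
    """For each x[i], minimum of y over points with x[i]-h <= x[j] <= x[i]+h.

    Rescans every point for every window, so the cost grows quadratically.
    """
    return [
        min(yj for xj, yj in zip(x, y) if xi - h <= xj <= xi + h)
        for xi in x
    ]


def deque_window_min(x, y, h):
    """Same result as naive_window_min in linear time; x must be sorted ascending."""
    n = len(x)
    out = []
    candidates = deque()  # indices in the window, y values increasing front to back
    right = 0             # first index not yet admitted to the window
    for i in range(n):
        # Admit every point up to the right edge of the window, x[i] + h.
        while right < n and x[right] <= x[i] + h:
            # Points that are no smaller than the newcomer can never be the minimum again.
            while candidates and y[candidates[-1]] >= y[right]:
                candidates.pop()
            candidates.append(right)
            right += 1
        # Discard points that have fallen behind the left edge, x[i] - h.
        while candidates and x[candidates[0]] < x[i] - h:
            candidates.popleft()
        out.append(y[candidates[0]])
    return out
```

Each index enters and leaves the deque at most once, so the running time no longer depends on how many points fall inside a window. That dependence is what makes the naive version expensive when windows are wide.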


Adelaide Research & Scholarship

Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm

Tyman E. Stanford

Christopher J. Bagley

Patty J. Solomon

Springer - Publisher Connector
