Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset
Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often contain a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping with both complexities (experimental and technological) simultaneously makes data analysis a challenge for bioinformatics.
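The peptide-level redundancy described above (one protein surfacing as several strongly correlated peaks) can be illustrated with a generic correlation filter. This is a minimal sketch, not the authors' method: the function name `reduce_redundancy`, the 0.9 correlation threshold and the rule of keeping the highest-variance feature of each correlated group are illustrative assumptions.

```python
import numpy as np

def reduce_redundancy(X, threshold=0.9):
    """Greedily keep features whose absolute Pearson correlation with every
    already-kept feature stays below `threshold` (hypothetical helper).

    X: array of shape (n_samples, n_features), e.g. peak intensities.
    Returns the sorted indices of the retained representative features.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Visit features by decreasing variance so that the most variable member
    # of each correlated group is the one that gets kept.
    order = np.argsort(-X.var(axis=0))
    kept = []
    for j in order:
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [int(j) for j in sorted(kept)]

# Synthetic data: two "proteins", each observed as three noisy, scaled "peptides".
rng = np.random.default_rng(0)
protein_a = rng.normal(size=50)
protein_b = rng.normal(size=50)
X = np.column_stack(
    [s * protein_a + 0.05 * rng.normal(size=50) for s in (1.0, 2.0, 0.5)]
    + [s * protein_b + 0.05 * rng.normal(size=50) for s in (1.5, 1.0, 3.0)]
)
kept = reduce_redundancy(X, threshold=0.9)
print(kept)  # one representative per protein
```

The greedy pass is order-dependent by design; ranking by variance is only one reasonable tie-breaker, and a clustering step could replace it.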
Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm
Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear
time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein
profiles from biological samples with the aim of discovering biomarkers for
disease. However, the raw protein profiles suffer from several sources of bias
or systematic variation which need to be removed via pre-processing before
meaningful downstream analysis of the data can be undertaken. Baseline
subtraction, an early pre-processing step that removes the non-peptide signal
from the spectra, is complicated by the following: (i) each spectrum has, on
average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and
(ii) the time-consuming and error-prone trial-and-error process for optimising
the baseline subtraction input arguments. With reference to the aforementioned
complications, we present an automated pipeline that includes (i) a novel
'continuous' line segment algorithm that efficiently operates over data with a
transformed m/z-axis to remove the relationship between peptide mass and peak
width, and (ii) an input-free algorithm to estimate peak widths on the
transformed m/z scale. The automated baseline subtraction method was deployed
on six publicly available proteomic MS datasets using six different m/z-axis
transformations. Optimality of the automated baseline subtraction pipeline was
assessed quantitatively using the mean absolute scaled error (MASE) when
compared with a gold-standard baseline-subtracted signal. Near-optimal baseline
subtraction was achieved using the automated pipeline. The advantages of the
proposed pipeline include informed and data-specific input arguments for
baseline subtraction methods, the avoidance of time-intensive and subjective
piecewise baseline subtraction, and the ability to automate baseline
subtraction completely. Moreover, individual steps can be adopted as
stand-alone routines.
Comment: 50 pages, 19 figures
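The general idea (estimate the baseline on a transformed m/z axis where peak width is roughly constant, then score the result with MASE) can be sketched under clearly stated assumptions. This does not reproduce the paper's 'continuous' line segment algorithm or its input-free peak-width estimator. It swaps in a simpler, well-known baseline estimator, a morphological opening (rolling minimum followed by rolling maximum), applied on a uniform grid in sqrt(m/z); the square root is only one plausible transformation, since the abstract does not name its six. The window `half_width` is a fixed, hypothetical parameter standing in for the estimated peak width, and `mase` uses the standard Hyndman-Koehler scaling by the naive lag-1 error, which may differ from the paper's exact setup.

```python
import numpy as np

def _rolling(y, half_width, reducer):
    """Apply `reducer` over a centred window of up to 2*half_width+1 points."""
    n = len(y)
    return np.array([reducer(y[max(0, i - half_width):min(n, i + half_width + 1)])
                     for i in range(n)])

def opening_baseline(y, half_width):
    """Morphological opening with a flat window: erosion (rolling min), then
    dilation (rolling max). The result never exceeds y and removes features
    narrower than the window."""
    eroded = _rolling(y, half_width, np.min)
    return _rolling(eroded, half_width, np.max)

def subtract_baseline(mz, intensity, half_width=50, n_grid=4096):
    """Estimate and subtract a baseline on an assumed sqrt(m/z) axis.
    `mz` must be sorted ascending; `half_width` is in grid points."""
    t = np.sqrt(mz)                          # assumed transform
    grid = np.linspace(t[0], t[-1], n_grid)  # uniform grid on transformed axis
    y_grid = np.interp(grid, t, intensity)
    base_grid = opening_baseline(y_grid, half_width)
    baseline = np.interp(t, grid, base_grid)  # map back to original m/z points
    return intensity - baseline, baseline

def mase(actual, predicted):
    """Mean absolute scaled error: MAE of `predicted` divided by the in-sample
    MAE of the naive lag-1 predictor of `actual`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    scale = np.mean(np.abs(np.diff(actual)))
    return np.mean(np.abs(actual - predicted)) / scale

# Synthetic spectrum: peak width grows with m/z (sigma proportional to sqrt(m/z)),
# so every peak has roughly the same width on the sqrt(m/z) axis.
mz = np.linspace(1000, 20000, 20000)
true_baseline = 5 + 50 * np.exp(-mz / 5000)
centres = [2000, 5000, 9000, 14000]
peaks = sum(100 * np.exp(-0.5 * ((mz - c) / (0.5 * np.sqrt(c))) ** 2) for c in centres)
raw = peaks + true_baseline
corrected, estimated = subtract_baseline(mz, raw)
print(mase(peaks, raw), mase(peaks, corrected))
```

Because an opening never exceeds the signal, the estimated baseline stays beneath every peak. The window only has to be wider than the widest peak, and on a scale where peak width is roughly constant a single window fits the whole spectrum; on the raw m/z axis it would have to be sized for the widest high-m/z peaks.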
Evaluation of peak-picking algorithms for protein mass spectrometry
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template.
Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
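Two of the three compared approaches, signal-to-noise thresholding and correlation with a Gaussian template, can be sketched with NumPy alone, together with a sensitivity score against a reference peak list. This is an illustrative sketch, not the authors' implementation: the MAD-based noise estimate, the thresholds (`snr_min`, `r_min`), the matching tolerance `tol` and the synthetic spectrum are all assumptions.

```python
import numpy as np

def local_maxima(y):
    """Interior indices i with y[i-1] < y[i] >= y[i+1]."""
    return np.where((y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]))[0] + 1

def snr_peaks(y, snr_min=5.0):
    """Local maxima whose height above the median exceeds snr_min times a
    robust noise estimate (MAD scaled to a Gaussian standard deviation)."""
    med = np.median(y)
    noise = 1.4826 * np.median(np.abs(y - med))
    idx = local_maxima(y)
    return idx[(y[idx] - med) > snr_min * noise]

def template_peaks(y, sigma, r_min=0.9):
    """Local maxima of the sliding Pearson correlation between y and a
    Gaussian template of width sigma (in points) that exceed r_min."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    tmpl = np.exp(-0.5 * (x / sigma) ** 2)
    tmpl = (tmpl - tmpl.mean()) / tmpl.std()
    r = np.zeros(len(y))
    for i in range(half, len(y) - half):
        w = y[i - half:i + half + 1]
        if w.std() > 0:
            r[i] = np.mean((w - w.mean()) / w.std() * tmpl)
    idx = local_maxima(r)
    return idx[r[idx] > r_min]

def sensitivity(found, reference, tol):
    """Fraction of reference peaks matched by a detection within tol points."""
    found = np.asarray(found)
    hits = sum(bool(np.any(np.abs(found - ref) <= tol)) for ref in reference)
    return hits / len(reference)

# Synthetic spectrum: three Gaussian peaks (sigma = 8 points) on unit-variance noise.
rng = np.random.default_rng(42)
n, reference = 2000, [300, 800, 1500]
pos = np.arange(n)
signal = sum(30 * np.exp(-0.5 * ((pos - c) / 8.0) ** 2) for c in reference)
noise = rng.normal(size=n)
y = signal + noise
print(sensitivity(snr_peaks(y), reference, tol=5))
print(sensitivity(template_peaks(y, sigma=8.0), reference, tol=5))
```

The continuous wavelet transform approach is omitted here; SciPy provides one as `scipy.signal.find_peaks_cwt`. On unsmoothed data `snr_peaks` can report several local maxima on one peak's apex and shoulders, which leaves sensitivity unchanged but would have to be merged before estimating specificity.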
Current challenges in software solutions for mass spectrometry-based quantitative proteomics
This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union Seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H.); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.).
Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data
Background: High-throughput proteomics techniques, such as mass spectrometry
(MS)-based approaches, produce very high-dimensional data-sets. In a clinical
setting one is often interested in how mass spectra differ between patients of
different classes, for example spectra from healthy patients vs. spectra from
patients having a particular disease. Machine learning algorithms are needed to
(a) identify these discriminating features and (b) classify unknown spectra
based on this feature set. Since the acquired data is usually noisy, the
algorithms should be robust against noise and outliers, while the identified
feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based
on the theory of compressed sensing that allows us to identify a minimal
discriminating set of features from mass spectrometry data-sets. We show (1)
how our method performs on artificial and real-world data-sets, (2) that its
performance is competitive with standard (and widely used) algorithms for
analyzing proteomics data, and (3) that it is robust against random and
systematic noise. We further demonstrate the applicability of our algorithm to
two previously published clinical data-sets.
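SPA itself is derived from compressed-sensing theory, and the abstract does not state its optimisation problem. As a much simpler stand-in that shares the goal of a small discriminating feature set, the sketch below correlates each standardised feature with ±1 class labels, keeps only the k strongest (hard thresholding) and classifies by the sign of the resulting linear score. All names (`sparse_direction`, `predict`, `k`) and the simulated data are hypothetical; this is not the SPA algorithm.

```python
import numpy as np

def sparse_direction(X, y, k):
    """Hypothetical sparse linear estimator (not SPA): correlate standardised
    features with labels y in {-1, +1}, keep the k largest weights in absolute
    value, zero the rest and normalise to unit length."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                     # guard against constant features
    Z = (X - mu) / sd
    w = Z.T @ y / len(y)                  # per-feature correlation with labels
    support = np.argsort(-np.abs(w))[:k]  # indices of the k strongest features
    w_sparse = np.zeros_like(w)
    w_sparse[support] = w[support]
    return w_sparse / np.linalg.norm(w_sparse), mu, sd

def predict(X, w, mu, sd):
    """Classify spectra by the sign of the sparse linear score."""
    return np.where(((X - mu) / sd) @ w >= 0, 1, -1)

# Simulated "spectra": 1000 features, of which only 10, 20 and 30 differ by class.
rng = np.random.default_rng(1)
n, p, informative = 200, 1000, [10, 20, 30]

def simulate(n):
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, p))
    X[:, informative] += 1.5 * y[:, None]  # class-dependent shift
    return X, y

X_train, y_train = simulate(n)
X_test, y_test = simulate(n)
w, mu, sd = sparse_direction(X_train, y_train, k=3)
selected = np.flatnonzero(w)
accuracy = np.mean(predict(X_test, w, mu, sd) == y_test)
print(selected, accuracy)
```

Fixing k makes the size of the feature set explicit, which mirrors the stated aim of a minimal discriminating set, but unlike a method grounded in compressed sensing it offers no robustness guarantees against the noise and outliers mentioned above.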
- …