24,647 research outputs found

    Identification of biomarkers from mass spectrometry data using a "common" peak approach

    BACKGROUND: Proteomic data obtained from mass spectrometry have attracted great interest for the detection of early-stage cancer. However, as mass spectrometry data are high-dimensional, identification of biomarkers is a key problem. RESULTS: This paper proposes the use of "common" peaks in data as biomarkers. Analysis is conducted as follows: data preprocessing, identification of biomarkers, and application of AdaBoost to construct a classification function. Informative "common" peaks are selected by AdaBoost. AsymBoost is also examined to balance false negatives and false positives. The effectiveness of the approach is demonstrated using an ovarian cancer dataset. CONCLUSION: Continuous covariates and discrete covariates can be used in the present approach. The difference between the result for the continuous covariates and that for the discrete covariates was investigated in detail. In the example considered here, both covariates provide a good prediction, but it seems that they provide different kinds of information. We can obtain more information on the structure of the data by integrating both results
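    The boosting step described in this abstract can be illustrated with a minimal sketch of generic discrete AdaBoost over one-feature decision stumps. It assumes each sample is encoded as a list of 0/1 indicators for the presence of a "common" peak; that encoding, the function names, and the toy data in the usage note are hypothetical, and this is neither the authors' exact procedure nor AsymBoost.

```python
import math

def train_adaboost(X, y, rounds=5):
    """Discrete AdaBoost with one-feature decision stumps.

    X: list of samples, each a list of 0/1 peak indicators (assumed encoding).
    y: list of labels in {-1, +1}.
    Returns a list of (feature_index, polarity, alpha) weak learners.
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                # The stump predicts +polarity when peak j is present, -polarity otherwise.
                preds = [polarity if row[j] == 1 else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        # Clip the weighted error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, polarity, alpha))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if row[j] == 1 else -pol) for j, pol, alpha in model)
    return 1 if score >= 0 else -1
```

    For example, with the toy data `X = [[1, 0], [1, 1], [0, 1], [0, 0]]` and `y = [1, 1, -1, -1]`, the first stump selected uses peak 0, which separates the two classes. The peaks chosen across rounds play the role of the selected informative peaks.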

    Comparison of Supervised Classification Methods for Protein Profiling in Cancer Diagnosis

    A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the disease. Recent advances in mass spectrometry and proteomic instrumentation offer a unique chance to rapidly identify these markers. These advances pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of patterns of markers from high-dimensional data, specific to each pathologic state (e.g. normal vs cancer). We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data using surface enhanced laser desorption/ionization (SELDI) technology. The first two steps are the selection of the most discriminating biomarkers with a construction of different classifiers. Finally, we compare and validate their performance and robustness using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperforms the other methods tested

    Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

    BACKGROUND: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached. RESULTS: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased. CONCLUSIONS: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers
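    The idea of stratified FDR control mentioned above can be sketched as running the Benjamini-Hochberg step-up procedure separately inside each stratum of peaks instead of over the pooled set. The abstract does not say how strata are defined, so the stratum labels, function names, and p-values here are hypothetical illustrations, not the paper's data or implementation.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

def stratified_fdr(pvalues, strata, q=0.05):
    """Apply BH within each stratum and return all rejected indices."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    rejected = []
    for members in groups.values():
        local = benjamini_hochberg([pvalues[i] for i in members], q)
        # Map positions within the stratum back to the original indices.
        rejected.extend(members[k] for k in local)
    return sorted(rejected)

# Hypothetical p-values: three peaks in stratum "a", five in stratum "b".
pvalues = [0.001, 0.02, 0.03, 0.8, 0.9, 0.7, 0.6, 0.5]
strata = ["a", "a", "a", "b", "b", "b", "b", "b"]
pooled = benjamini_hochberg(pvalues)
stratified = stratified_fdr(pvalues, strata)
```

    In this toy case the pooled procedure rejects only the first hypothesis, while the stratified one rejects all three in stratum "a", because the small p-values are no longer compared against thresholds diluted by the many null-like tests in stratum "b".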

    Peaks detection and alignment for mass spectrometry data

    The goal of this paper is to review existing methods for protein mass spectrometry data analysis, and to present a new methodology for automatic extraction of significant peaks (biomarkers). For the pre-processing step required for data from MALDI-TOF or SELDI-TOF spectra, we use a purely nonparametric approach that combines stationary invariant wavelet transform for noise removal and penalized spline quantile regression for baseline correction. We further present a multi-scale spectra alignment technique that is based on identification of statistically significant peaks from a set of spectra. This method allows one to find common peaks in a set of spectra that can subsequently be mapped to individual proteins. These peaks may serve as useful biomarkers in medical applications, or as individual features for further multidimensional statistical analysis. MALDI-TOF spectra obtained from serum samples are used throughout the paper to illustrate the methodology

    Programmed cell death 6 interacting protein (PDCD6IP) and Rabenosyn-5 (ZFYVE20) are potential urinary biomarkers for upper gastrointestinal cancer

    PURPOSE: Cancer of the upper digestive tract (uGI) is a major contributor to cancer-related death worldwide. Due to a rise in occurrence, together with poor survival rates and a lack of diagnostic or prognostic clinical assays, there is a clear need to establish molecular biomarkers. EXPERIMENTAL DESIGN: Initial assessment was performed on urine samples from 60 control and 60 uGI cancer patients using MS to establish a peak pattern or fingerprint model, which was validated by a further set of 59 samples. RESULTS: We detected 86 cluster peaks by MS above frequency and detection thresholds. Statistical testing and model building resulted in a peak profiling model of five relevant peaks with 88% overall sensitivity and 91% specificity, and overall correctness of 90%. High-resolution MS of 40 samples in the 2-10 kDa range resulted in 646 identified proteins, and pattern matching identified four of the five model peaks within significant parameters, namely programmed cell death 6 interacting protein (PDCD6IP/Alix/AIP1), Rabenosyn-5 (ZFYVE20), protein S100A8, and protein S100A9, of which the first two were validated by Western blotting. CONCLUSIONS AND CLINICAL RELEVANCE: We demonstrate that MS analysis of human urine can identify lead biomarker candidates in uGI cancers, which makes this technique potentially useful in defining and consolidating biomarker patterns for uGI cancer screening
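    The sensitivity, specificity, and overall correctness figures reported above follow the standard confusion-matrix definitions, sketched below. The abstract gives percentages only, so the counts in the example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of cancer samples correctly flagged
    specificity = tn / (tn + fp)   # fraction of control samples correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall correctness
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
```

    With equal group sizes, overall correctness is the mean of sensitivity and specificity; with unequal groups it is pulled towards the metric of the larger group.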

    J Eukaryot Microbiol

    Emerging methods based on mass spectrometry (MS) can be used in the rapid identification of microorganisms. Thus far, these practical and rapidly evolving methods have mainly been applied to characterize prokaryotes. We applied matrix-assisted laser-desorption-ionization-time-of-flight mass spectrometry (MALDI-TOF MS) in the analysis of whole cells of 18 N. fowleri isolates belonging to three genotypes. Fourteen originated from the cerebrospinal fluid or brain tissue of primary amoebic meningoencephalitis patients and four originated from water samples of hot springs, rivers, lakes or municipal water supplies. Whole Naegleria trophozoites grown in axenic cultures were washed and mixed with MALDI matrix. Mass spectra were acquired with a 4700 TOF-TOF instrument. MALDI-TOF MS yielded consistent patterns for all isolates examined. Using a combination of novel data processing methods for visual peak comparison, statistical analysis and proteomics database searching, we were able to detect several biomarkers that can differentiate all species and isolates studied, along with common biomarkers for all N. fowleri isolates. Naegleria fowleri could be easily separated from other species within the genus Naegleria. A number of peaks detected were tentatively identified. MALDI-TOF MS fingerprinting is a rapid, reproducible, high-throughput alternative method for identifying Naegleria isolates. This method has potential for studying eukaryotic agents.

    Optimization and evaluation of surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry for protein profiling of cerebrospinal fluid

    Cerebrospinal fluid (CSF) potentially carries an archive of peptides and small proteins relevant to pathological processes in the central nervous system (CNS) and surrounding brain tissue. Proteomics is especially well suited for the discovery of biomarkers of diagnostic potential in CSF for early diagnosis and discrimination of several neurodegenerative diseases. ProteinChip surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) is one such approach, which offers a unique platform for high-throughput profiling of peptides and small proteins in CSF. In this study, we evaluated methodologies for the retention of CSF proteins < 20 kDa in size, and identified a strategy for screening small proteins and peptides in CSF. ProteinChip array types, along with sample and binding buffer conditions, and matrices were investigated. By coupling the processing of arrays to a liquid handler, reproducible and reliable profiles, with mean peak coefficients of variation < 20%, were achieved for intra- and inter-assays under selected conditions. Based on peak m/z we found a high degree of overlap between the tested array surfaces. The combination of CM10 and IMAC30 arrays was sufficient to represent 80% to 90% of all assigned peaks when using either sinapinic acid (SPA) or α-cyano-4-hydroxycinnamic acid as the energy absorbing matrix. Moreover, arrays processed with SPA consistently showed better peak resolution and higher peak number across all surfaces within the measured mass range. We intend to use CM10 and IMAC30 arrays prepared in sinapinic acid as a fast and cost-effective approach to drive decisions on sample selection prior to more in-depth discovery of diagnostic biomarkers in CSF using alternative but complementary proteomic strategies
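    The reproducibility criterion quoted above (mean peak coefficient of variation below 20%) rests on a simple per-peak calculation, sketched here. The replicate intensities are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
import statistics

def coefficient_of_variation(intensities):
    """Percent CV of one peak's intensity across replicate spectra."""
    mean = statistics.mean(intensities)
    # Sample standard deviation (n - 1 denominator) is assumed here.
    return 100.0 * statistics.stdev(intensities) / mean

# Hypothetical intensities of a single peak over five replicate spots.
replicates = [10.0, 12.0, 11.0, 9.0, 13.0]
cv = coefficient_of_variation(replicates)
```

    Averaging this value over all assigned peaks gives a mean peak CV that can be compared against the 20% threshold.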

    Metabolic profiling of human plasma and urine in chronic kidney disease by hydrophilic interaction liquid chromatography coupled with time-of-flight mass spectrometry : a pilot study

    A typical characteristic of chronic kidney disease (CKD) is the progressive loss of renal function over a period of months or years with the concomitant accumulation of uremic retention solutes in the body. Known biomarkers of kidney deterioration, such as serum creatinine or urinary albumin, do not allow effective early detection of CKD, which is essential for disease management. In this work, a hydrophilic interaction liquid chromatography time-of-flight mass spectrometric (HILIC-TOF MS) platform was optimized, allowing the search for novel uremic retention solutes and/or biomarkers of CKD. The HILIC-ESI-MS approach was used for the comparison of urine and plasma samples from CKD patients at stage 3 (n = 20), at stage 5 not yet receiving dialysis (n = 20) and from healthy controls (n = 20). Quality control samples were used to control and ensure the validity of the metabolomics approach. Subsequently the data were treated with the XCMS software for multivariate statistical analysis. In this way, differentiation could be achieved between the measured metabolite profiles of the CKD patients and those of the healthy controls. The approach allowed the elucidation of a number of metabolites that showed significant up- or downregulation throughout the different stages of CKD. These compounds are cinnamoylglycine, glycoursodeoxycholic acid, 2-hydroxyethane sulfonate, and pregnenolone sulfate, whose identities were unambiguously confirmed using authentic standards. The latter three are newly identified uremic retention solutes