
    Mass spectrometry data mining for cancer detection

    Early detection of cancer is crucial for successful intervention strategies. Mass spectrometry-based high-throughput proteomics is recognized as a major breakthrough in cancer detection. Many machine learning methods have been used to construct classifiers based on mass spectrometry data for discriminating between cancer stages, yet the classifiers so constructed generally lack biological interpretability. To better assist clinical uses, a key step is to discover "biomarker signature profiles", i.e. combinations of a small number of protein biomarkers strongly discriminating between cancer states. This dissertation introduces two innovative algorithms to automatically search for a signature and to construct a high-performance signature-based classifier for cancer discrimination tasks based on mass spectrometry data, such as data acquired by MALDI or SELDI techniques. Our first algorithm assumes that homogeneous groups of mass spectra can be modeled by (unknown) Gibbs distributions, and uses robust log-likelihood analysis to generate an optimal signature and an associated signature-based classifier; our second algorithm uses stochastic optimization to search for two lists of biomarkers, and then constructs a signature-based classifier. To support these two algorithms theoretically, this dissertation also studies the empirical probability distributions of mass spectrometry data and implements the actual fitting of Markov random fields to these high-dimensional distributions. We have validated our two signature discovery algorithms on several mass spectrometry datasets related to ovarian cancer and colorectal cancer patient groups. For these cancer discrimination tasks, our algorithms have yielded better classification performance than existing machine learning algorithms and, in addition, have generated more interpretable explicit signatures.

    Ovarian Cancer Classification based on Mass Spectrometry Analysis of Sera

    In our previous study [1], we compared the performance of a number of widely used discrimination methods for classifying ovarian cancer using Matrix Assisted Laser Desorption Ionization (MALDI) mass spectrometry data on serum samples acquired in Reflectron mode. Our results demonstrated good performance with a random forest classifier. In this follow-up study, to improve the molecular classification power of the MALDI platform for ovarian cancer, we expanded the mass range of the MS data by adding data acquired in Linear mode and evaluated the resultant decrease in classification error. A general statistical framework is proposed to obtain unbiased classification error estimates and to analyze the effects of sample size and number of selected m/z features on classification errors. We also emphasize the importance of combining biological knowledge and statistical analysis to obtain both biologically and statistically sound results.

    Computational protein biomarker prediction: a case study for prostate cancer

    BACKGROUND: Recent technological advances in mass spectrometry pose challenges in computational mathematics and statistics to process the mass spectral data into predictive models with clinical and biological significance. We discuss several classification-based approaches to finding protein biomarker candidates using protein profiles obtained via mass spectrometry, and we assess their statistical significance. Our overall goal is to implicate peaks that have a high likelihood of being biologically linked to a given disease state, and thus to narrow the search for biomarker candidates. RESULTS: Thorough cross-validation studies and randomization tests are performed on a prostate cancer dataset with over 300 patients, obtained at the Eastern Virginia Medical School using SELDI-TOF mass spectrometry. We obtain average classification accuracies of 87% on a four-group classification problem using a two-stage linear SVM-based procedure and just 13 peaks, with other methods performing comparably. CONCLUSIONS: Modern feature selection and classification methods are powerful techniques for both the identification of biomarker candidates and the related problem of building predictive models from protein mass spectrometric profiles. Cross-validation and randomization are essential tools that must be applied carefully in order not to bias the results unfairly. However, only a biological validation and identification of the underlying proteins will ultimately confirm the actual value and power of any computational predictions.
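
The randomization test mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it swaps the two-stage linear SVM for a simple nearest-centroid classifier scored by leave-one-out cross-validation, and the toy peak intensities, the function names and the `n_permutations` and `seed` parameters are all assumptions made for the example.

```python
import random

def nearest_centroid_loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(row))
            for k, value in enumerate(row):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        # Assign the held-out sample to the closest centroid.
        best_label, best_dist = None, float("inf")
        for label, total in sums.items():
            centroid = [s / counts[label] for s in total]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Compare the observed accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_loo_accuracy(X, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if nearest_centroid_loo_accuracy(X, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical two-peak intensities for 6 controls and 6 cancer samples.
spectra = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3], [1.2, 0.2], [1.0, 0.0], [0.8, 0.1],
           [3.0, 2.1], [3.2, 1.9], [2.9, 2.2], [3.1, 2.0], [3.3, 2.1], [2.8, 1.8]]
labels = ["control"] * 6 + ["cancer"] * 6

observed, p_value = permutation_p_value(spectra, labels)
print(f"LOO accuracy = {observed:.2f}, permutation p-value = {p_value:.3f}")
```

The whole cross-validation is repeated for every shuffled labelling, so any optimism in the accuracy estimate appears in the null distribution as well as in the observed value.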

    Identifying Biomarkers from Mass Spectrometry Data with Ordinal Outcome

    In recent years, there has been an increased interest in using protein mass spectroscopy to identify molecular markers that discriminate diseased from healthy individuals. Existing methods are tailored towards classifying observations into nominal categories. Sometimes, however, the outcome of interest may be measured on an ordered scale. Ignoring this natural ordering results in some loss of information. In this paper, we propose a Bayesian model for the analysis of mass spectrometry data with ordered outcome. The method provides a unified approach for identifying relevant markers and predicting class membership. This is accomplished by building a stochastic search variable selection method within an ordinal outcome model. We apply the methodology to mass spectrometry data on ovarian cancer cases and healthy individuals. We also utilize wavelet-based techniques to remove noise from the mass spectra prior to analysis. We identify protein markers associated with being healthy, having low grade ovarian cancer, or being a high grade case. For comparison, we repeated the analysis using conventional classification procedures and found improved predictive accuracy with our method.

    Classification of cancer cell lines using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and statistical analysis

    Over the past decade, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been established as a valuable platform for microbial identification, and it is also frequently applied in biology and clinical studies to identify new markers expressed in pathological conditions. The aim of the present study was to assess the potential of using this approach for the classification of cancer cell lines as a quantifiable method for the proteomic profiling of cellular organelles. Intact protein extracts isolated from different tumor cell lines (human and murine) were analyzed using MALDI-TOF MS and the obtained mass lists were processed using principal component analysis (PCA) within Bruker Biotyper software. Furthermore, reference spectra were created for each cell line and were used for classification. Based on the intact protein profiles, we were able to differentiate and classify six cancer cell lines: two murine melanoma (B16-F0 and B164A5), one human melanoma (A375), two human breast carcinoma (MCF7 and MDA-MB-231) and one human liver carcinoma (HepG2). The cell lines were classified according to cancer type and the species they originated from, as well as by their metastatic potential, offering the possibility to differentiate non-invasive from invasive cells. The obtained results pave the way for developing a broad-based strategy for the identification and classification of cancer cell

    Seminal plasma as a source of prostate cancer peptide biomarker candidates for detection of indolent and advanced disease

    Background: Extensive prostate specific antigen screening for prostate cancer generates a high number of unnecessary biopsies and over-treatment due to insufficient differentiation between indolent and aggressive tumours. We hypothesized that seminal plasma is a robust source of novel prostate cancer (PCa) biomarkers with the potential to improve primary diagnosis and to distinguish advanced from indolent disease. Methodology/Principal Findings: In an open-label case/control study, 125 patients (70 PCa, 21 benign prostate hyperplasia, 25 chronic prostatitis, 9 healthy controls) were enrolled in 3 centres. Biomarker panels a) for PCa diagnosis (comparison of PCa patients versus benign controls) and b) for advanced disease (comparison of patients with post-surgery Gleason score <7 versus Gleason score >7) were sought. Independent cohorts were used for proteomic biomarker discovery and testing the performance of the identified biomarker profiles. Seminal plasma was profiled using capillary electrophoresis mass spectrometry. Pre-analytical stability and analytical precision of the proteome analysis were determined. Support vector machine learning was used for classification. Stepwise application of two biomarker signatures with 21 and 5 biomarkers provided 83% sensitivity and 67% specificity for PCa detection in a test set of samples. A panel of 11 biomarkers for advanced disease discriminated between patients with Gleason score 7 and organ-confined (<pT3a) or advanced (≥pT3a) disease with 80% sensitivity and 82% specificity in a preliminary validation setting. Seminal profiles showed excellent pre-analytical stability. Eight biomarkers were identified as fragments of N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase, prostatic acid phosphatase, stabilin-2, GTPase IMAP family member 6, semenogelin-1 and -2. Restricted sample size was the major limitation of the study. Conclusions/Significance: Seminal plasma represents a robust source of potential peptide markers for primary PCa diagnosis. Our findings warrant further prospective validation to confirm the diagnostic potential of identified seminal biomarker candidates.
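
For readers unfamiliar with the two performance figures quoted in this abstract, the sketch below shows how sensitivity and specificity are computed from true and predicted labels. The label lists are made up for illustration and are not the study's data; the counts were chosen only so that the ratios land near the reported 83% and 67%. The function name and the `positive` parameter are likewise assumptions.

```python
def sensitivity_specificity(y_true, y_pred, positive="PCa"):
    """Return (sensitivity, specificity) for a binary decision."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # cancer case correctly flagged
            else:
                fn += 1  # cancer case missed
        elif pred == positive:
            fp += 1      # control wrongly flagged
        else:
            tn += 1      # control correctly cleared
    return tp / (tp + fn), tn / (tn + fp)

# Made-up test set: 6 cancer cases and 6 benign controls (NOT the study's data).
y_true = ["PCa"] * 6 + ["benign"] * 6
y_pred = ["PCa"] * 5 + ["benign"] + ["benign"] * 4 + ["PCa"] * 2

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Sensitivity is the fraction of true cases that are detected, and specificity is the fraction of controls that are correctly cleared. The two are reported together because either one can be pushed up at the expense of the other.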

    Developing a discrimination rule between breast cancer patients and controls using proteomics mass spectrometric data: A three-step approach

    To discriminate between breast cancer patients and controls, we used a three-step approach to obtain our decision rule. First, we ranked the mass/charge values using random forests, because they generate importance indices that take possible interactions into account. We observed that the top ranked variables consisted of highly correlated contiguous mass/charge values, which were grouped in the second step into new variables. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three different methods, namely Classification and Regression Tree (CART), logistic regression and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well and both had a higher classification accuracy than CART. The model obtained with penalized logistic regression was chosen as we hypothesized that this model would provide a better classification accuracy in the validation set. The solution had a good performance on the training set with a classification accuracy of 86.3%, and a sensitivity and specificity of 86.8% and 85.7%, respectively.
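
The second step of this approach, grouping contiguous top-ranked mass/charge values into new variables, might look something like the sketch below. The abstract does not say how the grouped values were combined, so taking the mean intensity over each run is an assumption, as are the function names, the `max_gap` parameter and the toy ranking (which stands in for the random forest importance ranking of step one).

```python
def group_contiguous(indices, max_gap=1):
    """Group bin indices into runs whose neighbours differ by at most max_gap."""
    groups = []
    for idx in sorted(set(indices)):
        if groups and idx - groups[-1][-1] <= max_gap:
            groups[-1].append(idx)  # extends the current run
        else:
            groups.append([idx])    # starts a new run
    return groups

def merge_features(spectrum, groups):
    """Replace each run of m/z bins by its mean intensity (an assumed choice)."""
    return [sum(spectrum[i] for i in g) / len(g) for g in groups]

# Hypothetical indices of the top-ranked m/z bins, in importance order.
top_ranked = [12, 3, 4, 11, 5, 20]
groups = group_contiguous(top_ranked)

# Toy spectrum in which the intensity of bin i is simply i.
spectrum = [float(i) for i in range(21)]
new_variables = merge_features(spectrum, groups)

print(groups)         # [[3, 4, 5], [11, 12], [20]]
print(new_variables)  # [4.0, 11.5, 20.0]
```

The merged variables would then serve as predictors for the final step (CART, logistic regression or penalized logistic regression), which is not reproduced here.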

