
    Integrated data management and validation platform for phosphorylated tandem mass spectrometry data

    MS/MS is a widely used method for proteome-wide analysis of protein expression and post-translational modifications (PTMs). The thousands of MS/MS spectra produced by a single experiment pose a major challenge for downstream analysis. Standard programs such as MASCOT provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false-positive identifications. In phosphoproteomic experiments, typically only a single peptide assignment supports the identification of each phosphorylation site, so minimizing false positives is critical, and tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open-source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones, using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers, with an AUC of 0.97 and a PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives compared with MASCOT while retaining 97% of true positives. It also classified an independent phosphoserine/threonine data set with an AUC of 0.93 and a PPV of 91%, demonstrating the applicability of this method to all types of phospho-MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval. National Science Foundation (U.S.). Graduate Research Fellowship Program.
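
A naïve Bayes classifier of the kind compared above can be sketched in a few lines. This is a minimal sketch, assuming each feature is modelled as an independent normal distribution per class (Gaussian naïve Bayes); the abstract does not say which feature distributions phoMSVal uses, and the class name `GaussianNaiveBayes` and the two toy features are hypothetical, not phoMSVal's code or feature set.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Hypothetical minimal Gaussian naive Bayes: each feature is assumed to be
    an independent normal distribution within each class."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            stats = []
            for col in zip(*rows):  # iterate over feature columns
                mean = sum(col) / len(col)
                # small constant keeps the variance strictly positive
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                stats.append((mean, var))
            self.params[label] = (math.log(len(rows) / len(X)), stats)
        return self

    def _log_posterior(self, row, label):
        # log prior + sum of per-feature Gaussian log-likelihoods
        log_prior, stats = self.params[label]
        total = log_prior
        for x, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # pick the class with the highest (unnormalised) log posterior
        return [max(self.params, key=lambda lbl: self._log_posterior(row, lbl)) for row in X]


# Hypothetical two-feature toy data: label 1 = correct assignment, 0 = incorrect.
X = [[40, 0.8], [45, 0.7], [50, 0.9], [10, 0.2], [15, 0.3], [12, 0.1]]
y = [1, 1, 1, 0, 0, 0]
clf = GaussianNaiveBayes().fit(X, y)
print(clf.predict([[42, 0.75], [11, 0.25]]))  # [1, 0]
```

Working in log space avoids underflow when many per-feature likelihoods are multiplied, which matters once all 17 features are used.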

    Increasing peptide identifications and decreasing search times for ETD spectra by pre-processing and calculation of parent precursor charge

    Background: Electron transfer dissociation (ETD) can dissociate multiply charged precursor polypeptides, providing extensive peptide backbone cleavage. ETD spectra contain charge-reduced precursor peaks, usually of high intensity, whose pattern depends on the charge of the parent precursor. These charge-reduced precursor peaks and their associated neutral-loss peaks should be removed before the spectra are searched for peptide identifications. ETD spectra can also contain ion types other than c and z•, and modifying search strategies to accommodate these ion types may increase peptide identifications. Additionally, if the precursor mass is measured on a lower-resolution instrument such as a linear ion trap, the precursor charge is often unknown, which reduces sensitivity and increases search times. We implemented algorithms to remove these precursor peaks, to accommodate new ion types in the OMSSA noise-filtering routine, and to estimate any unknown precursor charge using linear discriminant analysis (LDA). Results: Spectral pre-processing to remove precursor peaks and their associated neutral losses prior to protein sequence library searches resulted in a 9.8% increase in peptide identifications at a 1% false discovery rate (FDR) compared with the previous OMSSA filter. Modifying the OMSSA noise filter to accommodate various ion types resulted in a further 4.2% increase in peptide identifications at 1% FDR. Moreover, searching ETD spectra with the charge states obtained from the precursor charge determination algorithm was up to 3.5 times faster than the general range search method, with a minor 3.8% increase in sensitivity. Conclusion: Overall, incorporating the new precursor filter and noise filter and using the charge determination algorithm increased peptide identifications at 1% FDR by 18.8% compared with previous versions of OMSSA.
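
The precursor-peak removal step can be illustrated as follows. This is a minimal sketch, not OMSSA's implementation: it assumes the precursor is [M+zH]^z+, that each electron transfer without dissociation adds one electron mass and lowers the charge by one, and that neutral losses fall within a fixed mass window below each target. The function names and the `tol` and `loss_window` values are hypothetical.

```python
ELECTRON_MASS = 0.000549  # Da


def precursor_targets(precursor_mz, z):
    """(m/z, charge) of the unreacted precursor [M+zH]^z+ and of each
    charge-reduced species after k = 1..z-1 electron transfers."""
    ion_mass = precursor_mz * z  # mass of [M+zH], protons included
    return [((ion_mass + k * ELECTRON_MASS) / (z - k), z - k) for k in range(z)]


def remove_precursor_peaks(peaks, precursor_mz, z, tol=0.5, loss_window=60.0):
    """Drop peaks within `tol` m/z of any precursor target, or up to
    `loss_window` Da (loss_window / charge in m/z) below it, where
    neutral-loss peaks would fall. tol and loss_window are illustrative."""
    targets = precursor_targets(precursor_mz, z)
    kept = []
    for mz, intensity in peaks:
        in_precursor_region = any(
            target - loss_window / charge - tol <= mz <= target + tol
            for target, charge in targets
        )
        if not in_precursor_region:
            kept.append((mz, intensity))
    return kept


# Hypothetical 3+ precursor at m/z 500.0 with charge-reduced peaks near 750 and 1500.
peaks = [(300.0, 10), (500.0, 1000), (749.9, 800), (1482.0, 50), (1000.0, 20)]
print(remove_precursor_peaks(peaks, 500.0, 3))  # [(300.0, 10), (1000.0, 20)]
```

Expressing the neutral-loss window in mass and dividing by the charge keeps the same physical loss range for every charge-reduced species.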

    QuasiNovo: Algorithms for De Novo Peptide Sequencing

    High-throughput proteomics analysis involves the rapid identification and characterization of large sets of proteins in complex biological samples. Tandem mass spectrometry (MS/MS) has become the leading approach for the experimental identification of proteins. Accurate analysis of the resulting data is computationally challenging and relies on a detailed understanding of molecular dynamics, signal processing, and pattern classification. In this work we address these modeling and classification problems and introduce an additional, data-driven evolutionary information source into the analysis pipeline. The specific problem addressed is peptide sequencing via MS/MS: deciphering the amino acid sequences of digested proteins (peptides) from the MS/MS spectra produced in a typical experimental protocol. Our approach sequences peptides de novo, using only the information contained in the experimental spectrum together with distributions of amino acid usage learned from large sets of protein sequence data. This dissertation pursues three main objectives: a neural-network ion classifier that selects informative ions from the spectrum; a peptide sequencer that uses dynamic programming and a scoring function to generate candidate peptide sequences; and a candidate peptide scoring function. Candidate peptide sequences are generated by a dynamic programming graph algorithm and then scored using a combination of the neural network score, the amino acid usage score, and an edge frequency score. In addition to a complete de novo peptide sequencer, we also examine the use of amino acid usage models on their own for reranking candidate peptides.
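
The dynamic programming step can be illustrated with a minimal spectrum-graph sketch. It assumes the spectrum has already been converted to a list of prefix residue masses (including 0.0), joins two masses when their difference matches one amino acid residue mass, and returns the sequence spelled by the path with the most edges. This is a generic technique, not QuasiNovo's implementation: QuasiNovo scores candidates with a neural network, amino acid usage and edge frequency, whereas this sketch only counts edges. `RESIDUE_MASS`, `match_residue` and `longest_path_sequence` are hypothetical names.

```python
# Monoisotopic residue masses (Da); L and I are isobaric, so only L is listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta, tol):
    """Return the residue whose mass is closest to delta within tol, else None."""
    best, best_err = None, tol
    for aa, mass in RESIDUE_MASS.items():
        err = abs(delta - mass)
        if err <= best_err:
            best, best_err = aa, err
    return best


def longest_path_sequence(prefix_masses, tol=0.02):
    """DP over a spectrum graph whose nodes are prefix masses (0.0 must be
    present). best[i] holds (edge count, sequence) of the longest path from
    mass 0.0 to node i; the overall longest path's sequence is returned."""
    nodes = sorted(set(prefix_masses))
    start = nodes.index(0.0)
    best = [None] * len(nodes)
    best[start] = (0, "")
    for i in range(start + 1, len(nodes)):
        for j in range(start, i):
            if best[j] is None:  # node j is unreachable from 0.0
                continue
            aa = match_residue(nodes[i] - nodes[j], tol)
            if aa is None:
                continue
            candidate = (best[j][0] + 1, best[j][1] + aa)
            if best[i] is None or candidate[0] > best[i][0]:
                best[i] = candidate
    reachable = [b for b in best if b is not None]
    return max(reachable, key=lambda b: b[0])[1]


# Hypothetical prefix masses for G-A-S plus one noise peak at 100.0.
print(longest_path_sequence([0.0, 57.02146, 128.05857, 215.0906, 100.0]))  # GAS
```

Counting edges rewards paths that explain more peaks, which is why the sketch prefers G-A-S over the single Q edge spanning the same mass (the G + A residue mass is within tolerance of Q).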

    Evolutionary descent of prion genes from a ZIP metal ion transport ancestor

    In the more than 20 years since its discovery, both the phylogenetic origin and the cellular function of the prion protein (PrP) have remained enigmatic. Insights into the function of PrP may be obtained by characterizing its molecular neighborhood. Quantitative interactome data revealed the spatial proximity of a subset of metal ion transporters of the ZIP family to mammalian prion proteins. A subsequent bioinformatic analysis revealed a prion-like protein sequence within the N-terminal, extracellular domain of a phylogenetic branch of ZIPs. Additional structural threading and ortholog sequence alignment analyses consolidated the conclusion that the prion protein gene family is phylogenetically derived from a ZIP-like ancestor molecule. Our data explain structural and functional features found within mammalian prion proteins as elements of an ancient involvement in the transmembrane transport of divalent cations. The connection to ZIP proteins is expected to open new avenues for elucidating the biology of the prion protein in health and disease.

    Kinetically Trapped Liquid-State Conformers of a Sodiated Model Peptide Observed in the Gas Phase

    We investigate the peptide AcPheAla₅LysH⁺, a model system for studying helix formation in the gas phase, to fully understand the forces that stabilize its helical structure. In particular, we address whether fixing the positive charge locally at the peptide's C-terminus is a prerequisite for helix formation, by replacing the protonated C-terminal Lys residue with Ala and a sodium cation. Combining gas-phase vibrational spectroscopy of cryogenically cooled ions with molecular simulations based on density-functional theory (DFT) allows detailed structure elucidation. For sodiated AcPheAla₆, we find globular rather than helical structures, because the mobile positive charge interacts strongly with the peptide backbone and disrupts secondary structure formation. Interestingly, the global minimum structure from the simulations is not present in the experiment. We attribute this to high barriers to rearranging the peptide-cation interaction, which ultimately result in kinetically trapped structures being observed in the experiment.
    Comment: 28 pages, 10 figures

    MALDI Mass Spectrometry Imaging for the Discovery of Prostate Carcinoma Biomarkers

    The elucidation of new biological markers of prostate cancer (PCa) should aid in the detection and prognosis of this disease. Pathologists' diagnostic decisions in prostate cancer depend heavily on tissue morphology, and the ability to localize disease-specific molecular changes in tissue would help improve this critical decision-making process. Direct profiling of proteins in tissue sections using MALDI imaging mass spectrometry (MALDI-IMS) can link molecular detail to morphological and pathological changes, enhancing the ability to identify candidates for new specific biomarkers. However, critical questions remain regarding the integration of this technique with clinical decision making. To address these questions and to investigate the potential of MALDI-IMS for diagnosing prostate cancer, we used this approach to analyze prostate tissue, determine the cellular origins of different protein signals, improve cancer detection, and identify specific protein markers of PCa. We found that specific protein/peptide expression changes correlated with the presence or absence of prostate cancer, as well as with the presence of micro-metastatic disease. Additionally, overexpression of a single peptide (m/z 4355) accurately distinguished primary cancer tissue from adjacent normal tissue. Tandem mass spectrometry identified this peptide as a fragment of MEKK2, a member of the MAP kinase signaling pathway. MEKK2 overexpression in moderately differentiated PCa and in prostate cancer cell lines was validated by immunohistochemistry and Western blot analysis. Classification algorithms using specific ions differentially expressed in PCa tissue, together with an ROC cut-off value for the normalized intensity of the MEKK2 fragment at m/z 4355, were used to classify a blinded validation set. Finally, optimizing sample processing with a new fixative that preserves macromolecules improved sample throughput, making MALDI-IMS more compatible with current histological applications and facilitating its implementation in a clinical setting. This study highlights the potential of MALDI-IMS to define the molecular events involved in prostate tumorigenesis and demonstrates the applicability of this approach to clinical diagnostics as an aid to pathological decision making in prostate cancer.
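
One common way to choose an ROC cut-off for a single marker intensity is to maximise Youden's J statistic (sensitivity + specificity - 1). The abstract does not state which criterion was used for the MEKK2 fragment, so this is a minimal sketch of a swapped-in, standard technique; `youden_cutoff` and the toy intensities are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1,
    calling a sample positive when its score >= threshold.
    labels: 1 = cancer, 0 = normal (both classes must be present)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical normalised intensities at a marker m/z for six tissue spots.
print(youden_cutoff([0.2, 0.3, 0.4, 0.6, 0.7, 0.9], [0, 0, 0, 1, 1, 1]))  # (0.6, 1.0)
```

In practice the cut-off would be fixed on training data and then applied unchanged to the blinded validation set.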

    Pre-processing of tandem mass spectra using machine learning methods

    Protein identification is increasingly helpful in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into short amino acid oligomers called peptides; by determining the amino acid sequences of several peptides from a protein, the protein's whole sequence can be inferred. Peptide identification is therefore the first step and a central issue in protein identification. Tandem mass spectrometers can produce large numbers of tandem mass spectra for peptide identification, and two issues must be addressed to improve the performance of current peptide identification algorithms. First, nearly all spectra are contaminated by noise, which can reduce the accuracy of peptide identification algorithms. Second, most spectra are of too poor quality to be identified, so much time is wasted attempting to identify them. The goal of this research is to design spectrum pre-processing algorithms that both speed up peptide identification from tandem mass spectra and make it more reliable. First, because a tandem mass spectrum is a one-dimensional signal of dozens to hundreds of peaks, most of which are noise, a spectrum denoising algorithm is proposed to remove most noisy peaks. Experimental results show that our denoising algorithm removes about 69% of the peaks in a spectrum as potential noise, while the number of spectra identified by Mascot increases by 31% and 14% on two tandem mass spectrum datasets. Next, a two-stage recursive feature elimination method based on support vector machines (SVM-RFE) and a sparse logistic regression method are proposed to select the features most relevant for describing the quality of tandem mass spectra. Our methods effectively select the most relevant features, as measured by the performance of classifiers trained with different numbers of features. Third, both supervised and unsupervised machine learning methods are used to assess the quality of tandem mass spectra. A supervised classifier (a support vector machine) can be trained to remove more than 90% of poor-quality spectra while removing no more than 10% of high-quality spectra. Clustering methods such as model-based clustering are also used for quality assessment, removing the need for a labeled training dataset, and show promising results.
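
To make the idea of peak-level denoising concrete, the sketch below applies a simple, widely used heuristic: keep only the k most intense peaks in each fixed-width m/z window and treat the rest as noise. This is a swapped-in baseline for illustration, not the denoising algorithm proposed in this thesis; `window_topk_denoise` and its `window` and `k` values are hypothetical.

```python
from collections import defaultdict


def window_topk_denoise(peaks, window=100.0, k=6):
    """Keep the k most intense peaks in each consecutive m/z window of width
    `window`; all other peaks are discarded as noise.
    peaks: list of (mz, intensity); returns kept peaks sorted by m/z."""
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept = []
    for group in bins.values():
        group.sort(key=lambda p: p[1], reverse=True)  # most intense first
        kept.extend(group[:k])
    return sorted(kept)


# Hypothetical spectrum: four peaks in the 100-200 window, two in 200-300.
peaks = [(101.0, 5), (120.0, 50), (150.0, 30), (199.0, 1), (250.0, 40), (260.0, 2)]
print(window_topk_denoise(peaks, window=100.0, k=2))
# [(120.0, 50), (150.0, 30), (250.0, 40), (260.0, 2)]
```

Binning by m/z window keeps fragment evidence spread across the whole mass range, whereas a single global intensity threshold tends to discard weak high-m/z fragments.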

    Prediction of outcome of non-small cell lung cancer patients treated with chemotherapy and bortezomib by time-course MALDI-TOF-MS serum peptide profiling

    Background: Only a minority of patients with advanced non-small cell lung cancer (NSCLC) benefit from chemotherapy. Serum peptide profiling of NSCLC patients was performed to investigate patterns associated with treatment outcome. Using magnetic bead-assisted serum peptide capture coupled to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), serum peptide mass profiles were obtained from 27 NSCLC patients treated with cisplatin-gemcitabine chemotherapy and bortezomib. Support vector machine-based algorithms to predict clinical outcome were established from differential pre-treatment peptide profiles and dynamic changes in peptide abundance during treatment. Results: A 6-peptide ion signature distinguished patients with relatively short vs. long progression-free survival (PFS) upon treatment with 82% accuracy, sensitivity and specificity. Prediction of long PFS was associated with longer overall survival. Including 7 peptide ions that showed differential changes in abundance during treatment yielded a 13-peptide ion signature with 86% accuracy at 100% sensitivity and 73% specificity. A 5-peptide ion signature separated patients with a partial response from non-responders with 89% accuracy at 100% sensitivity and 83% specificity. Differential peptide profiles were also found when comparing the NSCLC serum profiles with those of cancer-free control subjects. Conclusion: This study shows that serum peptidome profiling using MALDI-TOF-MS coupled to pattern diagnostics may aid in predicting treatment outcome in advanced NSCLC patients treated with chemotherapy.
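
The accuracy, sensitivity and specificity figures quoted above all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions; `diagnostic_metrics` is a hypothetical helper and the counts are invented for illustration, not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, not from the study: 8 positives, 8 negatives.
acc, sens, spec = diagnostic_metrics(tp=8, fn=0, tn=6, fp=2)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
# accuracy=0.875 sensitivity=1.000 specificity=0.750
```

Accuracy is the class-prevalence-weighted mean of sensitivity and specificity, so a signature with 100% sensitivity and lower specificity reports an accuracy between the two, with its exact value set by the class balance of the cohort.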

    Rapid Screening of Ellagitannins in Natural Sources via Targeted Reporter Ion Triggered Tandem Mass Spectrometry

    Complex biomolecules in their natural sources have been difficult to analyze with traditional analytical approaches. Ultrahigh-performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) methods have the potential to enhance the discovery of the ellagitannins, a less well characterized and challenging class of plant biomolecules. We present an approach for screening ellagitannins that uses higher-energy collisional dissociation (HCD) to generate reporter ions for classification and collision-induced dissociation (CID) to generate unique fragmentation spectra for isomeric variants of previously unreported species. After HCD fragmentation, ellagitannin anions efficiently form three characteristic reporter ions that allow unknown precursors to be classified, an approach we call targeted reporter ion triggering (TRT). We demonstrate how a tandem HCD-CID experiment might be used to screen natural sources by UHPLC-MS/MS, applying 22 method conditions from which an optimized data-dependent acquisition (DDA) method emerged. The method was verified not to yield false-positive results in complex plant matrices. We identified 154 non-isomeric ellagitannins from strawberry leaves, 17 times more than previously reported in the same matrix. Systematically including CID spectra for the isomers of each species classified as an ellagitannin was not possible before this approach was developed.
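
The triggering decision in TRT can be sketched as a check of the HCD spectrum for the characteristic reporter ions before a CID scan is requested. The abstract does not give the reporter m/z values, the mass tolerance, or whether all three reporters must be present, so the values below are placeholders and the rule `min_reporters=3` is an assumption; `should_trigger_cid` and `HYPOTHETICAL_REPORTERS` are hypothetical names, not an instrument API.

```python
# Placeholder reporter m/z values: NOT the real ellagitannin reporter ions.
HYPOTHETICAL_REPORTERS = (100.0, 200.0, 300.0)


def should_trigger_cid(hcd_peaks, reporter_mzs, tol=0.01, min_reporters=3):
    """Return True if at least `min_reporters` reporter ions are found within
    +/- tol m/z in the HCD spectrum, i.e. the precursor is classified as a
    candidate and a follow-up CID scan would be triggered.
    hcd_peaks: list of (mz, intensity)."""
    found = 0
    for target in reporter_mzs:
        if any(abs(mz - target) <= tol for mz, _ in hcd_peaks):
            found += 1
    return found >= min_reporters


candidate = [(100.004, 10), (200.0, 5), (299.995, 8), (450.0, 20)]
other = [(100.0, 10), (450.0, 20)]
print(should_trigger_cid(candidate, HYPOTHETICAL_REPORTERS))  # True
print(should_trigger_cid(other, HYPOTHETICAL_REPORTERS))      # False
```

Requiring several reporters rather than one trades a little sensitivity for fewer false triggers in complex plant matrices.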