3 research outputs found

    PI: An open-source software package for validation of the SEQUEST result and visualization of mass spectrum

    Abstract
    Background: Tandem mass spectrometry (MS/MS) has emerged as the leading method for high-throughput protein identification in proteomics. Recent technological breakthroughs have dramatically increased the efficiency of MS/MS data generation. Meanwhile, sophisticated algorithms have been developed for identifying proteins from peptide MS/MS data by searching available protein sequence databases for the peptide that is most likely to have produced the observed spectrum. The popular SEQUEST algorithm relies on the cross-correlation between the experimental mass spectrum and the theoretical spectrum of a peptide. It uses a simplified fragmentation model that assigns a fixed, identical intensity to all major ions and a fixed, lower intensity to their neutral losses. In this way, the common difficulties of predicting theoretical spectra are circumvented. In practice, however, an experimental spectrum is usually not similar to its SEQUEST-predicted theoretical one, and as a result, incorrect identifications are often generated.
    Results: A better understanding of peptide fragmentation is required to produce more accurate and sensitive peptide sequencing algorithms. Here, we designed the software PI, whose novel algorithms make good use of the intensity information in a spectrum.
    Conclusions: We designed the software PI with novel and effective algorithms that make good use of the intensity information in the spectrum. Experiments have shown that PI is able to validate and improve the results of SEQUEST.
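The simplified fragmentation model and cross-correlation score described in the Background can be illustrated with a minimal sketch. It is not SEQUEST's actual implementation. The unit-width bins, singly charged b/y ions, water and ammonia losses, intensities of 50 and 10, the 75-bin background window, and all function names are assumptions made for illustration only.

```python
import numpy as np

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, AMMONIA = 1.007276, 18.010565, 17.026549


def _put(spec, mz, intensity):
    """Place a peak in its unit-width bin, keeping the larger intensity."""
    idx = int(round(mz))
    if 0 <= idx < len(spec):
        spec[idx] = max(spec[idx], intensity)


def theoretical_spectrum(peptide, n_bins=2000, main=50.0, loss=10.0):
    """Fixed-intensity model: b/y ions get `main`, neutral losses get `loss`."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    spec = np.zeros(n_bins)
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON            # singly charged b-ion
        y = sum(masses[i:]) + WATER + PROTON    # singly charged y-ion
        for mz in (b, y):
            _put(spec, mz, main)
            _put(spec, mz - WATER, loss)
            _put(spec, mz - AMMONIA, loss)
    return spec


def xcorr(observed, theoretical, max_shift=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    zero = float(np.dot(observed, theoretical))
    # np.roll wraps around; acceptable here because the spectrum ends are empty.
    background = np.mean([
        np.dot(observed, np.roll(theoretical, s))
        for s in range(-max_shift, max_shift + 1) if s != 0
    ])
    return zero - float(background)
```

Because every major ion receives the same intensity regardless of sequence context, a real spectrum with strongly varying peak heights can score poorly against its correct peptide, which is the weakness the abstract points to.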

    Score regularization for peptide identification

    Abstract
    Background: Peptide identification from tandem mass spectrometry (MS/MS) data is one of the most important problems in computational proteomics. This technique relies heavily on the accurate assessment of the quality of peptide-spectrum matches (PSMs). However, current MS technology and PSM scoring algorithms are far from perfect, leading to the generation of incorrect peptide-spectrum pairs. Thus, it is critical to develop new post-processing techniques that can effectively distinguish true identifications from false ones.
    Results: In this paper, we present a consistency-based PSM re-ranking method to improve the initial identification results. The method relies on one additional assumption: two peptides belonging to the same protein should be correlated with each other. We formulate an optimization problem that combines two objectives through regularization: the smoothing consistency among scores of correlated peptides and the fitting consistency between new scores and initial scores. This optimization problem can be solved analytically. The experimental study on several real MS/MS data sets shows that this re-ranking method improves the identification performance.
    Conclusions: The score regularization method can be used as a general post-processing step for improving peptide identifications. Source code and data sets are available at: http://bioinformatics.ust.hk/SRPI.rar.
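The two objectives named in the Results (smoothness among correlated peptides, fidelity to the initial scores) can be sketched with a generic graph-Laplacian regularizer. This is an assumed formulation, not necessarily the one used in the paper: it minimizes ||f - y||^2 + lam * f^T L f, where y holds the initial scores and L is the Laplacian of a graph linking peptides from the same protein. Setting the gradient to zero gives the closed form f = (I + lam * L)^(-1) y. The function name, the 0/1 edge weights, and the parameter `lam` are hypothetical.

```python
import numpy as np


def regularize_scores(initial_scores, protein_ids, lam=1.0):
    """Re-score PSMs by solving (I + lam * L) f = y in closed form.

    Assumption: two peptides are 'correlated' (edge weight 1) exactly when
    they map to the same protein; the paper may weight edges differently.
    """
    y = np.asarray(initial_scores, dtype=float)
    ids = np.asarray(protein_ids)
    # Adjacency matrix: 1 where two distinct peptides share a protein.
    W = (ids[:, None] == ids[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)


# Two peptides from protein "A" are pulled toward each other;
# the lone peptide from protein "B" keeps its initial score.
new_scores = regularize_scores([0.9, 0.3, 0.5], ["A", "A", "B"], lam=1.0)
```

A larger `lam` pulls the scores of peptides from the same protein closer together, while `lam = 0` returns the initial scores unchanged.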

    Discovery of New Features for Peptide Sequencing with Mass Spectrometry

    Bioinformaticians have been working on peptide sequencing with tandem mass spectrometry (MS/MS) for decades, yet the results are still not perfect. Much research has been carried out on two peptide sequencing methods, database search and de novo sequencing. However, because of the quality of spectra and the inherent difficulty of the problem, both methods struggle to improve their results further. The publication of the NIST peptide library in May 2014 brought fresh ideas to this long-standing problem. This peptide library contains a large number of MS/MS spectra and their corresponding peptide sequences. Taking advantage of this high-quality dataset, a growing number of studies have since started to look for internal patterns in MS/MS spectra. In this thesis, we examine this peptide library more closely and use statistical and machine learning ideas to find new features that help improve peptide sequencing results. Two main contributions are made. First, a general scoring feature is presented that can be incorporated into the scoring functions of other peptide sequencing software. The scoring feature is based on the intensity ratios between two adjacent y-ions in the spectrum. A method is proposed to obtain the probability distributions of such ratios and to calculate the scoring feature from these distributions. To demonstrate the performance of the method, the new feature is incorporated into X!Tandem and Novor, and it significantly improves the performance of each on test data. Second, a machine learning model that predicts the appearance of internal fragment ions in MS/MS spectra is presented. Although this is, to the best of our knowledge, the first model on this topic, it achieves fairly good results. Several possible applications of this model are also discussed to show that the topic is valuable for peptide sequencing and thus worth further research.
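The idea of a scoring feature built from adjacent y-ion intensity ratios can be sketched as follows. The abstract does not give the thesis's actual procedure, so everything here is an assumption: the log2 ratio, the histogram estimate with add-one smoothing, the mean log-probability as the feature, and all function names are hypothetical.

```python
import numpy as np


def bin_index(y_intensities, edges):
    """Histogram bin of each log2 ratio between adjacent y-ion intensities.

    Assumes all intensities are positive; out-of-range ratios are clipped
    into the first or last bin.
    """
    y = np.asarray(y_intensities, dtype=float)
    log_ratios = np.log2(y[1:] / y[:-1])
    return np.clip(np.digitize(log_ratios, edges) - 1, 0, len(edges) - 2)


def fit_ratio_distribution(training_ladders, edges):
    """Estimate bin probabilities from library y-ion ladders."""
    counts = np.ones(len(edges) - 1)  # add-one smoothing avoids log(0)
    for ladder in training_ladders:
        np.add.at(counts, bin_index(ladder, edges), 1.0)
    return counts / counts.sum()


def ratio_feature(y_intensities, probs, edges):
    """Mean log-probability of the observed adjacent-ion ratios."""
    return float(np.mean(np.log(probs[bin_index(y_intensities, edges)])))


# Toy data: in the 'library', each y-ion is three times its predecessor.
edges = np.linspace(-4.0, 4.0, 9)
probs = fit_ratio_distribution([[100, 300, 900], [10, 30, 90]], edges)
typical = ratio_feature([20, 60, 180], probs, edges)
atypical = ratio_feature([180, 60, 20], probs, edges)
```

A candidate peptide whose matched y-ion ladder shows ratios that are common in the library receives a higher feature value (`typical`) than one whose ratios are rare (`atypical`), so the value could be added as one term in an existing scoring function.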