554 research outputs found

    Applications of graph theory in protein structure identification

    Get PDF
    There is a growing interest in the identification of proteins on the proteome wide scale. Among different kinds of protein structure identification methods, graph-theoretic methods are very sharp ones. Due to their lower costs, higher effectiveness and many other advantages, they have drawn more and more researchers’ attention nowadays. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods in solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures upon graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given

    De novo peptide sequencing methods for tandem mass spectra

    Get PDF
    De novo peptide sequencing from MS/MS spectra has become of primary importance in proteomics. It provides essential information for studies of protein structure and function. With the availability of various MS/MS spectra, a lot of computational methods have been developed to infer peptide sequences from them. However, current de novo peptide sequencing methods still have limitations. Some major ones include a lack of suitable models reflecting MS/MS spectra, limited information extracted from MS/MS spectra, and the inefficient use of multiple spectra. This thesis addresses some of the limitations with a series of novel computational methods designed for various MS/MS spectra and their combinations. The main content of the thesis starts with a comprehensive review of recent developments in de novo peptide sequencing methods, followed by two novel methods for single spectrum sequencing problems, and then presents two paired spectra sequencing methods. The first chapter introduces relevant background information, objectives of the study, and the structure of the thesis. After that, a comprehensive review of de novo peptide sequencing methods is given. It summarizes recent developments of computational methods for various experimental spectra, compares and analyzes their advantages and disadvantages, and points out some future research directions. Having these potential research directions, the thesis next presents two novel methods designed for higher-energy collisional dissociation (HCD) spectra and electron capture dissociation (ECD) (or electron transfer dissociation (ETD)) spectra, respectively. These methods apply new spectrum graph models with multiple types of edges, integrate amino acid combination (AAC) information and peptide tags, and consider spectrum-specific information to suit different spectra. After that, multiple spectra sequencing problem is studied. A framework for de novo peptide sequencing of multiple spectra is given with applications to two different spectra pairs. One pair is spectrally complementary to each other, and the other is similar spectra with property differences. These methods include effective spectra merging criteria and parent mass correction steps, and modify the previously proposed graph models to fit the merged spectra. Experiments on several experimental MS/MS spectra datasets and datasets pairs show the advantages of the proposed methods in terms of peptide sequencing accuracy. Finally, conclusions and future work directions are given at the end of the thesis. To summarize the work in the thesis, a series of novel computational methods for de novo peptide sequencing are proposed. These methods target different types of MS/MS spectra and their combinations. Experiential results show the proposed methods are either better than competing methods that already exist, or fill gaps in the suite of currently available methods

    A high-throughput \u3ci\u3ede novo\u3c/i\u3e sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Get PDF
    Abstract Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode webcite

    A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate <it>de novo </it>sequencing for identification of post-translational modifications and amino acid polymorphisms.</p> <p>Results</p> <p>In this study, a new <it>de novo </it>sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of <it>Rhodopseudomonas palustris</it>. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of <it>de novo </it>sequenced spectra and the sequencing accuracy.</p> <p>Conclusions</p> <p>Here, we improved <it>de novo </it>sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at <url>http://compbio.ornl.gov/Vonode</url>.</p

    De novo sequencing of MS/MS spectra

    Get PDF
    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of the de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field. © 2011 Expert Reviews Ltd.The Turkish Academy of Science (TÜBA

    ProbPS: A new model for peak selection based on quantifying the dependence of the existence of derivative peaks on primary ion intensity

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from tandem mass spectrum is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity and is accompanied by derivative peaks, including isotopic peaks, neutral loss peaks, and complementary peaks. Existing models for peak selection ignore the dependence between the existence of the derivative peaks and the intensity of the primary peaks. Simple models for peak selection assume that these two attributes are independent; however, this assumption is contrary to real data and prone to error.</p> <p>Results</p> <p>In this paper, we present a statistical model to quantitatively measure the dependence of the derivative peak's existence on the primary peak's intensity. Here, we propose a statistical model, named ProbPS, to capture the dependence in a quantitative manner and describe a statistical model for peak selection. Our results show that the quantitative understanding can successfully guide the peak selection process. By comparing ProbPS with AuDeNS we demonstrate the advantages of our method in both filtering out noise peaks and in improving <it>de novo </it>identification. In addition, we present a tag identification approach based on our peak selection method. Our results, using a test data set, suggest that our tag identification method (876 correct tags in 1000 spectra) outperforms PepNovoTag (790 correct tags in 1000 spectra).</p> <p>Conclusions</p> <p>We have shown that ProbPS improves the accuracy of peak selection which further enhances the performance of de novo sequencing and tag identification. Thus, our model saves valuable computation time and improving the accuracy of the results.</p

    Molecular Level Characterization of Dissolved Organic Matter Integrating Trapped Ion Mobility Spectrometry and Fourier Transform Ion Cyclotron Resonance Mass Spectrometry

    Get PDF
    Dissolved organic matter (DOM) is an extremely complex mixture of organic molecules ubiquitous in aquatic systems and a critical component of the global carbon cycle. Little is known about DOM structural composition at the molecular level. The work presented in this dissertation summarizes the development of a novel analytical toolbox based on trapped ion mobility spectrometry and Fourier transform ion cyclotron resonance mass spectrometry (TIMS-FT-ICR MS) that has significantly contributed to expand our knowledge of DOM molecular complexity and diversity. The TIMS-FT-ICR MS/MS analysis provided for the first-time lower and upper estimation of the molecular isomeric diversity. The TIMS-FT-ICR MS/MS methodology was further developed to allow for chemical formula-based isomeric and neutral loss fragmentation structural description and database validation. This novel procedure enabled the unambiguous assignment of candidate isomeric structures based on accurate mass, database MS/MS matching scores, and ion mobility. A fast and routine structural characterization DOM workflow method was developed: GraphDOM. The method utilizes neutral loss fragmentation patterns acquired using continuous accumulation of selected ions (CASI)-collision induced dissociation (CID) FT-ICR MS/MS. The neutral mass loss patterns are used to define structural families leading to the identification and visualization of the DOM transformational processes. The GraphDOM methodology was successfully applied to the characterization of DOM along a salinity transect of the Harney River, Florida Everglades. The GraphDOM method was further implemented with isomeric content description at the molecular level and applied to four common aquatic systems. The application of the GraphDOM methodology allowed for the first time identification of common and unique DOM transformational networks across aquatic ecosystems

    Predicting Proteome-Early Drug Induced Cardiac Toxicity Relationships (Pro-EDICToRs) with Node Overlapping Parameters (NOPs) of a new class of Blood Mass-Spectra graphs

    Get PDF
    The 11th International Electronic Conference on Synthetic Organic Chemistry session Computational ChemistryBlood Serum Proteome-Mass Spectra (SP-MS) may allow detecting Proteome-Early Drug Induced Cardiac Toxicity Relationships (called here Pro-EDICToRs). However, due to the thousands of proteins in the SP identifying general Pro-EDICToRs patterns instead of a single protein marker may represents a more realistic alternative. In this sense, first we introduced a novel Cartesian 2D spectrum graph for SP-MS. Next, we introduced the graph node-overlapping parameters (nopk) to numerically characterize SP-MS using them as inputs to seek a Quantitative Proteome-Toxicity Relationship (QPTR) classifier for Pro-EDICToRs with accuracy higher than 80%. Principal Component Analysis (PCA) on the nopk values present in the QPTR model explains with one factor (F1) the 82.7% of variance. Next, these nopk values were used to construct by the first time a Pro-EDICToRs Complex Network having nodes (samples) linked by edges (similarity between two samples). We compared the topology of two sub-networks (cardiac toxicity and control samples); finding extreme relative differences for the re-linking (P) and Zagreb (M2) indices (9.5 and 54.2 % respectively) out of 11 parameters. We also compared subnetworks with well known ideal random networks including Barabasi-Albert, Kleinberg Small World, Erdos-Renyi, and Epsstein Power Law models. Finally, we proposed Partial Order (PO) schemes of the 115 samples based on LDA-probabilities, F1-scores and/or network node degrees. PCA-CN and LDA-PCA based POs with Tanimoto’s coefficients equal or higher than 0.75 are promising for the study of Pro-EDICToRs. These results shows that simple QPTRs models based on MS graph numerical parameters are an interesting tool for proteome researchThe authors thank projects funded by the Xunta de Galicia (PXIB20304PR and BTF20302PR) and the Ministerio de Sanidad y Consumo (PI061457). González-Díaz H. acknowledges tenure track research position funded by the Program Isidro Parga Pondal, Xunta de Galici

    A Computational Framework for Heparan Sulfate Sequencing Using High-resolution Tandem Mass Spectra

    Get PDF
    Heparan sulfate (HS) is a linear polysaccharide expressed on cell surfaces, in extracellular matrices and cellular granules in metazoan cells. Through non-covalent binding to growth factors, morphogens, chemokines, and other protein families, HS is involved in all multicellular physiological activities. Its biological activities depend on the fine structures of its protein-binding domains, the determination of which remains a daunting task. Methods have advanced to the point that mass spectra with information-rich product ions may be produced on purified HS saccharides. However, the interpretation of these complex product ion patterns has emerged as the bottleneck to the dissemination of these HS sequencing methods. To solve this problem, we designed HS-SEQ, the first comprehensive algorithm for HS de novo sequencing using high-resolution tandem mass spectra. We tested HS-SEQ using negative electron transfer dissociation (NETD) tandem mass spectra generated from a set of pure synthetic saccharide standards with diverse sulfation patterns. The results showed that HS-SEQ rapidly and accurately determined the correct HS structures from large candidate pools
    corecore