
    Computational methods and tools for protein phosphorylation analysis

    Signaling pathways represent a central regulatory mechanism of biological systems, in which a key event is the reversible phosphorylation of proteins. Protein phosphorylation affects at least one-third of all proteins and is the most widely studied posttranslational modification. Phosphorylation analysis is still generally perceived as difficult or cumbersome and is not readily attempted by many, despite the high value of such information. In particular, determining the exact location of a phosphorylation site is currently considered a major hurdle, so reliable approaches are necessary for the detection and localization of protein phosphorylation. The goal of this PhD thesis was to develop computational methods and tools for mass spectrometry-based protein phosphorylation analysis, particularly the validation of phosphorylation sites. In the first two studies, we developed methods for improved identification of phosphorylation sites in MALDI-MS. In the first study, this was achieved through the automatic combination of spectra from multiple matrices, while in the second study, an optimized protocol for sample loading and washing conditions was suggested. In the third study, we proposed and evaluated the hypothesis that in ESI-MS, tandem CID and HCD spectra of phosphopeptides can be accurately predicted and used in spectral library searching. This novel strategy for phosphosite validation and identification offered accuracy that outperformed the other currently popular methods and proved applicable to complex biological samples. Finally, we significantly improved the performance of our command-line prototype tool and added a graphical user interface, as well as options for customizable simulation parameters and for filtering selected spectra, peptides or proteins. The new software, SimPhospho, is open-source and can be easily integrated into a phosphoproteomics data analysis workflow. Together, these bioinformatics methods and tools enable confident phosphosite assignment and improve reliable phosphoproteome identification and reporting.

    ANALYSIS AND SIMULATION OF TANDEM MASS SPECTROMETRY DATA

    This dissertation focuses on improvements to data analysis in mass spectrometry-based proteomics, the study of an organism’s full complement of proteins. One of the biggest surprises from the Human Genome Project was the relatively small number of genes (~20,000) encoded in our DNA. Since genes code for proteins, scientists expected that more genes would be necessary to produce a diverse set of proteins covering the many functions that support the complexity of life. Thus, there is intense interest in studying proteomics, including post-translational modifications (how proteins change after translation from their genes) and protein interactions (e.g. proteins binding together to form complex molecular machines), to fill the void in molecular diversity. The goal of mass spectrometry in proteomics is to determine the abundance and amino acid sequence of every protein in a biological sample. A mass spectrometer can determine mass/charge ratios and abundances for fragments of short peptides (which are subsequences of a protein); sequencing algorithms determine which peptides are most likely to have generated the fragmentation patterns observed in the mass spectrum, and protein identity is inferred from the peptides. My work improves the computational tools for mass spectrometry by removing limitations of present algorithms, simulating mass spectrometry instruments to facilitate algorithm development, and creating algorithms that approximate isotope distributions, deconvolve chimeric spectra, and predict protein-protein interactions. While most sequencing algorithms attempt to identify a single peptide per mass spectrum, multiple peptides are often fragmented together. Here, I present a method to deconvolve these chimeric mass spectra into their individual peptide components by examining the isotopic distributions of their fragments. First, I derived the equation to calculate the theoretical isotope distribution of a peptide fragment. Next, for cases where elemental compositions are not known, I developed methods to approximate the isotope distributions. Ultimately, I created a non-negative least squares model that deconvolved chimeric spectra and increased peptide-spectrum matches by 15-30%. To improve the operation of mass spectrometer instruments, I developed software that simulates liquid chromatography-mass spectrometry data and the subsequent execution of custom data acquisition algorithms. The software provides an opportunity for researchers to test, refine, and evaluate novel algorithms prior to implementation on a mass spectrometer. Finally, I created a logistic regression classifier for predicting protein-protein interactions defined by affinity purification and mass spectrometry (APMS). The classifier increased the area under the receiver operating characteristic curve by 16% compared to previous methods. Furthermore, I created a web application to facilitate APMS data scoring within the scientific community.
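    The non-negative least squares idea described above can be illustrated with a small sketch. The isotope envelopes and abundances below are invented for illustration, and a simple projected-gradient solver stands in for whatever NNLS solver the dissertation actually uses; the point is only the model: an observed chimeric envelope is a nonnegative mixture of the component peptides' theoretical envelopes.

```python
import numpy as np

# Hypothetical theoretical isotope envelopes (relative intensities) of two
# peptides co-isolated in one window; each column is one peptide's envelope.
A = np.array([
    [1.00, 0.20],
    [0.55, 1.00],
    [0.20, 0.60],
    [0.05, 0.25],
])

true_x = np.array([3.0, 1.5])  # true mixing abundances (for this toy example)
b = A @ true_x                 # observed chimeric envelope (noise-free here)

def nnls_pg(A, b, iters=5000, lr=1e-2):
    """Non-negative least squares via projected gradient descent:
    minimize ||Ax - b||^2 subject to x >= 0."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - lr * grad, 0.0)  # project onto nonnegative orthant
    return x

x = nnls_pg(A, b)
print(x)  # recovered abundances, close to [3.0, 1.5]
```

In practice one would use a dedicated solver (e.g. an active-set NNLS), but the projected-gradient version keeps the sketch self-contained.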

    Quantification and Simulation of Liquid Chromatography-Mass Spectrometry Data

    Computational mass spectrometry is a fast-evolving field that has attracted increased attention over the last couple of years. The performance of software solutions determines the success of analysis to a great extent. New algorithms are required to reflect new experimental procedures and to deal with new instrument generations. One essential component of algorithm development is the validation (as well as comparison) of software on a broad range of data sets. This requires a gold standard (or so-called ground truth), which is usually obtained by manual annotation of a real data set. Comprehensive manually annotated public data sets for mass spectrometry data are labor-intensive to produce, and their quality strongly depends on the skill of the human expert. Some parts of the data may even be impossible to annotate due to high levels of noise or other ambiguities. Furthermore, manually annotated data is usually not available for all steps in a typical computational analysis pipeline. We thus developed the most comprehensive simulation software to date, which allows the generation of multiple levels of ground truth and features a plethora of settings to reflect experimental conditions and instrument settings. The simulator is used to generate several distinct types of data, which are subsequently employed to evaluate existing algorithms. Additionally, we employ simulation to determine the influence of instrument attributes and sample complexity on the ability of algorithms to recover information. The results give valuable hints on how to optimize experimental setups. Furthermore, this thesis introduces two quantitative approaches, namely a decharging algorithm based on integer linear programming and a new workflow for the identification of differentially expressed proteins in a large in vitro study on toxic compounds. Decharging infers the uncharged mass of a peptide (or protein) by clustering all its charge variants, which occur frequently under certain experimental conditions. We employ simulation to show that decharging is robust against missing values even for high-complexity data, and that the algorithm outperforms other solutions in terms of mass accuracy and run time on real data. The last part of this thesis deals with a new state-of-the-art workflow for protein quantification based on isobaric tags for relative and absolute quantitation (iTRAQ). We devise a new approach to isotope correction, propose an experimental design, introduce new metrics of iTRAQ data quality, and confirm putative properties of iTRAQ data using a novel approach. All tools developed as part of this thesis are implemented in OpenMS, a C++ library for computational mass spectrometry.
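    The decharging idea, that all charge variants of one peptide point back to the same uncharged mass, can be sketched as follows. This toy version replaces the integer-linear-programming clustering described above with a simple enumerate-and-group heuristic, and the peak values and tolerance are illustrative only.

```python
PROTON = 1.007276  # proton mass in Da

def neutral_mass(mz, z):
    # Electrospray ions obey m/z = (M + z * PROTON) / z, so M = z * (mz - PROTON).
    return z * (mz - PROTON)

def decharge(peaks, max_z=4, tol=0.02):
    """Enumerate charge hypotheses for every peak, then return the largest
    group of hypotheses whose inferred neutral masses agree within tol."""
    hypotheses = sorted((neutral_mass(mz, z), mz, z)
                        for mz in peaks for z in range(1, max_z + 1))
    best = []
    for i in range(len(hypotheses)):
        group = [hypotheses[i]]
        j = i + 1
        while j < len(hypotheses) and hypotheses[j][0] - group[0][0] < tol:
            group.append(hypotheses[j])
            j += 1
        if len(group) > len(best):
            best = group
    return best[0][0], [(mz, z) for _, mz, z in best]

# Hypothetical peptide of neutral mass ~1500.60 Da observed at charges 1, 2 and 3.
peaks = [1501.607, 751.307, 501.207]
mass, members = decharge(peaks)
print(round(mass, 2), members)
```

The real algorithm must additionally handle missing charge states, adducts and noise, which is where the ILP formulation earns its keep.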

    Computational Analysis of Mass Spectrometric Data for Whole Organism Proteomic Studies

    In recent decades, great breakthroughs have been achieved in the study of genomes, supplying us with vast knowledge of genes and a large number of sequenced organisms. With the availability of genome information, new systematic studies have arisen. One of the most prominent areas is proteomics, a discipline devoted to the study of an organism’s expressed protein content. Proteomics studies are concerned with a wide range of problems. Major focuses include the study of protein expression patterns, the detection of protein-protein interactions, protein quantitation, protein localization analysis, and the characterization of post-translational modifications. The emergence of proteomics holds great promise for furthering our understanding of cellular processes and the mechanisms of life. One of the main techniques used for high-throughput proteomic studies is mass spectrometry. Capable of detecting the masses of biological compounds in complex mixtures, it is currently one of the most powerful methods for protein characterization. New horizons are opening with new developments in mass spectrometry instrumentation, which can now be applied to a variety of proteomic problems. One of the most popular applications of proteomics involves whole-organism high-throughput experiments. However, as new instrumentation is developed, followed by the design of new experiments, we find ourselves needing new computational algorithms to interpret the results. As the thresholds of the current technology are probed, new algorithmic designs are emerging to meet the challenges of mass spectrometry data evaluation and interpretation. This dissertation is devoted to the computational analysis of mass spectrometric data, combining different topics and techniques to improve our understanding of biological processes through high-throughput whole-organism proteomic studies. It consists of the development of new algorithms to improve the data interpretation of current tools, the introduction of a new algorithmic approach for post-translational modification detection, and the characterization of a set of computational simulations for biological agent detection in a complex organism background. These studies are designed to further our ability to understand the results of high-throughput mass spectrometric experiments and their impact on the field of proteomics.

    Optimized GeLC-MS/MS for Bottom-Up Proteomics

    Despite tremendous advances in mass spectrometry instrumentation and mass spectrometry-based methodologies, global protein profiling of organellar, cellular, tissue and body fluid proteomes in different organisms remains a challenging task due to the complexity of the samples and the wide dynamic range of protein concentrations. In addition, the large amounts of data produced make result exploitation difficult. To overcome these issues, further advances in sample preparation, mass spectrometry instrumentation, and data processing and analysis are required. The presented study focuses first on improving the proteolytic digestion of proteins in the in-gel-based proteomic approach (GeLC-MS). To this end, the commonly used bovine trypsin (BT) was modified with oligosaccharides in order to overcome its main disadvantages, such as weak thermostability and fast autolysis at basic pH. The glycosylated trypsin derivatives maintained their cleavage specificity and showed better thermostability, greater autolysis resistance and less autolytic background than unmodified BT. In line with the “accelerated digestion protocol” (ADP) previously established in our laboratory, the modified enzymes were tested in the in-gel digestion of proteins. The kinetics of in-gel digestion were studied by MALDI-TOF mass spectrometry using 18O-labeled peptides as internal standards, as well as by a label-free quantification approach that utilizes the intensities of peptide ions detected by nanoLC-MS/MS. The kinetic study characterized the effect of temperature, enzyme concentration and digestion time on the yield of digestion products. The results showed that in-gel digestion of proteins by glycosylated trypsin conjugates was less efficient than conventional digestion (CD), achieving at most 50 to 70% of the CD yield, suggesting that the attached sugar molecules limit free diffusion of the modified trypsins into the polyacrylamide gel pores. Nevertheless, these thermostable and autolysis-resistant enzymes can be regarded as promising candidates for the gel-free shotgun approach. To address the reliability of proteomic data, I further focused on protein identifications with borderline statistical confidence produced by database searching. These hits are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in proteomics. A method was developed for the rapid validation of borderline hits, which takes advantage of the independent interpretation of the acquired tandem mass spectra by the de novo sequencing software PepNovo, followed by mass-spectrometry-driven BLAST (MS BLAST) sequence-similarity searching that utilizes all of the partially accurate, degenerate and redundant proposed peptide sequences. It was demonstrated that a combination of the MASCOT software, the de novo sequencing software PepNovo and MS BLAST, bundled by a simple scripted interface, enabled the rapid and efficient validation of a large number of borderline hits produced by matching one or two MS/MS spectra with marginal statistical significance.

    High Performance Computing Algorithms for Accelerating Peptide Identification from Mass-Spectrometry Data Using Heterogeneous Supercomputers

    Fast and accurate identification of peptides and proteins from mass spectrometry (MS) data is a critical problem in modern systems biology. Database peptide search is the most commonly used computational method to identify peptide sequences from MS data. In this method, gigabytes of experimentally generated MS data are compared against terabyte-sized databases of theoretically simulated MS data, resulting in a compute- and data-intensive problem requiring days or weeks of computation time on desktop machines. Existing serial and high-performance computing (HPC) algorithms strive to accelerate and improve the computational efficiency of the search, but exhibit sub-optimal performance due to their inefficient parallelization models, low resource utilization and high overhead costs.
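    The candidate-filtering core of a database peptide search, matching each experimental precursor mass against a mass-indexed database of theoretical peptides, can be sketched as follows. The toy peptide set and tolerance are invented for illustration; real search engines index tens of millions of peptides and additionally score the fragment spectra of each candidate. The residue masses are standard monoisotopic values.

```python
import bisect

def monoisotopic_mass(peptide):
    """Monoisotopic peptide mass: sum of residue masses plus one water."""
    residue = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
               'V': 99.06841, 'T': 101.04768, 'L': 113.08406, 'N': 114.04293,
               'D': 115.02694, 'K': 128.09496, 'E': 129.04259, 'R': 156.10111}
    return sum(residue[a] for a in peptide) + 18.010565

def candidates(db, precursor_mass, tol=0.5):
    """Binary-search a mass-sorted database for peptides within tol Da
    of the observed precursor mass."""
    masses = [m for m, _ in db]
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return [p for _, p in db[lo:hi]]

# Toy "database": theoretical peptides indexed (sorted) by mass.
peptides = ["GASP", "VTLN", "DKER", "LNDK"]
db = sorted((monoisotopic_mass(p), p) for p in peptides)
print(candidates(db, monoisotopic_mass("VTLN")))
```

The HPC work summarized above is about doing this comparison, and the subsequent spectrum scoring, efficiently across many nodes rather than changing the underlying lookup.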

    LC-MSsim – a simulation software for liquid chromatography mass spectrometry data

    BACKGROUND: Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment are huge, highly complex and noisy. Accordingly, they have sparked new developments in bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms, but no data represent a ground truth for the evaluation of feature detection, alignment and filtering algorithms. RESULTS: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for the peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants and to change the background noise level, and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. CONCLUSION: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists, and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community for performing benchmark studies and comparisons between computational tools.
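    The in silico digestion step that such a simulator performs on a FASTA protein list can be sketched for the common case of trypsin, which cleaves C-terminal to K or R but not when the next residue is proline. This is a generic sketch, not LC-MSsim's actual implementation, and the input sequence below is an arbitrary toy example.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """In silico trypsin digestion: split after K or R unless followed by P,
    optionally re-joining adjacent fragments to model missed cleavages."""
    # Zero-width split points: position is after [KR] and not before P.
    fragments = [f for f in re.split(r'(?<=[KR])(?!P)', sequence) if f]
    peptides = set()
    for n in range(missed_cleavages + 1):
        for i in range(len(fragments) - n):
            peptides.add(''.join(fragments[i:i + n + 1]))
    return sorted(peptides)

seq = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"  # illustrative sequence only
print(tryptic_digest(seq))
print(tryptic_digest(seq, missed_cleavages=1))
```

A simulator would then feed each peptide to a retention-time predictor and an isotope/peak-shape model, as described in the abstract.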

    Methods for peptide identification by spectral comparison

    BACKGROUND: Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is a growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies. RESULTS: This paper investigates computational issues related to this spectral comparison approach. Different methods have been empirically evaluated over several large sets of spectra. First, we illustrate that the peak intensities follow a Poisson distribution, which implies that applying a square root transform will optimally stabilize the peak intensity variance. Our results show that the square root did indeed outperform other transforms, resulting in improved accuracy of spectral matching. Second, different measures of spectral similarity were compared, and the results illustrated that the correlation coefficient was most robust. Finally, we examine how to assemble multiple spectra associated with the same peptide to generate a synthetic reference spectrum. Ensemble averaging is shown to provide the best combination of accuracy and efficiency. CONCLUSION: Our results demonstrate that when combined, these methods can boost the sensitivity and specificity of spectral comparison. Therefore they are capable of enhancing and complementing existing tools for consistent and accurate peptide identification.
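    The two findings above, that a square-root transform stabilizes the variance of Poisson-distributed peak intensities and that the correlation coefficient is a robust similarity measure, can be combined into a minimal spectral-matching sketch. The binned intensities below are invented for illustration, and both spectra are assumed to be already aligned on a common m/z grid.

```python
import numpy as np

def pearson_sqrt(intensities_a, intensities_b):
    """Spectral similarity: square-root transform (variance-stabilizing for
    Poisson-distributed counts) followed by the Pearson correlation."""
    a = np.sqrt(np.asarray(intensities_a, dtype=float))
    b = np.sqrt(np.asarray(intensities_b, dtype=float))
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy binned spectra: a noisy replicate of the same peptide vs an unrelated one.
ref   = [100, 0, 25, 4, 0, 49]
same  = [ 90, 0, 30, 6, 0, 45]   # replicate of ref with intensity noise
other = [  0, 64, 0, 0, 81, 1]   # different fragmentation pattern
print(pearson_sqrt(ref, same), pearson_sqrt(ref, other))
```

A library search would evaluate this score between the query spectrum and every candidate library spectrum, reporting the best match above a threshold.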
