
    Quantitative analysis of mass spectrometry proteomics data : Software for improved life science

    The rapid advances in life science, including the sequencing of the human genome and numerous other techniques, have given an extraordinary ability to acquire data on biological systems and human disease. Even so, drug development costs are higher than ever, while the rate of newly approved treatments is historically low. A potential explanation for this discrepancy might be the difficulty of understanding the biology underlying the acquired data; the difficulty of refining the data into useful knowledge through interpretation. In this thesis the refinement of the complex data from mass spectrometry proteomics is studied. A number of new algorithms and programs are presented and demonstrated to provide increased analytical ability over previously suggested alternatives. With the higher goal of increasing the scientific output of the mass spectrometry laboratory, pragmatic studies were also performed to create a new set of compression algorithms for reduced storage requirements of mass spectrometry data, and also to characterize instrument stability. The final components of this thesis are a discussion of the technical and instrumental weaknesses associated with the currently employed mass spectrometry proteomics methodology, and a discussion of the currently lacking quality of academic software and the reasons for it. As a whole, the primary algorithms, the enabling technology, and the weakness discussions all aim to improve the current capability to perform mass spectrometry proteomics. As this technology is crucial to understanding the main functional components of biology, proteins, this quest should allow better and higher quality life science data, and ultimately increase the chances of developing new treatments or diagnostics.

    Chapter 6 Understanding and Exploiting Peptide Fragment Ion Intensities Using Experimental and Informatic Approaches*

    Tandem mass spectrometry is a widely used tool in proteomics. This section will address the properties that describe how protonated peptides fragment when activated by collisions in a mass spectrometer and how that information can be used to identify proteins. A review of the mobile proton model is presented, along with a summary of commonly observed peptide cleavage enhancements, including the proline effect. The methods used to elucidate peptide dissociation chemistry by using both small groups of model peptides and large datasets are also discussed. Finally, the role of peak intensity in commercially available and developmental peptide identification algorithms is examined.
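As a rough illustration of the fragment ions whose intensities the chapter discusses, the sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide. The function name `fragment_mz` and the constant names are hypothetical and not taken from the chapter. The residue masses are standard monoisotopic values, and the sketch covers fragment positions only, not intensities.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists for b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal prefixes plus one proton.
    b_ions = []
    running = 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    # y ions: C-terminal suffixes plus water plus one proton.
    y_ions = []
    running = 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_mz("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A scoring algorithm that uses peak intensity would compare these predicted positions against the observed spectrum and then weight each match by how intense the matched peak is.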

    Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry

    Computational methods for peptide identification via tandem mass spectrometry (MS/MS) lie at the heart of proteomic characterization of biological samples. Due to the complex nature of the peptide fragmentation process inside mass spectrometers, most extant methods underutilize the intensity information available in the tandem mass spectrum. Further, high noise content and variability in MS/MS datasets present significant data analysis challenges. These factors contribute to loss of identifications, necessitating development of more complex approaches. This dissertation develops and evaluates a novel probabilistic framework called Context-Sensitive Peptide Identification (CSPI) for improving peptide scoring and identification from MS/MS data. Employing Input-Output Hidden Markov Models (IO-HMM), CSPI addresses the above computational challenges by modeling the effect of peptide physicochemical features ("context") on their observed (normalized) MS/MS spectrum intensities. The flexibility and scalability of the CSPI framework enable incorporation of many different kinds of features from the domain into the modeling task. Design choices also include the underlying parameter representation and allow learning complex probability distributions and dependencies embedded in the data. Empirical evaluation on multiple datasets of varying sizes and complexity demonstrates that CSPI's intensity-based scores significantly improve peptide identification performance, identifying up to ~25% more peptides at 1% False Discovery Rate (FDR) as compared with popular state-of-the-art approaches. It is further shown that a weighted score combination procedure that includes CSPI scores along with other commonly used scores leads to greater discrimination between true and false identifications, achieving ~4-8% more correct identifications at 1% FDR compared with the case without CSPI features. Superior performance of the CSPI framework has the potential to impact downstream proteomic investigations (like protein identification, quantification and differential expression) that utilize results from peptide-level analyses. Because CSPI is computationally intensive, its design and implementation support efficient handling of large MS/MS datasets, achieved through database indexing and parallelization of the computational workflow using a multiprocessing architecture.
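To make the "identifications at 1% FDR" comparison concrete, here is a minimal sketch of counting accepted peptide-spectrum matches at a fixed FDR. It assumes a target-decoy estimate (decoy hits divided by target hits above the score cutoff). The abstract does not say how FDR was estimated, so this is an illustrative assumption, and the function name `targets_at_fdr` is hypothetical.

```python
def targets_at_fdr(psms, max_fdr=0.01):
    """Count target matches accepted at the given FDR.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    FDR at a cutoff is estimated as decoys / targets among matches scoring
    at or above that cutoff (an assumed target-decoy estimate).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the largest accepted set whose estimated FDR stays within bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(targets_at_fdr(example, max_fdr=0.01))
    print(targets_at_fdr(example, max_fdr=0.5))
```

Running the same function on the scores from two scoring schemes over the same spectra and comparing the counts gives the kind of "percent more identifications at 1% FDR" figure the abstract reports.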

    Microwave-Supported Acid Hydrolysis for Proteomics

    Our goal is to develop, optimize and demonstrate workflows that incorporate rapid Asp-selective chemical proteolysis into proteomic studies of complex mixtures. This can be further divided into several specific aims. The first aim is to develop and optimize the sample preparation, mass spectrometric, and bioinformatic methods required for complex mixture analysis of peptides resulting from acid digestion both in solution and in polyacrylamide gels. Second, the optimized methods will be applied to three model systems. In the first application, the large peptides derived from microwave-supported acid hydrolysis of human ribosomes isolated from MCF-7 breast cancer cells are analyzed. In the second, acid hydrolysis will be applied to characterize Lys63 linkages in polyubiquitins. Finally, all the above methods will be combined for the analysis of extracellular vesicles shed by myeloid-derived suppressor cells from a murine mammary carcinoma model. After optimizing the mass spectrometric and bioinformatic methods required for analysis of peptides resulting from acid hydrolysis, the most comprehensive analysis using this digestion technique to date was achieved for both in-gel and in-solution analysis. In-gel digestion resulted in identification of over twelve hundred peptides representing 642 proteins, and in-solution digestion via mass-biased partitioning allowed identification of over 300 proteins. Mass-biased partitioning also resulted in two distinct peptide populations from the high and low mass analyses implemented. Nearly 90% of the predicted human ribosomal proteins were identified after acid hydrolysis. High-resolution analysis of both precursor and product ions resulted in an average sequence coverage of 46% among identified proteins. It was also demonstrated that microwave-supported acid hydrolysis facilitates a more informative method for analysis of Lys63-linked polyubiquitin. After acid hydrolysis, ~629 Da mass shifts were found to be indicative of isopeptides. These isopeptides were easily identified from complex mixtures using tandem mass spectrometry and diagnostic b ions. Extracellular vesicles from a murine carcinoma model were then analyzed using in-gel microwave-supported acid hydrolysis and mass-biased partitioning after in-solution digestion, and the sample was interrogated for the presence of ubiquitinated peptides.
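The bioinformatic side of Asp-selective proteolysis needs an in silico digestion rule for the database search. The sketch below is a simplified illustration and not the authors' pipeline: it splits a protein sequence at every aspartic acid (D), with the cleavage side as a parameter because the abstract does not specify it. The function name `digest_at_asp` and the example sequence are hypothetical, and missed cleavages are not modeled.

```python
def digest_at_asp(sequence, side="C"):
    """Split a protein sequence at every Asp (D) residue.

    side="C" cuts after each D; side="N" cuts before each D.
    This is a simplified rule assumed for illustration; real acid
    hydrolysis can cleave on either side and may miss sites.
    """
    if side not in ("C", "N"):
        raise ValueError("side must be 'C' or 'N'")

    # Collect internal cut positions (cuts at the sequence ends are no-ops).
    cuts = []
    for i, aa in enumerate(sequence):
        if aa == "D":
            cut = i + 1 if side == "C" else i
            if 0 < cut < len(sequence):
                cuts.append(cut)

    # Slice the sequence between consecutive cut positions.
    peptides = []
    start = 0
    for cut in cuts:
        peptides.append(sequence[start:cut])
        start = cut
    peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    print(digest_at_asp("AADKKDGG", side="C"))
    print(digest_at_asp("AADKKDGG", side="N"))
```

Because Asp is relatively rare in many proteins, a rule like this tends to produce longer peptides than trypsin would, which is consistent with the abstract's mention of "large peptides" and of separate high and low mass analyses.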