52 research outputs found

    A mass accuracy sensitive probability based scoring algorithm for database searching of tandem mass spectrometry data

    BACKGROUND: Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become one of the most widely used tools in mass spectrometry-based proteomics. Various algorithms have since been developed to automate the process for modern high-throughput LC-MS/MS experiments. RESULTS: A probability-based statistical scoring model for assessing peptide and protein matches in tandem MS database searches was derived. The statistical scores in the model represent the probability that a peptide match is a random occurrence, based on the number or the total abundance of matched product ions in the experimental spectrum. The model also calculates probability-based scores to assess protein matches. Thus the protein scores in the model reflect the significance of protein matches and can be used to differentiate true from random protein matches. CONCLUSION: The model is sensitive to high mass accuracy and implicitly takes mass accuracy into account during scoring. High mass accuracy not only reduces false positives but also improves the scores of true positive matches. The algorithm is incorporated in the automated database search program MassMatrix.
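
The abstract above does not give the scoring formula, so the following is only a minimal sketch of how a score of this kind could behave. It assumes a binomial tail probability as a stand-in for the actual MassMatrix model. The function names, the uniform-peak approximation, and all numeric values are hypothetical.

```python
import math


def single_ion_match_probability(n_peaks, tolerance_da, mass_range_da):
    """Chance that one theoretical product ion matches some peak at random.

    ASSUMPTION (not from the abstract): experimental peaks are spread
    uniformly over the mass range, so each peak covers a window of
    2 * tolerance around its m/z value.
    """
    return min(1.0, n_peaks * 2.0 * tolerance_da / mass_range_da)


def random_match_probability(n_theoretical, n_matched, p_single):
    """P(X >= n_matched) for X ~ Binomial(n_theoretical, p_single)."""
    return sum(
        math.comb(n_theoretical, k)
        * p_single**k
        * (1.0 - p_single) ** (n_theoretical - k)
        for k in range(n_matched, n_theoretical + 1)
    )


def match_score(n_theoretical, n_matched, p_single):
    """Score as -log10 of the random-match probability (higher is better)."""
    return -math.log10(random_match_probability(n_theoretical, n_matched, p_single))


if __name__ == "__main__":
    # Same peptide match (8 of 20 product ions) at two mass tolerances.
    for tolerance in (0.5, 0.02):
        p = single_ion_match_probability(
            n_peaks=50, tolerance_da=tolerance, mass_range_da=1500.0
        )
        print(tolerance, round(match_score(20, 8, p), 2))
```

Under these assumptions a tighter mass tolerance lowers the chance of a random ion match, so the same number of matched ions yields a higher score. This is one way a model can take mass accuracy into account implicitly.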

    A Dynamic Noise Level Algorithm for Spectral Screening of Peptide MS/MS Spectra

    BACKGROUND: High-throughput shotgun proteomics data contain a significant number of spectra from non-peptide ions or spectra of too poor quality to obtain highly confident peptide identifications. These spectra cannot be identified with any positive peptide matches in some database search programs or are identified with false positives in others. Removing these spectra can improve the database search results and lower computational expense. RESULTS: A new algorithm has been developed to filter tandem mass spectra of poor quality from shotgun proteomic experiments. The algorithm determines the noise level dynamically and independently for each spectrum in a tandem mass spectrometric data set. Spectra are filtered based on a minimum number of required signal peaks with a signal-to-noise ratio of 2. The algorithm was tested with 23 sample data sets containing 62,117 total spectra. CONCLUSIONS: The spectral screening removed 89.0% of the tandem mass spectra that did not yield a peptide match when searched with the MassMatrix database search software. Only 6.0% of tandem mass spectra that yielded peptide matches considered to be true positive matches were lost after spectral screening. The algorithm was found to be very effective at removal of unidentified spectra in other database search programs including Mascot, OMSSA, and X!Tandem (75.93%-91.00%) with a small loss (3.59%-9.40%) of true positive matches.
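
The filtering rule in the abstract above can be sketched roughly as follows. The abstract does not say how the per-spectrum noise level is computed, so this sketch substitutes the median peak intensity as an assumed noise estimate. The value of `min_signal_peaks` and the function names are also hypothetical; only the signal-to-noise threshold of 2 comes from the abstract.

```python
from statistics import median


def count_signal_peaks(intensities, snr_threshold=2.0):
    """Count peaks whose intensity is at least snr_threshold times the noise.

    ASSUMPTION: the noise level of a spectrum is its median peak intensity.
    The published algorithm determines noise dynamically per spectrum, but
    its exact procedure is not described in the abstract.
    """
    if not intensities:
        return 0
    noise = median(intensities)
    if noise <= 0:
        return 0
    return sum(1 for value in intensities if value / noise >= snr_threshold)


def keep_spectrum(intensities, min_signal_peaks=5, snr_threshold=2.0):
    """Keep a spectrum only if it has enough peaks above the noise level."""
    return count_signal_peaks(intensities, snr_threshold) >= min_signal_peaks


if __name__ == "__main__":
    noise_only = [10, 11, 9, 10, 12, 10, 11, 9]
    with_signal = noise_only + [50, 80, 120, 60, 200, 40]
    print(keep_spectrum(noise_only), keep_spectrum(with_signal))
```

Because the noise estimate is recomputed for every spectrum, a weak but clean spectrum and an intense but noisy one are judged on their own terms rather than against a global intensity cutoff.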

    SAMPI: Protein Identification with Mass Spectra Alignments

    BACKGROUND: Mass spectrometry based peptide mass fingerprints (PMFs) offer a fast, efficient, and robust method for protein identification. A protein is digested (usually by trypsin) and its mass spectrum is compared to simulated spectra for protein sequences in a database. However, existing tools for analyzing PMFs often suffer from missing or heuristic analysis of the significance of search results and insufficient handling of missing and additional peaks. RESULTS: We present a unified framework for analyzing peptide mass fingerprints that offers a number of advantages over existing methods. First, comparison of mass spectra is based on a scoring function that can be custom-designed for certain applications and explicitly takes missing and additional peaks into account. The method is able to simulate almost every additive scoring scheme. Second, we present an efficient deterministic method for assessing the significance of a protein hit, independent of the underlying scoring function and sequence database. We demonstrate the applicability of our approach using biological mass spectrometry data and compare our results to the standard software Mascot. CONCLUSION: The proposed framework for analyzing peptide mass fingerprints shows performance comparable to Mascot on small peak lists. When more noise peaks are introduced, we are able to keep identification rates at a similar level by using the flexibility introduced by scoring schemes.

    Identification of alternative splice variants in Aspergillus flavus through comparison of multiple tandem MS search algorithms

    BACKGROUND: Database searching is the most frequently used approach for automated peptide assignment and protein inference of tandem mass spectra. The results, however, depend on the sequences in target databases and on search algorithms. Recently, by using an alternative splicing database, we identified more proteins than with the annotated proteins in Aspergillus flavus. In this study, we aimed at finding a greater number of eligible splice variants based on newly available transcript sequences and the latest genome annotation. The improved database was then used to compare four search algorithms: Mascot, OMSSA, X! Tandem, and InsPecT. RESULTS: The updated alternative splicing database predicted 15833 putative protein variants, 61% more than the previous results. There was transcript evidence for 50% of the updated genes compared to the previous 35% coverage. Database searches were conducted using the same set of spectral data, search parameters, and protein database but with different algorithms. The false discovery rates of the peptide-spectrum matches were estimated at < 2%. The numbers of the total identified proteins varied from 765 to 867 between algorithms. Whereas 42% (1651/3891) of peptide assignments were unanimous, the comparison showed that 51% (568/1114) of the RefSeq proteins and 15% (11/72) of the putative splice variants were inferred by all algorithms. 12 plausible isoforms were discovered by focusing on the consensus peptides which were detected by at least three different algorithms. The analysis found different conserved domains in two putative isoforms of UDP-galactose 4-epimerase. CONCLUSIONS: We were able to detect dozens of new peptides using the improved alternative splicing database with the recently updated annotation of the A. flavus genome. Unlike the identifications of the peptides and the RefSeq proteins, large variations existed between the putative splice variants identified by different algorithms. 12 candidates of putative isoforms were reported based on the consensus peptide-spectrum matches. This suggests that applications of multiple search engines effectively reduced the possible false positive results and validated the protein identifications from tandem mass spectra using an alternative splicing database.
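
The consensus step described above (keeping peptides detected by at least three different algorithms) can be illustrated with a small sketch. The function name and the peptide strings are made-up toy data, not results from the study. Only the engine names and the threshold of three come from the abstract.

```python
from collections import Counter


def consensus_peptides(assignments, min_engines=3):
    """Return peptides reported by at least min_engines search engines.

    assignments maps an engine name to the peptides it identified.
    """
    counts = Counter()
    for peptides in assignments.values():
        # Use a set so a peptide counts once per engine.
        counts.update(set(peptides))
    return {peptide for peptide, n in counts.items() if n >= min_engines}


if __name__ == "__main__":
    # Toy example with hypothetical peptide sequences.
    toy = {
        "Mascot": ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"],
        "OMSSA": ["PEPTIDEA", "PEPTIDEB"],
        "X! Tandem": ["PEPTIDEA", "PEPTIDEC", "PEPTIDED"],
        "InsPecT": ["PEPTIDEA", "PEPTIDEB", "PEPTIDED"],
    }
    print(sorted(consensus_peptides(toy)))
```

Setting `min_engines` to the number of engines gives the unanimous set instead, which corresponds to the stricter comparison also reported in the abstract.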

    OpenMS – An open-source software framework for mass spectrometry

    BACKGROUND: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. RESULTS: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies. CONCLUSION: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website at http://www.openms.de.

    Modular Mass Spectrometric Tool for Analysis of Composition and Phosphorylation of Protein Complexes

    The combination of high accuracy, sensitivity and speed of single and multiple-stage mass spectrometric analyses enables the collection of comprehensive sets of data containing detailed information about complex biological samples. To achieve these properties, we combined two high-performance matrix-assisted laser desorption ionization mass analyzers in one modular mass spectrometric tool, and applied this tool for dissecting the composition and post-translational modifications of protein complexes. As an example of this approach, we here present studies of the Saccharomyces cerevisiae anaphase-promoting complexes (APC) and elucidation of phosphorylation sites on its components. In general, the modular concept we describe could be useful for assembling mass spectrometers operating with both matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) ion sources into powerful mass spectrometric tools for the comprehensive analysis of complex biological samples

    The influence of cultivation methods on Shewanella oneidensis physiology and proteome expression

    High-throughput analyses that are central to microbial systems biology and ecophysiology research benefit from highly homogeneous and physiologically well-defined cell cultures. While attention has focused on the technical variation associated with high-throughput technologies, biological variation introduced as a function of cell cultivation methods has been largely overlooked. This study evaluated the impact of cultivation methods, controlled batch or continuous culture in bioreactors versus shake flasks, on the reproducibility of global proteome measurements in Shewanella oneidensis MR-1. Variability in dissolved oxygen concentration and consumption rate, metabolite profiles, and proteome was greater in shake flask than in controlled batch or chemostat cultures. Proteins indicative of suboxic and anaerobic growth (e.g., fumarate reductase and decaheme c-type cytochromes) were more abundant in cells from shake flasks compared to bioreactor cultures, a finding consistent with data demonstrating that “aerobic” flask cultures were O2 deficient due to poor mass transfer kinetics. The work described herein establishes the necessity of controlled cultivation for ensuring highly reproducible and homogeneous microbial cultures. By decreasing cell-to-cell variability, higher quality samples will allow for the interpretive accuracy necessary for drawing conclusions relevant to microbial systems biology research.

    Altered Retinoic Acid Metabolism in Diabetic Mouse Kidney Identified by 18O Isotopic Labeling and 2D Mass Spectrometry

    Numerous metabolic pathways have been implicated in diabetes-induced renal injury, yet few studies have utilized unbiased systems biology approaches for mapping the interconnectivity of diabetes-dysregulated proteins that are involved. We utilized a global, quantitative, differential proteomic approach to identify a novel retinoic acid hub in renal cortical protein networks dysregulated by type 2 diabetes. Total proteins were extracted from renal cortex of control and db/db mice at 20 weeks of age (after 12 weeks of hyperglycemia in the diabetic mice). Following trypsinization, 18O- and 16O-labeled control and diabetic peptides, respectively, were pooled and separated by two-dimensional liquid chromatography (strong cation exchange creating 60 fractions further separated by nano-HPLC), followed by peptide identification and quantification using mass spectrometry. Proteomic analysis identified 53 proteins with fold change ≥ 1.5 and p ≤ 0.05 after Benjamini-Hochberg adjustment (out of 1,806 proteins identified), including alcohol dehydrogenase (ADH) and retinaldehyde dehydrogenase (RALDH1/ALDH1A1). Ingenuity Pathway Analysis identified retinoic acid as a key signaling hub that was altered in the diabetic renal cortical proteome. Western blotting and real-time PCR confirmed diabetes-induced upregulation of RALDH1, which was localized by immunofluorescence predominantly to the proximal tubule in the diabetic renal cortex, while PCR confirmed the downregulation of ADH identified with mass spectrometry. Despite increased renal cortical tissue levels of retinol and RALDH1 in db/db versus control mice, all-trans-retinoic acid was significantly decreased in association with a significant decrease in PPARbeta/delta mRNA. Our results indicate that retinoic acid metabolism is significantly dysregulated in diabetic kidneys, and suggest that a shift in all-trans-retinoic acid metabolism is a novel feature in type 2 diabetic renal disease. Our observations provide novel insights into potential links between altered lipid metabolism and other gene networks controlled by retinoic acid in the diabetic kidney, and demonstrate the utility of using systems biology to gain new insights into diabetic nephropathy.
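
The selection criterion in the abstract above (fold change of at least 1.5 and p of at most 0.05 after Benjamini-Hochberg adjustment) can be sketched as follows. The adjustment is the standard Benjamini-Hochberg step-up procedure. Treating the fold-change cutoff as symmetric (up or down by a factor of 1.5) is an assumption, and the function names and protein labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_changed(proteins, min_fold=1.5, alpha=0.05):
    """Pick proteins passing both the fold-change and adjusted-p cutoffs.

    proteins is a list of (name, ratio, p_value) tuples.
    ASSUMPTION: a ratio at or below 1 / min_fold also counts as a change.
    """
    adjusted = benjamini_hochberg([p for _, _, p in proteins])
    selected = []
    for (name, ratio, _), q in zip(proteins, adjusted):
        changed = ratio >= min_fold or ratio <= 1.0 / min_fold
        if changed and q <= alpha:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy values only; not data from the study.
    toy = [
        ("protein_A", 2.1, 0.001),
        ("protein_B", 1.1, 0.002),
        ("protein_C", 0.5, 0.010),
        ("protein_D", 1.8, 0.200),
    ]
    print(select_changed(toy))
```

Applying the adjustment before thresholding matters when many proteins are tested at once, as with the 1,806 proteins identified here, because raw p-values alone would let through many chance findings.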

    An improved machine learning protocol for the identification of correct Sequest search results

    BACKGROUND: Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To fully take advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols. RESULTS: The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the 'black-box' notion often associated with machine learning classifiers, allowing for comparison with expert rule-of-thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested. CONCLUSIONS: We demonstrate the construction of a high accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework, to give additional discriminatory power, allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
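
For readers unfamiliar with the metric used above, the area under the ROC curve can be computed directly from classifier scores. The sketch below uses the pairwise-comparison definition: the probability that a randomly chosen correct match scores higher than a randomly chosen incorrect one. The function name and the scores are hypothetical, and the sketch does not reproduce the classifier from the paper.

```python
def roc_auc(scores_correct, scores_incorrect):
    """Area under the ROC curve from two groups of classifier scores.

    Counts the fraction of (correct, incorrect) pairs in which the correct
    match receives the higher score; ties count as half.
    """
    if not scores_correct or not scores_incorrect:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for correct in scores_correct:
        for incorrect in scores_incorrect:
            if correct > incorrect:
                wins += 1.0
            elif correct == incorrect:
                wins += 0.5
    return wins / (len(scores_correct) * len(scores_incorrect))


if __name__ == "__main__":
    # Toy classifier scores for peptide-spectrum matches.
    print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of correct from incorrect matches.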