3,855 research outputs found

    Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics

    Background: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies in the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes, but they usually require large amounts of training data.

    Results: We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter significantly improves the fraction of correctly identified peptide mass spectra.

    Conclusion: The proposed kernel function is well suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
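    As a concrete illustration of the approach this abstract describes, the following snippet trains a support vector regressor to predict peptide retention times. It is a minimal sketch only: an ordinary RBF kernel over amino-acid composition vectors stands in for the paper's specialized kernel, and the peptides and retention times are invented placeholders.

```python
import numpy as np
from sklearn.svm import SVR

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Encode a peptide as a 20-dimensional amino-acid composition vector."""
    return np.array([peptide.count(a) for a in AMINO_ACIDS]) / len(peptide)

# Tiny invented training set: (sequence, observed retention time in minutes).
train = [("LGEYGFQNAILVR", 42.1), ("AEFVEVTK", 18.3),
         ("HLVDEPQNLIK", 27.9), ("YLYEIAR", 21.5)]
X = np.array([composition(p) for p, _ in train])
y = np.array([rt for _, rt in train])

# RBF kernel over composition vectors; the paper uses its own kernel instead.
model = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X, y)
print(model.predict([composition("LVNELTEFAK")]))  # predicted retention time
```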

    A robust linear regression based algorithm for automated evaluation of peptide identifications from shotgun proteomics by use of reversed-phase liquid chromatography retention time

    Background: Rejection of false positive peptide matches in database searches of shotgun proteomic experimental data is highly desirable. Several methods have been developed that use the peptide retention time to refine and improve peptide identifications from database search algorithms. This report describes the implementation of an automated approach to reduce false positives and validate peptide matches.

    Results: A robust linear regression based algorithm was developed to automate the evaluation of peptide identifications obtained from shotgun proteomic experiments. The algorithm scores peptides based on their predicted and observed reversed-phase liquid chromatography retention times. The robust algorithm does not require internal or external peptide standards to train or calibrate the linear regression model used for peptide retention time prediction. The algorithm is generic, provides a statistical score for each peptide match based on its retention time, and can be incorporated into any database search program to perform automated evaluation of candidate peptide matches.

    Conclusion: Analysis of peptide matches with the retention time score included resulted in a significant reduction of false positive matches with little effect on the number of true positives. Overall, higher sensitivities and specificities were achieved for database searches carried out with MassMatrix, Mascot and X!Tandem after implementation of the retention time based scoring algorithm.
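    The scoring idea can be illustrated with a short, hedged sketch: fit a robust linear model between predicted and observed retention times, then convert each peptide's residual into a p-value-like score. The data, the choice of Huber regression, and the exact score definition are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import HuberRegressor

# Predicted vs. observed retention times (minutes); values are invented,
# and index 4 is a deliberate outlier standing in for a false positive.
predicted = np.array([10.2, 15.1, 22.4, 30.8, 41.0, 55.3])
observed = np.array([11.0, 15.9, 23.1, 31.5, 80.0, 56.1])

# Robust fit: the Huber loss keeps the outlier from dragging the line.
fit = HuberRegressor().fit(predicted.reshape(-1, 1), observed)
residuals = observed - fit.predict(predicted.reshape(-1, 1))

# Robust residual scale via the median absolute deviation (MAD).
mad = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))

# Two-sided normal tail score: small values flag peptide matches whose
# observed RT disagrees with the robust fit.
scores = 2 * stats.norm.sf(np.abs(residuals) / mad)
print(np.round(scores, 3))  # the outlier at index 4 scores near zero
```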

    LC-MSsim – a simulation software for liquid chromatography mass spectrometry data

    Background: Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms, but there is no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms.

    Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra at the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for the peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants and to change the background noise level, and it includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files.

    Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists, and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community for performing benchmark studies and comparisons between computational tools.
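    A toy version of the first simulation steps (in-silico tryptic digestion followed by a crude retention time predictor) might look as follows. The cleavage rule is the simple "after K/R, not before P" heuristic, and the hydrophobicity-based retention model is a rough stand-in, not LC-MSsim's predictor; the protein sequence is invented.

```python
import re

# Kyte-Doolittle hydrophobicity scale for the 20 standard amino acids.
KD = {"A": 1.8, "I": 4.5, "L": 3.8, "F": 2.8, "V": 4.2, "W": -0.9,
      "M": 1.9, "C": 2.5, "Y": -1.3, "P": -1.6, "T": -0.7, "S": -0.8,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
      "R": -4.5, "G": -0.4}

def tryptic_digest(protein):
    """Cleave after K or R, except before P (simple trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def crude_rt(peptide):
    """Toy retention time: scaled mean hydrophobicity, in minutes."""
    return 30.0 + 5.0 * sum(KD[a] for a in peptide) / len(peptide)

protein = "MKWVTFISLLFLFSSAYSRGVFRRDAHK"  # invented example sequence
for pep in tryptic_digest(protein):
    print(f"{pep:<20s} RT ~ {crude_rt(pep):5.1f} min")
```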

    Advancing myxobacterial natural product discovery by combining genome and metabolome mining with organic synthesis

    Myxobacteria represent a viable source of natural products with a broad variety of chemical scaffolds and intriguing biological activities. This thesis covers different contemporary ways to approach myxobacterial secondary metabolism. The ribosomal peptide myxarylin was discovered through a genome-guided approach; this study describes its discovery, semi-synthesis-assisted isolation, structure elucidation and heterologous production. Furthermore, statistics-based metabolome mining revealed a family of light-sensitive compounds with as yet elusive structures. A biosynthetic gene cluster putatively encoding the biosynthetic machinery could be identified by cluster inactivation experiments. Metabolome mining additionally revealed new myxochelin congeners featuring a rare nicotinic acid moiety. Total synthesis was applied to confirm structures, elucidate the absolute stereochemistry and generate additional non-natural derivatives. Finally, total synthesis was used to create a small library of sandacrabins, a family of terpenoid alkaloids with promising antiviral activities, with the aim of developing improved congeners with increased target activity and reduced cytotoxicity. The combination of up-to-date approaches in natural product discovery, especially UHPLC-hrMS workflows, with small-scale organic synthesis was successfully applied to facilitate compound isolation, confirm structures and create novel congeners of myxobacterial natural products.

    Quantification and Simulation of Liquid Chromatography-Mass Spectrometry Data

    Computational mass spectrometry is a fast-evolving field that has attracted increased attention over the last couple of years. The performance of software solutions determines the success of an analysis to a great extent. New algorithms are required to reflect new experimental procedures and deal with new instrument generations. One essential component of algorithm development is the validation (as well as comparison) of software on a broad range of data sets. This requires a gold standard (a so-called ground truth), which is usually obtained by manual annotation of a real data set. Comprehensive manually annotated public data sets for mass spectrometry data are labor-intensive to produce, and their quality strongly depends on the skill of the human expert. Some parts of the data may even be impossible to annotate due to high levels of noise or other ambiguities. Furthermore, manually annotated data is usually not available for all steps in a typical computational analysis pipeline. We thus developed the most comprehensive simulation software to date, which can generate multiple levels of ground truth and features a plethora of settings to reflect experimental conditions and instrument settings. The simulator is used to generate several distinct types of data, which are subsequently employed to evaluate existing algorithms. Additionally, we employ simulation to determine the influence of instrument attributes and sample complexity on the ability of algorithms to recover information. The results give valuable hints on how to optimize experimental setups. Furthermore, this thesis introduces two quantitative approaches, namely a decharging algorithm based on integer linear programming and a new workflow for the identification of differentially expressed proteins in a large in vitro study on toxic compounds. Decharging infers the uncharged mass of a peptide (or protein) by clustering all of its charge variants, which occur frequently under certain experimental conditions. We employ simulation to show that decharging is robust against missing values even for high-complexity data and that the algorithm outperforms other solutions in terms of mass accuracy and run time on real data. The last part of this thesis deals with a new state-of-the-art workflow for protein quantification based on isobaric tags for relative and absolute quantitation (iTRAQ). We devise a new approach to isotope correction, propose an experimental design, introduce new metrics of iTRAQ data quality, and confirm putative properties of iTRAQ data using a novel approach. All tools developed as part of this thesis are implemented in OpenMS, a C++ library for computational mass spectrometry.
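    The decharging step lends itself to a small sketch. The thesis formulates it as an integer linear program; the toy version below only groups features whose inferred neutral masses agree within a tolerance, which conveys the idea without the ILP machinery. The (m/z, charge) pairs are invented.

```python
PROTON = 1.007276  # proton mass in Da

# Invented (m/z, charge) feature pairs from one LC-MS map; the first three
# are charge variants of the same peptide and should collapse together.
features = [(500.78, 2), (1000.55, 1), (334.19, 3), (600.12, 2)]

def neutral_mass(mz, z):
    """Uncharged mass inferred from m/z and charge state."""
    return mz * z - z * PROTON

def decharge(features, tol=0.02):
    """Group features whose neutral masses agree within `tol` Da."""
    groups = []
    for mz, z in features:
        m = neutral_mass(mz, z)
        for g in groups:
            if abs(g["mass"] - m) <= tol:
                g["members"].append((mz, z))
                break
        else:
            groups.append({"mass": m, "members": [(mz, z)]})
    return groups

for g in decharge(features):
    print(f"neutral mass {g['mass']:.3f} Da <- {g['members']}")
```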

    Optimized GeLC-MS/MS for Bottom-Up Proteomics

    Despite tremendous advances in mass spectrometry instrumentation and mass spectrometry-based methodologies, global protein profiling of organellar, cellular, tissue and body fluid proteomes in different organisms remains a challenging task due to the complexity of the samples and the wide dynamic range of protein concentrations. In addition, the large amounts of produced data make result exploitation difficult. To overcome these issues, further advances in sample preparation, mass spectrometry instrumentation and data processing and analysis are required. The presented study focuses first on the improvement of the proteolytic digestion of proteins in the in-gel based proteomic approach (GeLC-MS). To this end, the commonly used bovine trypsin (BT) was modified with oligosaccharides in order to overcome its main disadvantages, such as weak thermostability and fast autolysis at basic pH. The glycosylated trypsin derivatives maintained their cleavage specificity and showed better thermostability, higher autolysis resistance and less autolytic background than unmodified BT. In line with the “accelerated digestion protocol” (ADP) previously established in our laboratory, the modified enzymes were tested in the in-gel digestion of proteins. The kinetics of in-gel digestion was studied by MALDI-TOF mass spectrometry using 18O-labeled peptides as internal standards, as well as by a label-free quantification approach that utilizes the intensities of peptide ions detected by nanoLC-MS/MS. In this kinetic study, the effects of temperature, enzyme concentration and digestion time on the yield of digestion products were characterized. The obtained results showed that the in-gel digestion of proteins by glycosylated trypsin conjugates was less efficient than conventional digestion (CD), reaching at most 50 to 70% of the CD yield, which suggests that the attached sugar molecules limit free diffusion of the modified trypsins into the polyacrylamide gel pores. Nevertheless, these thermostable and autolysis-resistant enzymes can be regarded as promising candidates for the gel-free shotgun approach. To address the reliability of proteomic data, I further focused on protein identifications with borderline statistical confidence produced by database searching. These hits are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in proteomics. A method was developed for the rapid validation of borderline hits, which takes advantage of the independent interpretation of the acquired tandem mass spectra by the de novo sequencing software PepNovo, followed by mass-spectrometry-driven BLAST (MS BLAST) sequence similarity searching that utilizes all of the partially accurate, degenerate and redundant proposed peptide sequences. It was demonstrated that a combination of the MASCOT software, the de novo sequencing software PepNovo and MS BLAST, bundled by a simple scripted interface, enables the rapid and efficient validation of large numbers of borderline hits produced by matching one or two MS/MS spectra with marginal statistical significance.
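    For readers unfamiliar with such kinetic evaluations, here is a hedged sketch of one way to summarize digestion kinetics: fitting yield versus time to a first-order saturation curve. The functional form and all data points are illustrative assumptions; the study's actual quantification relied on 18O-labeled internal standards and label-free nanoLC-MS/MS intensities.

```python
import numpy as np
from scipy.optimize import curve_fit

# Invented time course: fraction of protein digested at each time point.
t = np.array([5, 15, 30, 60, 120, 240], dtype=float)        # minutes
yield_obs = np.array([0.18, 0.42, 0.63, 0.82, 0.93, 0.97])  # fraction

def first_order(t, ymax, k):
    """Assumed model: yield(t) = ymax * (1 - exp(-k * t))."""
    return ymax * (1.0 - np.exp(-k * t))

(ymax, k), _ = curve_fit(first_order, t, yield_obs, p0=(1.0, 0.05))
print(f"plateau yield = {ymax:.2f}, rate constant k = {k:.3f} 1/min")
```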

Integrating glycomics, proteomics and glycoproteomics to understand the structural basis for influenza A virus evolution and glycan-mediated immune interactions

    Glycosylation modulates the range and specificity of interactions among glycoproteins and their binding partners. This is important in influenza A virus (IAV) biology because binding of host immune molecules depends on glycosylation of viral surface proteins such as hemagglutinin (HA). Circulating viruses mutate rapidly in response to pressure from the host immune system. As proteins mutate, the virus glycosylation patterns change. The consequence is that viruses evolve to evade host immune responses, which renders vaccines ineffective. Glycan biosynthesis is a non-template-driven process, governed by stoichiometric and steric relationships between the enzymatic machinery for glycosylation and the protein being glycosylated. Consequently, protein glycosylation is heterogeneous, thereby making structural analysis and elucidation of precise biological functions extremely challenging. The lack of structural information has been a limiting factor in understanding the exact mechanisms of glycan-mediated interactions of the IAV with host immune lectins. Genetic sequencing methods allow prediction of glycosylation sites along the protein backbone but are unable to provide exact phenotypic information regarding site occupancy. Crystallography methods are also unable to determine the glycan structures beyond the core residues due to the flexible nature of carbohydrates. This dissertation centers on the development of chromatography and mass spectrometry methods for the characterization of site-specific glycosylation in complex glycoproteins and the application of these methods to IAV glycomics and glycoproteomics. We combined the site-specific glycosylation information generated using mass spectrometry with information from biochemical assays and structural modeling studies to identify key glycosylation sites mediating interactions of HA with the immune lectin surfactant protein-D (SP-D). We also identified the structural features that control glycan processing at these sites, particularly those involving glycan maturation from high-mannose to complex-type, which, in turn, regulate interactions with SP-D. The work presented in this dissertation contributes significantly to the improvement of analytical and bioinformatics methods in glycan and glycoprotein analysis using mass spectrometry and greatly advances the understanding of the structural features regulating glycan microheterogeneity on HA and its interactions with host immune lectins.
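    The sequence-based site prediction mentioned above can be illustrated in a few lines: scanning a protein for the canonical N-glycosylation sequon N-X-S/T (X not proline). The fragment below is a placeholder, and, as the abstract notes, such a scan says nothing about actual site occupancy.

```python
import re

def find_sequons(protein):
    """Return 1-based positions of N-X-[S/T] motifs with X != P."""
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein)]

seq = "MKTIIALSYIFCLVFANYSDKICNGSTATLCLGHHAV"  # placeholder fragment
print(find_sequons(seq))  # candidate N-glycosylation sites
```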

    Metabolomic and genomic investigation of two North-Norwegian cyanobacterial isolates for bioprospecting of new compounds

    Cyanobacteria are an excellent source of bioactive natural products that can be used in the development of new medicinal drugs. The cyanobacterial genus Nostoc has proven to be a prolific producer of molecules with exciting bioactivities, including anti-bacterial, anti-fungal, and anti-cancer activities. This feature, combined with Nostoc’s complex life cycle and sizeable genomes, makes the genus an interesting target for bioprospecting. In this thesis, a typical natural product discovery pipeline was carried out to investigate some of the genomic and metabolomic characteristics of two Nostoc sp. strains. The pipeline included prediction of the biosynthetic potential, liquid extraction of metabolites using a variety of solvents, reverse-phase fractionation, bioassays to test for bioactivity, and the use of reverse-phase UHPLC-MS² together with an informatic tool for dereplication and molecular networking. Clear differences between the strains were observed both in the predicted metabolite production and in the extracted metabolites. Anti-bacterial activity was observed in extract fractions of both strains, and the dereplication process resulted in the discovery of a congener of the previously described compound hapalosin, which has been shown to reverse multi-drug resistance in cancer cells.
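    Molecular networking, used here for dereplication, hinges on a spectral similarity score. The simplified sketch below computes a cosine score over fragment peaks matched within an m/z tolerance; real tools such as GNPS also allow mass-shifted peak matches, and the two spectra are invented.

```python
import math

def cosine_score(spec_a, spec_b, tol=0.02):
    """Cosine similarity over fragment peaks matched within an m/z tolerance."""
    shared = 0.0
    for mz_a, ia in spec_a:
        for mz_b, ib in spec_b:
            if abs(mz_a - mz_b) <= tol:
                shared += ia * ib
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    return shared / (norm_a * norm_b)

# Invented MS2 spectra as (fragment m/z, relative intensity) pairs.
s1 = [(110.07, 0.8), (215.12, 1.0), (343.18, 0.6)]
s2 = [(110.07, 0.7), (215.13, 0.9), (400.21, 0.4)]
print(f"cosine = {cosine_score(s1, s2):.2f}")  # edge if above a threshold
```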

    Advancing a systems cell-free metabolic engineering approach to natural product synthesis and discovery

    Next-generation DNA sequencing has led to an accumulation of putative biosynthetic gene clusters for many natural product classes of interest. In vivo extraction and heterologous expression do not have sufficient throughput to validate predicted enzyme functions and inform future annotations. Further, engineering the production of new natural products is laborious and limited by the trade-offs between cell growth and product synthesis. Conversely, cell-free platforms, particularly those capable of cell-free protein synthesis (CFPS), facilitate rapid screening of enzyme function and prototyping of metabolic pathways. However, the protein content and metabolic activity of many cell-free systems are poorly defined, increasing variability between lysates and impeding systematic engineering. Here, the strength of untargeted peptidomics as an enabling tool for the engineering of cell-free systems is established, based upon its ability to measure both global protein abundances and newly synthesized peptides. Synthesis of peptide natural products was found to be more robust in purified-enzyme CFPS systems than in crude lysates; however, non-specific peptide degradation, detected through peptidomics, remains a concern. Crude cell-free systems were determined to be better suited to small-molecule production due to the extensive metabolic networks they were found to possess. Perturbations of these networks, carried out through changes to growth media, were observed through shotgun proteomics and informed the engineering of phenol biosynthesis in a crude Escherichia coli lysate. Implementing shotgun proteomics as an analytical tool for cell-free systems will increase reproducibility and further the development of a platform for high-throughput functional genomics and metabolic engineering.

    Comprehensive Overview of Bottom-up Proteomics using Mass Spectrometry

    Proteomics is the large-scale study of protein structure and function in biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging for new practitioners to understand. Here, we provide a comprehensive overview of different proteomics methods to aid the novice and experienced researcher alike. We cover everything from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
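    Since every bottom-up workflow revolves around peptide masses, a short sketch of the basic calculation may help the novice reader: the monoisotopic mass and m/z of a peptide from standard residue masses. The example peptide and charge state are arbitrary.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
           "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
           "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
           "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER, PROTON = 18.010565, 1.007276

def mono_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE[a] for a in peptide) + WATER

def mz(peptide, z):
    """m/z of the [M + zH]^z+ ion."""
    return (mono_mass(peptide) + z * PROTON) / z

print(f"{mz('LVNELTEFAK', 2):.4f}")  # doubly protonated peptide
```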