38 research outputs found

    Generalized calibration across liquid chromatography setups for generic prediction of small-molecule retention times

    Accurate prediction of liquid chromatographic retention times from small-molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g., differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet and fits calibration curves on predicted retention times instead of only on observed retention times. We show that this approach results in substantially higher accuracy of elution-peak prediction than is achieved by setup-specific models
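
    The calibration idea in this abstract can be sketched as follows: retention times predicted by a model for one setup are mapped onto another setup by fitting a curve between predicted and observed values for a few shared compounds. The sketch below is only an illustration under stated assumptions. It swaps in a straight-line least-squares fit for whatever curve family the authors use, and the function names and retention times are hypothetical.

```python
def fit_linear_calibration(predicted, observed):
    """Least-squares line: observed ~ slope * predicted + intercept.

    Assumption: a straight line stands in for the calibration curve;
    the abstract does not specify the curve family.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(predicted_rt, slope, intercept):
    """Map a predicted retention time onto the target setup."""
    return slope * predicted_rt + intercept


# Hypothetical data: predictions from a model for setup A (minutes) and
# observed retention times of the same compounds on setup B.
predicted_a = [1.0, 2.0, 3.0, 4.0]
observed_b = [2.5, 4.5, 6.5, 8.5]

slope, intercept = fit_linear_calibration(predicted_a, observed_b)
# Retention time expected on setup B for a compound predicted at 5.0 min
mapped = calibrate(5.0, slope, intercept)
```

    Fitting on predicted rather than observed retention times means only the calibration compounds need to be measured on the new setup; the model itself is not refitted.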

    Accurate peptide fragmentation predictions allow data driven approaches to replace and improve upon proteomics search engine scoring functions

    Motivation: The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted and used by the community, with the most notable example being Percolator, a semi-supervised machine learning model which learns a new scoring function for a given dataset. The usage of such tools is, however, bound to the search engine's scoring scheme, which doesn't always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in such a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP, which predicts fragment ion peak intensities. Results: We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach allows using more information out of the data (compared with simpler intensity-based metrics, like peak counting or summing explained intensities) while maintaining control of statistics such as the false discovery rate.
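
    As a rough illustration of what "direct similarity metrics" between predicted and observed fragment intensities could look like, the sketch below computes Pearson correlation and cosine similarity for one peptide-to-spectrum match. The abstract does not name the metrics used, so both choices are assumptions, as are the function names and intensity values.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical normalized intensities for the same fragment ions of one match
predicted = [0.10, 0.40, 0.30, 0.20]
observed = [0.12, 0.35, 0.33, 0.20]

# One feature row of the kind a rescoring tool could be trained on
features = {
    "pearson": pearson(predicted, observed),
    "cosine": cosine(predicted, observed),
}
```

    Each match would contribute one such feature row, and the rescoring model, not a fixed threshold, decides how much weight each metric gets.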

    Structure Alignment

    While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction to Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. The Protein DataBank (PDB) contains a wealth of structural information. In order to investigate the similarity between different proteins in this database, one can compare the primary sequence through pairwise alignment and calculate the sequence identity (or similarity) over the two sequences. This strategy will work particularly well if the proteins you want to compare are close homologs. However, in this chapter we will explain that a structural comparison through structural alignment will give you much more valuable information, which allows you to investigate similarities between proteins that cannot be discovered by comparing the sequences alone. Comment: editorial responsibility: K. Anton Feenstra, Sanne Abeln. This chapter is part of the book "Introduction to Protein Structural Bioinformatics". The Preface arXiv:1801.09442 contains links to all the (published) chapters. The update adds available arXiv hyperlinks for the chapter.
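
    The sequence identity mentioned in this abstract can be illustrated with a short sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identical residues over the columns where neither sequence has a gap. That denominator is one of several conventions in use, and the function name and example sequences are hypothetical.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over gap-free alignment columns.

    Assumption: the inputs are two rows of an existing pairwise
    alignment, so they must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return matches / compared if compared else 0.0


# Hypothetical aligned fragments: 4 identical residues in 5 gap-free columns
identity = sequence_identity("ACDE-G", "ACEEFG")
```

    A score like this depends only on the residues, which is why it cannot reveal similarities that are visible only in the three-dimensional structures.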

    A comparison of collision cross section values obtained via travelling wave ion mobility-mass spectrometry and ultra high performance liquid chromatography-ion mobility-mass spectrometry : application to the characterisation of metabolites in rat urine

    A comprehensive Collision Cross Section (CCS) library was obtained via Travelling Wave Ion Guide mobility measurements through direct infusion (DI). The library consists of CCS and Mass Spectral (MS) data in negative and positive ElectroSpray Ionisation (ESI) mode for 463 and 479 endogenous metabolites, respectively. For both ionisation modes combined, TWCCSN2 data were obtained for 542 non-redundant metabolites. These data were acquired on two different ion mobility enabled orthogonal acceleration QToF MS systems in two different laboratories, with the majority of the resulting TWCCSN2 values (from detected compounds) found to be within 1% of one another. Validation of these results against two independent, external TWCCSN2 data sources and against predicted TWCCSN2 values showed them to be within 1-2% of these other values. The same metabolites were then analysed using a rapid reversed-phase ultra (high) performance liquid chromatographic (U(H)PLC) separation combined with IM and MS (IM-MS), thus providing retention time (tr), m/z and TWCCSN2 values (with the latter compared with the DI-IM-MS data). Analytes for which TWCCSN2 values were obtained by U(H)PLC-IM-MS showed good agreement with the results obtained from DI-IM-MS. The repeatability of the TWCCSN2 values obtained for these metabolites on the different ion mobility QToF systems, using either DI or LC, encouraged the further evaluation of the U(H)PLC-IM-MS approach via the analysis of samples of rat urine, from control and methotrexate-treated animals, in order to assess the potential of the approach for metabolite identification and profiling in metabolic phenotyping studies. Based on the database derived from the standards, 63 metabolites were identified in rat urine, using positive ESI, based on the combination of tr, TWCCSN2 and MS data.
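
    A statement such as "within 1% of one another" can be checked with a small calculation like the one sketched below. The deviation is taken relative to the first laboratory's value, which is an assumption (the abstract does not define the reference), and the metabolite names and CCS values are made up for illustration.

```python
def percent_deviation(value, reference):
    """Absolute deviation of value from reference, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(value - reference) / abs(reference) * 100.0


def fraction_within(values_a, values_b, tolerance_pct):
    """Share of compounds measured in both sets that agree within tolerance."""
    shared = sorted(set(values_a) & set(values_b))
    if not shared:
        raise ValueError("no compounds in common")
    hits = sum(
        1
        for name in shared
        if percent_deviation(values_b[name], values_a[name]) <= tolerance_pct
    )
    return hits / len(shared)


# Hypothetical CCS values (square angstroms) from two laboratories
lab_a = {"metabolite_1": 150.0, "metabolite_2": 200.0, "metabolite_3": 180.0}
lab_b = {"metabolite_1": 150.9, "metabolite_2": 201.0, "metabolite_3": 185.4}

# Two of the three hypothetical compounds deviate by less than 1%
share = fraction_within(lab_a, lab_b, tolerance_pct=1.0)
```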

    Structural Property Prediction

    While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction to Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. Some structural properties of proteins that are closely linked to their function may be easier (or much faster) to predict from sequence than the complete tertiary structure; for example, secondary structure, surface accessibility, flexibility, disorder, interface regions or hydrophobic patches. Serving as building blocks for the native protein fold, these structural properties also contain important structural and functional information not apparent from the amino acid sequence. Here, we will first give an introduction to the application of machine learning for structural property prediction, and explain the concepts of cross-validation and benchmarking. Next, we will review various methods that incorporate knowledge of these concepts to predict those structural properties, such as secondary structure, surface accessibility, disorder and flexibility, and aggregation. Comment: editorial responsibility: Juami H. M. van Gils, K. Anton Feenstra, Sanne Abeln. This chapter is part of the book "Introduction to Protein Structural Bioinformatics". The Preface arXiv:1801.09442 contains links to all the (published) chapters.
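
    The cross-validation concept named in this abstract can be sketched with a plain k-fold split, in which every sample is used for testing exactly once. This is a minimal illustration, not the chapter's own procedure. The function name is hypothetical, and the split is by position without shuffling or grouping.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples.

    Assumption: samples are split in order; fold sizes differ by at
    most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Ten hypothetical samples split into three folds of sizes 4, 3 and 3
folds = k_fold_indices(10, 3)
```

    A model would be trained on each `train` list and scored on the matching `test` list, and the k scores are then averaged to estimate performance on unseen data.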

    The difference is in the details : predicting the LC-IM-MS behaviour of metabolites and peptides

    Comprehensive and empirical evaluation of machine learning algorithms for small molecule LC retention time prediction

    Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte's retention time. The key components of such predictive algorithms are the features they are supplied with, and the actual machine learning algorithm used to fit the model parameters. Therefore, we have evaluated the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different types of algorithms achieving top performance for different types of analytes or different protocols. Our results thus show that an evaluation of machine learning algorithms for retention time prediction is needed to find a suitable algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.
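
    The effect of blending can be illustrated with a toy example. The abstract does not say how the models were combined, so unweighted averaging is an assumption here, and the predictions and observed retention times are invented so that the two hypothetical models err in opposite directions on one outlier.

```python
def blend(predictions_per_model):
    """Unweighted average of the predictions of several models."""
    n_models = len(predictions_per_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


def max_abs_error(predicted, observed):
    """Largest absolute error, dominated by the worst outlier."""
    return max(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical retention times (minutes); the third analyte is the outlier
observed = [1.0, 2.0, 4.0]
model_a = [1.0, 2.0, 6.0]  # overestimates the outlier
model_b = [1.2, 2.1, 2.0]  # underestimates the outlier

blended = blend([model_a, model_b])
worst_a = max_abs_error(model_a, observed)
worst_b = max_abs_error(model_b, observed)
worst_blend = max_abs_error(blended, observed)
```

    Averaging only helps in this way when the models make different kinds of errors, which fits the abstract's emphasis on blending different types of models.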