4 research outputs found

    Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry

    Reversed-phase liquid chromatography (RPLC) and capillary zone electrophoresis (CZE) are two primary proteoform separation methods in mass spectrometry (MS)-based top-down proteomics. Proteoform retention time (RT) prediction in RPLC and migration time (MT) prediction in CZE provide additional information for accurate proteoform identification and quantification. While existing methods focus mainly on peptide RT and MT prediction in bottom-up MS, methods for proteoform RT and MT prediction in top-down MS are still lacking. We systematically evaluated eight machine learning models and a transfer learning method for proteoform RT prediction, and five models and the same transfer learning method for proteoform MT prediction. Experimental results showed that a gated recurrent unit (GRU)-based model with transfer learning achieved high accuracy (R = 0.978) for proteoform RT prediction, and that the GRU-based model and a fully connected neural network model achieved R = 0.982 and R = 0.981, respectively, for proteoform MT prediction.

    Predicting Electrophoretic Mobility of Proteoforms for Large-Scale Top-Down Proteomics

    Large-scale top-down proteomics characterizes proteoforms in cells globally with high confidence and high throughput using reversed-phase liquid chromatography (RPLC)–tandem mass spectrometry (MS/MS) or capillary zone electrophoresis (CZE)–MS/MS. The false discovery rate (FDR) from the target–decoy database search is typically used to filter identified proteoforms to ensure high-confidence identifications (IDs). It has been demonstrated that FDRs in top-down proteomics can be drastically underestimated. An alternative approach to the FDR can therefore be useful for further evaluating the confidence of proteoform IDs after the database search. We argue that accurately predicting the retention/migration time of proteoforms from the RPLC/CZE separation, and comparing the predicted and experimental separation times, could be a useful and practical approach. To our knowledge, no report in the literature has yet described predicting the separation time of proteoforms using large top-down proteomics data sets. In this pilot study, for the first time, we evaluated various semiempirical models for predicting proteoforms’ electrophoretic mobility (μef) using large-scale top-down proteomics data sets from CZE–MS/MS. We achieved a linear correlation between experimental and predicted μef of E. coli proteoforms (R² = 0.98) with a simple semiempirical model that uses the number of charges and the molecular mass of each proteoform as its parameters. Our modeling data suggest that the complete unfolding of proteoforms during CZE separation benefits the prediction of their μef. Our results also indicate that N-terminal acetylation and phosphorylation both decrease the proteoforms’ charge by roughly one charge unit.

    The “PepSAVI-MS” Pipeline for Natural Product Bioactive Peptide Discovery

    The recent increase in extensively drug-resistant bacterial pathogens and the associated increase in morbidity and mortality demonstrate the immediate need for new antibiotic backbones with novel mechanisms of action. Here, we report the development of the PepSAVI-MS pipeline for bioactive peptide discovery. This highly versatile platform employs mass spectrometry and statistics to identify bioactive peptide targets from complex biological samples. We validate the use of this platform through the successful identification of known bioactive peptides from a botanical species, Viola odorata. Using this pipeline, we have widened the known antimicrobial spectrum for V. odorata cyclotides, including antibacterial activity of cycloviolacin O2 against A. baumannii. We further demonstrate the broad applicability of the platform through the identification of novel anticancer activities for cycloviolacins by their cytotoxicity against ovarian, breast, and prostate cancer cell lines.