Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry
Reversed-phase liquid chromatography (RPLC) and capillary zone
electrophoresis (CZE) are the two primary proteoform separation methods
in mass spectrometry (MS)-based top-down proteomics. Proteoform retention
time (RT) prediction in RPLC and migration time (MT) prediction in
CZE provide additional information for accurate proteoform identification
and quantification. While existing methods are mainly focused on peptide
RT and MT prediction in bottom-up MS, there is still a lack of methods
for proteoform RT and MT prediction in top-down MS. We systematically
evaluated eight machine learning models and a transfer learning method
for proteoform RT prediction, as well as five models and the transfer learning
method for proteoform MT prediction. Experimental results showed that
a gated recurrent unit (GRU)-based model with transfer learning achieved
high accuracy (R = 0.978) for proteoform RT prediction,
and that the GRU-based model and a fully connected neural network
model achieved high accuracies of R = 0.982 and 0.981, respectively,
for proteoform MT prediction.
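The accuracy figures above are correlation coefficients between predicted and experimental separation times. A minimal sketch of that comparison follows, assuming R is the Pearson correlation coefficient; the function names `pearson_r` and `best_model` and the model labels are hypothetical, and this is not the evaluation code used in the study.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predicted and observed times."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_o)


def best_model(predictions_by_model, observed):
    """Return (label, R) for the model whose predictions correlate best with observed."""
    scores = {name: pearson_r(pred, observed) for name, pred in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, `best_model({"gru": gru_rt, "fcnn": fcnn_rt}, observed_rt)` would return the label with the highest R, where each list holds per-proteoform times in the same order; these inputs are placeholders, not data from the study.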
Predicting Electrophoretic Mobility of Proteoforms for Large-Scale Top-Down Proteomics
Large-scale top-down proteomics characterizes proteoforms in cells
globally with high confidence and high throughput using reversed-phase
liquid chromatography (RPLC)–tandem mass spectrometry (MS/MS)
or capillary zone electrophoresis (CZE)–MS/MS. The false discovery
rate (FDR) from the target–decoy database search is typically
used to filter identified proteoforms to ensure high-confidence
identifications (IDs). It has been demonstrated that the FDRs in top-down
proteomics can be drastically underestimated. An alternative approach
to the FDR can be useful for further evaluating the confidence of
proteoform IDs after the database search. We argue that accurately predicting
the retention/migration times of proteoforms in RPLC/CZE separations
and comparing their predicted and experimental separation
times could be a useful and practical approach. To our knowledge,
there is still no report in the literature on predicting the separation
times of proteoforms using large top-down proteomics data sets. In
this pilot study, for the first time, we evaluated various semiempirical
models for predicting proteoforms’ electrophoretic mobility
(μef) using large-scale top-down proteomics data
sets from CZE–MS/MS. We achieved a linear correlation between
experimental and predicted μef of E. coli proteoforms (R² = 0.98) with a simple
semiempirical model, which utilizes the number of charges and molecular
mass of each proteoform as the parameters. Our modeling data suggest
that the complete unfolding of proteoforms during CZE separation benefits
the prediction of their μef. Our results also indicate
that N-terminal acetylation and phosphorylation both decrease a
proteoform's charge by roughly one charge unit.
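The abstract names the model's inputs (number of charges Q and molecular mass M) but not its formula. The sketch below assumes one semiempirical form used for peptides in the CZE literature, μef ≈ a·ln(1 + Q)/M^α + b with α = 0.411, and fits a and b by least squares. The charge rule (one charge per Lys, Arg, and His plus the free N-terminus, as for a fully unfolded proteoform at acidic pH, minus about one charge for N-terminal acetylation and for each phosphorylation) is likewise an assumption modeled on the trends reported above. All function names are hypothetical; this is not the authors' model.

```python
import math

# Assumption: at acidic pH every Lys/Arg/His side chain carries one positive charge.
BASIC_RESIDUES = frozenset("KRH")


def estimate_charge(sequence, n_term_acetylated=False, n_phospho_sites=0):
    """Hypothetical charge count for a fully unfolded proteoform at acidic pH."""
    charge = sum(1 for aa in sequence.upper() if aa in BASIC_RESIDUES)
    charge += 1  # free N-terminal amine
    if n_term_acetylated:
        charge -= 1  # acetylation blocks the N-terminal amine (about -1 charge)
    charge -= n_phospho_sites  # each phosphate lowers net charge by about 1 (assumed)
    return max(charge, 0)


def mobility_feature(charge, mass_da, alpha=0.411):
    """Semiempirical term ln(1 + Q) / M**alpha; form and exponent are assumptions."""
    return math.log(1 + charge) / mass_da ** alpha


def fit_line(x, y):
    """Ordinary least squares y = slope * x + intercept (needs two distinct x values)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Squared Pearson correlation between experimental and predicted mobility."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def calibrate(proteoforms, observed_mu):
    """Fit mu_ef = a * feature + b.

    proteoforms: list of (sequence, n_term_acetylated, n_phospho_sites, mass_da).
    Returns (a, b, R^2 between observed and predicted mu_ef).
    """
    x = [mobility_feature(estimate_charge(seq, acetyl, phospho), mass)
         for seq, acetyl, phospho, mass in proteoforms]
    slope, intercept = fit_line(x, observed_mu)
    predicted = [slope * xi + intercept for xi in x]
    return slope, intercept, r_squared(observed_mu, predicted)
```

Calibrating on a training set and then comparing each proteoform's predicted and experimental μef would give the kind of post-search check argued for above; how large a deviation should flag a suspect identification is not specified here.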
The “PepSAVI-MS” Pipeline for Natural Product Bioactive Peptide Discovery
The recent increase in extensively drug-resistant bacterial pathogens
and the associated increases in morbidity and mortality demonstrate
the immediate need for new antibiotic backbones with novel mechanisms
of action. Here, we report the development of the PepSAVI-MS pipeline
for bioactive peptide discovery. This highly versatile platform employs
mass spectrometry and statistics to identify bioactive peptide targets
from complex biological samples. We validate the use of this platform
through the successful identification of known bioactive peptides
from a botanical species, Viola odorata. Using this pipeline, we have widened the known antimicrobial spectrum
for V. odorata cyclotides, including
antibacterial activity of cycloviolacin O2 against A. baumannii. We further demonstrate the broad
applicability of the platform through the identification of novel
anticancer activities of cycloviolacins, shown by their cytotoxicity against
ovarian, breast, and prostate cancer cell lines.
