A Markov Chain Monte Carlo Method for Estimating the Statistical Significance of Proteoform Identifications by Top-Down Mass Spectrometry
Top-down mass spectrometry is capable of identifying whole proteoform sequences with multiple post-translational modifications because it generates tandem mass spectra directly from intact proteoforms. Many software tools, such as ProSightPC, MSPathFinder, and TopMG, have been proposed for identifying proteoforms with modifications, and they employ various methods to estimate the statistical significance of identifications. However, most existing methods are designed for proteoform identifications without modifications, and accurately estimating the statistical significance of proteoform identifications with modifications remains a challenge. Here we propose TopMCMC, a method that combines a Markov chain random walk algorithm and a greedy algorithm to assign statistical significance to matches between spectra and protein sequences with variable modifications. Experimental results showed that TopMCMC achieved high accuracy in estimating E-values and false discovery rates of identifications in top-down mass spectrometry. Coupled with TopMG, TopMCMC identified more spectra than the generating function method from an MCF-7 top-down mass spectrometry data set.
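The abstract above mentions estimating E-values and false discovery rates. The Python sketch below illustrates two standard conversions that such estimates feed into. It is not TopMCMC itself, which the abstract describes only at a high level. It assumes that higher scores are better and that an E-value is a per-candidate p-value multiplied by the number of candidates N compared against a spectrum. It also assumes that FDR is estimated with the simple target-decoy ratio: decoy hits divided by target hits at or above a cutoff. All function names are hypothetical.

```python
def e_value(p_value: float, n_candidates: int) -> float:
    """Expected number of random candidates scoring at least this well: E = p * N."""
    return p_value * n_candidates


def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a cutoff as (#decoy hits >= threshold) / (#target hits >= threshold)."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)  # cap the estimate at 1


def q_values(target_scores, decoy_scores):
    """Map each distinct target score to its q-value.

    The q-value of score s is the lowest estimated FDR among all cutoffs
    that still accept s, i.e. all cutoffs t <= s.
    """
    q = {}
    running_min = float("inf")
    # Walk cutoffs in ascending order; running_min holds the minimum FDR
    # over every cutoff seen so far, all of which accept the current score.
    for t in sorted(set(target_scores)):
        running_min = min(running_min, target_decoy_fdr(target_scores, decoy_scores, t))
        q[t] = running_min
    return q


if __name__ == "__main__":
    targets = [10.0, 9.0, 8.0, 3.0]  # toy scores, not data from the study
    decoys = [7.0, 2.0]
    print(q_values(targets, decoys))  # {3.0: 0.25, 8.0: 0.0, 9.0: 0.0, 10.0: 0.0}
    print(e_value(1e-5, 1000))
```

Taking the running minimum makes q-values non-increasing as the score rises, so filtering at a q-value threshold never rejects a higher-scoring match while accepting a lower-scoring one.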
Deep Top-Down Proteomics Using Capillary Zone Electrophoresis-Tandem Mass Spectrometry: Identification of 5700 Proteoforms from the Escherichia coli Proteome
Capillary zone electrophoresis (CZE)-tandem mass spectrometry (MS/MS) has been recognized as a useful tool for top-down proteomics. However, its performance for deep top-down proteomics still lags dramatically behind that of the widely used reversed-phase liquid chromatography (RPLC)-MS/MS. We present an orthogonal multidimensional separation platform that couples protein prefractionation by size exclusion chromatography (SEC) and RPLC to CZE-MS/MS for deep top-down proteomics of Escherichia coli. The platform generated a high peak capacity (∼4000) for the separation of intact proteins, leading to the identification of 5700 proteoforms from the Escherichia coli proteome. These data represent a 10-fold improvement in the number of proteoform identifications compared with previous CZE-MS/MS studies and constitute the largest bacterial top-down proteomics data set reported to date. The performance of the CZE-MS/MS-based platform is comparable to that of state-of-the-art RPLC-MS/MS-based systems in terms of the number of proteoform identifications and the instrument time required.
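The peak capacity of about 4000 comes from coupling several separation dimensions. A common idealization, Giddings' product rule, says that fully orthogonal dimensions multiply their peak capacities. A single gradient or electrophoretic dimension is often approximated as having a peak capacity of about 1 + (separation window)/(average peak width). The Python sketch below encodes both relations under those assumptions. The abstract does not state how its ~4000 figure was calculated, and dimensions that are partly correlated fall short of the product. The function names and example numbers are hypothetical.

```python
from math import prod


def single_dimension_peak_capacity(window: float, avg_peak_width: float) -> float:
    """Approximate peak capacity of one dimension: 1 + window / average peak width.

    `window` and `avg_peak_width` must use the same time unit (e.g. minutes).
    """
    if avg_peak_width <= 0:
        raise ValueError("average peak width must be positive")
    return 1 + window / avg_peak_width


def ideal_total_peak_capacity(capacities) -> float:
    """Product rule: upper bound on total peak capacity for fully orthogonal dimensions."""
    capacities = list(capacities)
    if not capacities:
        raise ValueError("need at least one dimension")
    return prod(capacities)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not values from the study.
    cze = single_dimension_peak_capacity(window=60.0, avg_peak_width=0.5)
    print(cze)                                      # 121.0
    print(ideal_total_peak_capacity([3, 4, cze]))   # 1452.0
```

Treat the product as a ceiling. The more the retention behavior in one dimension predicts the behavior in another, the further the usable peak capacity drops below it.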
Predicting Electrophoretic Mobility of Proteoforms for Large-Scale Top-Down Proteomics
Large-scale top-down proteomics characterizes proteoforms in cells globally with high confidence and high throughput using reversed-phase liquid chromatography (RPLC)–tandem mass spectrometry (MS/MS) or capillary zone electrophoresis (CZE)–MS/MS. The false discovery rate (FDR) from a target–decoy database search is typically used to filter identified proteoforms and ensure high-confidence identifications (IDs). However, it has been demonstrated that FDRs in top-down proteomics can be drastically underestimated, so an approach complementary to the FDR would be useful for further evaluating the confidence of proteoform IDs after the database search. We argue that accurately predicting the retention/migration times of proteoforms in RPLC/CZE separations and comparing predicted with experimental separation times could be a useful and practical approach. To our knowledge, no study has yet reported predicting the separation times of proteoforms using large top-down proteomics data sets. In this pilot study, we evaluated, for the first time, various semiempirical models for predicting proteoforms’ electrophoretic mobility (μef) using large-scale top-down proteomics data sets from CZE–MS/MS. With a simple semiempirical model that uses the number of charges and the molecular mass of each proteoform as parameters, we achieved a linear correlation between experimental and predicted μef of E. coli proteoforms (R² = 0.98). Our modeling data suggest that complete unfolding of proteoforms during CZE separation benefits the prediction of their μef. Our results also indicate that N-terminal acetylation and phosphorylation each decrease a proteoform’s charge by roughly one unit.
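The abstract does not give the model's equation. As a minimal sketch, the Python code below assumes an Offord-type form in which electrophoretic mobility is proportional to Q/M^(2/3). Here Q is the number of charges and M is the molecular mass. Q is estimated by simple charge counting at acidic pH: one positive charge for each Lys, Arg, and His residue, plus one for a free N-terminal amine. It subtracts one for N-terminal acetylation and one per phosphorylation, matching the roughly one-unit shifts reported above. The charge rules, the exponent, and all function names are illustrative assumptions, not the authors' fitted model. R² is computed as the squared Pearson correlation, which does not depend on the unknown proportionality constant.

```python
def count_charges(sequence: str, n_term_acetylated: bool = False, n_phospho: int = 0) -> int:
    """Approximate net positive charge Q at acidic pH (simplified assumption).

    Counts protonated basic residues (K, R, H) plus the free N-terminal amine,
    then subtracts one unit for N-terminal acetylation and one per phosphate.
    """
    q = sum(sequence.count(aa) for aa in "KRH")
    if not n_term_acetylated:
        q += 1  # free N-terminal amine assumed protonated
    q -= n_phospho  # each phosphate assumed to carry about one negative charge
    return q


def predicted_mobility(q: float, mass: float, exponent: float = 2 / 3) -> float:
    """Offord-type semiempirical form: mobility proportional to Q / M**exponent (unscaled)."""
    if mass <= 0:
        raise ValueError("mass must be positive")
    return q / mass ** exponent


def r_squared(xs, ys) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    seq = "MKRHAE"  # toy sequence, not from the study
    print(count_charges(seq))                          # 4
    print(count_charges(seq, n_term_acetylated=True))  # 3
    print(count_charges(seq, n_phospho=1))             # 3
```

In practice, `predicted_mobility` would be evaluated for every identified proteoform. `r_squared` would then compare those values with mobilities derived from the measured migration times.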
