Sequence-Based Prediction of Cysteine Reactivity Using Machine Learning
As one of the most
intrinsically reactive amino acids, cysteine
performs a variety of important biochemical functions, including catalysis
and redox regulation. Discovery and characterization of cysteines
with heightened reactivity will help annotate protein functions. Chemical
proteomic methods have been used to quantitatively profile cysteine
reactivity in native proteomes, showing a strong correlation between
the chemical reactivity of a cysteine and its functionality; however,
the relationship between the cysteine reactivity and its local sequence
has not yet been systematically explored. Herein, we report a machine
learning method, sbPCR (sequence-based prediction of cysteine reactivity),
which combines the Basic Local Alignment Search Tool (BLAST), truncated composition
of k-spaced amino acid pairs (CKSAAP) analysis, and a support
vector machine to predict hyper-reactive cysteines from
local sequence features alone. Using a benchmark set compiled from
hyper-reactive cysteines in human proteomes, our method achieves
a prediction accuracy of 98%, a precision of 95%, and a recall
of 89%. We used these governing features of local sequence motifs
to expand the prediction to potential hyper-reactive cysteines in
other proteomes deposited in the UniProt database. We validated our
predictions in Escherichia coli by activity-based
protein profiling and discovered a hyper-reactive cysteine from a
functionally uncharacterized protein, YecH. Biochemical analysis suggests
that the hyper-reactive cysteine might be involved in metal binding.
Our computational method provides a large inventory of potential hyper-reactive
cysteines in proteomes and is highly complementary to other experimental
approaches to guide systematic annotation of protein functions
in the postgenome era.
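
To illustrate the kind of local sequence feature the abstract refers to, the following is a minimal sketch of a plain composition of k-spaced amino acid pairs computed over a sequence window around a cysteine. It is not the authors' implementation: the function name `cksaap`, the `k_max` parameter, and the example window are hypothetical, and the "truncated" variant, the BLAST step, and the support vector machine are not reproduced here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(window, k_max=2):
    """Return normalized k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, count every residue pair separated by
    exactly k positions in `window` and divide by the number of counted
    pairs. Pairs containing non-standard residues are skipped.
    `k_max` is an illustrative assumption, not a value from the paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = {}
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        for pair in pairs:
            features[(k, pair)] = counts[pair] / total if total else 0.0
    return features


# Hypothetical 5-residue window centered on a cysteine.
example = cksaap("ACDCA", k_max=1)
```

Each gap value contributes 400 features (one per ordered residue pair), so the resulting vector could be passed to any standard classifier once flattened in a fixed order.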