    Sequence-Based Prediction of Cysteine Reactivity Using Machine Learning

    As one of the most intrinsically reactive amino acids, cysteine carries out a variety of important biochemical functions, including catalysis and redox regulation. Discovering and characterizing cysteines with heightened reactivity will help annotate protein functions. Chemical proteomic methods have been used to quantitatively profile cysteine reactivity in native proteomes, showing a strong correlation between the chemical reactivity of a cysteine and its functionality; however, the relationship between cysteine reactivity and local sequence has not yet been systematically explored. Herein, we report a machine learning method, sbPCR (sequence-based prediction of cysteine reactivity), which combines the basic local alignment search tool (BLAST), truncated composition of k-spaced amino acid pair analysis, and a support vector machine to predict hyper-reactive cysteines from local sequence features alone. On a benchmark set compiled from hyper-reactive cysteines in human proteomes, our method achieves a prediction accuracy of 98%, a precision of 95%, and a recall of 89%. We used the governing features of these local sequence motifs to extend the prediction to potential hyper-reactive cysteines in other proteomes deposited in the UniProt database. We validated our predictions in Escherichia coli by activity-based protein profiling and discovered a hyper-reactive cysteine in a functionally uncharacterized protein, YecH. Biochemical analysis suggests that this hyper-reactive cysteine might be involved in metal binding. Our computational method provides a large inventory of potential hyper-reactive cysteines in proteomes and is highly complementary to experimental approaches for guiding the systematic annotation of protein functions in the postgenome era.
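
The composition of k-spaced amino acid pairs named in the abstract can be illustrated with a minimal sketch. The abstract gives no parameters, so the function name `kspaced_pair_composition`, the maximum gap `k_max`, the example window, and the normalization by the number of observed pairs are all assumptions made here for illustration; this is not the authors' implementation, and the BLAST and support vector machine steps are omitted.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def kspaced_pair_composition(window, k_max=2):
    """Return pair frequencies for gaps k = 0..k_max over a sequence window.

    For each k, residues at positions i and i + k + 1 form one pair.
    Frequencies are normalized by the number of pairs actually observed,
    so a window cut short near a protein terminus still yields values
    that sum to 1 per gap (an assumed way of handling short windows).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip non-standard residues such as 'X'
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    # Hypothetical window centered on a cysteine; not taken from the paper.
    feats = kspaced_pair_composition("GAVLKCPHDEL", k_max=2)
    print(len(feats))  # 400 features per gap value
```

A vector of this kind, computed for a window around each cysteine, is the sort of fixed-length input a support vector machine classifier could be trained on.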