8 research outputs found

    Protein asparagine deamidation prediction based on structures with machine learning methods

    Chemical stability is a major concern in the development of protein therapeutics because of its impact on both efficacy and safety. Protein “hotspots” are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (an asparagine followed by a glycine) as liable to deamidation. It still dominates deamidation evaluation in most pharmaceutical settings because of its convenience. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models built by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. The experimentally measured deamidation half-life of Asn in penta-peptides and 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested on an external test data set. The random forest model showed high enrichment in ranking deamidated residues above non-deamidated residues while effectively eliminating false-positive predictions. Such quantitative protein structure–function relationship tools may also be applicable to other protein hotspot predictions. In addition, we extensively discuss the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
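
As an illustration of the sequence-based baseline described in this abstract, the following is a minimal sketch of an NG-motif scan. The function name `find_ng_motifs` and the toy sequence are assumptions made for this example; they are not taken from the paper, and the paper's structure-based models are not reproduced here.

```python
def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of each Asn (N) immediately followed by Gly (G)."""
    seq = sequence.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] == "N" and seq[i + 1] == "G"
    ]


# Hypothetical toy sequence, not taken from the paper's data set.
example = "MKNGTLANGSWNAG"
print(find_ng_motifs(example))  # -> [3, 8]
```

The scan flags every NG pair regardless of its structural context, which is consistent with the abstract's remark that the sequence-only rule can lead to over-engineering.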

    Performance comparison between NG-motif, NGOME and our structure-based methods.

    The prediction performance of NG-motif and NGOME is shown by plotting their true positive rate (TPR) vs. false positive rate (FPR) points (red triangle for NG-motif, blue square for NGOME) on the ROC curve of the RF method (purple line).
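
A binary rule such as NG-motif produces a single (FPR, TPR) point, not a full ROC curve. The sketch below shows how such a point could be computed from true labels and binary predictions. The helper name `tpr_fpr` and the label vectors are hypothetical and carry no values from the paper.

```python
def tpr_fpr(y_true, y_pred):
    """Compute (TPR, FPR) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical unbalanced example: 2 deamidated residues out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(tpr_fpr(y_true, y_pred))  # -> (0.5, 0.125)
```

In this toy case the accuracy is 0.8 even though only half of the positives are recovered, which is one reason TPR and FPR are more informative than accuracy on unbalanced data.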

    Cross validation of binary deamidation prediction models.

    Different types of descriptors that were developed for building deamidation prediction models.

    Comparison between NG-motif, NGOME, and our structure-based prediction methods.

    Features (descriptors) ranking by RFE (the RF model).

    Blind test of binary deamidation prediction models.

    De Novo Prediction of P‑Glycoprotein-Mediated Efflux Liability for Druglike Compounds

    P-glycoprotein (Pgp) is capable of recognizing and transporting a wide range of chemically diverse compounds in vivo. Overcoming Pgp-mediated efflux can represent a significant challenge when penetration into the central nervous system is required or in the context of developing anticancer therapies. While numerous in silico models have been developed to predict Pgp-mediated efflux, these models rely on training sets and are best suited to making interpolations. Therefore, it is desirable to develop ab initio models that can be used to predict efflux liabilities. Herein, we present a de novo method that can be used to predict Pgp-mediated efflux potential for druglike compounds. A model that correlates the computed solvation free energy differences obtained in water and chloroform with Pgp-mediated efflux (on a logarithmic scale) was successful in predicting Pgp efflux ratios for a wide range of chemically diverse compounds, with an R² and a root-mean-square error of 0.65 and 0.29, respectively.
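
The abstract does not give the functional form of the correlation. As a rough sketch only, the example below assumes a simple least-squares line between a solvation free energy difference and the log efflux ratio, and shows how R² and root-mean-square error would be computed for such a fit. The function names, the sign convention, and all numbers are placeholders invented for illustration; none come from the paper.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(y, y_hat):
    """Coefficient of determination and root-mean-square error."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Placeholder values only (assumed units: kcal/mol and log10 efflux ratio).
ddg_solv = [-6.0, -4.5, -3.0, -2.0, -1.0]
log_efflux = [1.1, 0.9, 0.5, 0.4, 0.1]

slope, intercept = fit_line(ddg_solv, log_efflux)
predicted = [slope * x + intercept for x in ddg_solv]
r2, rmse = r2_and_rmse(log_efflux, predicted)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f} RMSE={rmse:.3f}")
```

Because the free energy differences are computed, not fitted to efflux data, a model of this kind needs very few adjustable parameters, which is in line with the abstract's contrast between de novo prediction and training-set-based interpolation.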