Protein asparagine deamidation prediction based on structures with machine learning methods
<div><p>Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein “hotspots” are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (asparagine followed by glycine) as liable to deamidation. It still dominates deamidation evaluation in most pharmaceutical settings because of its convenience. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models built by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. Experimentally measured deamidation half-lives of Asn in pentapeptides, together with 3D structure-based properties such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested on an external test data set. The random forest model showed high enrichment, ranking deamidated residues above non-deamidated residues while effectively eliminating false-positive predictions. It is possible that such quantitative protein structure–function relationship tools can also be applied to the prediction of other protein hotspots. In addition, we extensively discuss the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.</p></div>
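The sequence-based baseline described in the abstract can be illustrated with a minimal sketch. The function name `find_ng_motifs` and the example sequence are hypothetical and are not taken from the paper; the code only shows what "flagging the NG motif" means in practice.

```python
def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of Asn (N) residues immediately followed by Gly (G)."""
    seq = sequence.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] == "N" and seq[i + 1] == "G"
    ]


# Hypothetical sequence, for illustration only.
example = "MKTNGAYNSLNGQ"
print(find_ng_motifs(example))  # [4, 11]
```

Every NG pair is flagged regardless of its structural environment, and the Asn at position 8 (followed by Ser) is never considered. This is the limitation that motivates the structure-based descriptors.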
Performance comparison between NG-motif, NGOME and our structure-based methods.
<p>NG-motif and NGOME prediction performance is shown by plotting their TPR vs. FPR points (red triangle for NG-motif, blue square for NGOME) on the ROC curve of the RF method (purple line).</p>
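A binary predictor such as the NG-motif rule yields a single (FPR, TPR) point rather than a full ROC curve. The sketch below shows how such a point could be computed; the labels and predictions are invented for illustration and do not come from the paper's data.

```python
def tpr_fpr(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute true-positive rate and false-positive rate for binary labels (1 = deamidated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels (1 = deamidated) and binary predictions for eight Asn sites.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR={tpr:.3f}, FPR={fpr:.3f}")  # TPR=0.667, FPR=0.200
```

Plotting this point against the ROC curve of a score-based model shows whether the model can match the rule's TPR at a lower FPR.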
Cross validation of binary deamidation prediction models.
Different types of descriptors that were developed for building deamidation prediction models.
Comparison between NG-motif, NGOME, and our structure-based prediction methods.
Features (descriptors) ranking by RFE (the RF model).
Blind test of binary deamidation prediction models.
De Novo Prediction of P‑Glycoprotein-Mediated Efflux Liability for Druglike Compounds
P-glycoprotein (Pgp) is capable of recognizing and transporting
a wide range of chemically diverse compounds in vivo. Overcoming Pgp-mediated
efflux can represent a significant challenge when penetration into
the central nervous system is required or within the context of developing
anticancer therapies. While numerous in silico models have been developed
to predict Pgp-mediated efflux, these models rely on training sets
and are best suited to make interpolations. Therefore, it is desirable
to develop ab initio models that can be used to predict efflux liabilities.
Herein, we present a de novo method that can be used to predict Pgp-mediated
efflux potential for druglike compounds. A model, which correlates
the computed solvation free energy differences obtained in water and
chloroform with Pgp-mediated efflux (in logarithmic scale), was successful
in predicting Pgp efflux ratios for a wide range of chemically diverse
compounds, with an R<sup>2</sup> of 0.65 and a root-mean-square error
of 0.29.
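
The reported R<sup>2</sup> and root-mean-square error summarize how well a single descriptor tracks the logarithm of the efflux ratio. The sketch below assumes a simple one-variable least-squares line, which the abstract does not confirm as the authors' exact model form. The function names and all numbers are hypothetical.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(x: list[float], y: list[float], slope: float, intercept: float) -> tuple[float, float]:
    """Coefficient of determination and root-mean-square error of the fitted line."""
    mean_y = sum(y) / len(y)
    predicted = [slope * xi + intercept for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical data: solvation free energy difference (water minus chloroform)
# versus log10 of the efflux ratio. These values are NOT from the paper.
delta_g = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
log_efflux = [0.1, 0.3, 0.4, 0.8, 0.9, 1.3]

slope, intercept = fit_line(delta_g, log_efflux)
r2, rmse = r2_and_rmse(delta_g, log_efflux, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f} RMSE={rmse:.3f}")
```

Because the response is on a logarithmic scale, the RMSE here is in log units of the efflux ratio.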