Search CORE

8 research outputs found

Protein asparagine deamidation prediction based on structures with machine learning methods

Author: Lei Jia (303610)
Yaxiong Sun (2004886)
Publication venue
Publication date: 21/07/2017
Field of study

Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein “hotspots” are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (an asparagine followed by a glycine) as liable to deamidation. It still dominates deamidation evaluation in most pharmaceutical settings because of its convenience. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models built by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. Experimentally measured deamidation half-lives of Asn in pentapeptides and 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested on an external test data set. The random forest model showed high enrichment, ranking deamidated residues above non-deamidated residues while effectively eliminating false positive predictions. Such quantitative protein structure–function relationship tools may also be applicable to other protein hotspot predictions. In addition, we extensively discuss the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.

Directory of Open Access Journals

FigShare
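The sequence-based NG-motif rule mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code; the function name `find_ng_motifs` and the example sequence are assumptions made up for illustration. It only flags every Asn (N) that is immediately followed by Gly (G) in a one-letter amino acid sequence.

```python
import re


def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues immediately followed by Gly.

    Hypothetical helper for illustration; not taken from the paper.
    """
    seq = sequence.upper()
    # The lookahead (?=G) checks for a following Gly without consuming it,
    # so each match covers only the Asn residue itself.
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", seq)]


# Hypothetical sequence, not a real protein.
example = "MKNGTLNNGAQN"
print(find_ng_motifs(example))  # prints [3, 8]
```

The rule uses no structural information, which is consistent with the abstract's point that it is convenient but can flag residues that do not deamidate in practice.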

Performance comparison between NG-motif, NGOME and our structure-based methods.

Author: Lei Jia (303610)
Yaxiong Sun (2004886)

The prediction performance of NG-motif and NGOME is shown by plotting their TPR vs. FPR points (red triangle for NG-motif, blue square for NGOME) on the ROC curve of the RF method (purple line).

FigShare
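A binary rule such as NG-motif produces a single yes/no call per residue, so it maps to one (FPR, TPR) point rather than a full ROC curve. The sketch below shows how such a point could be computed. The function name `tpr_fpr` and the label vectors are hypothetical and are not data from the figure.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 means deamidated.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels for eight Asn residues (made-up values).
observed = [1, 0, 0, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(observed, predicted)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A point that lies below the ROC curve of a scoring model at the same FPR indicates that the scoring model recovers more true positives at that false positive rate.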

Cross validation of binary deamidation prediction models.

Author: Lei Jia (303610)
Yaxiong Sun (2004886)


FigShare

Different types of descriptors that were developed for building deamidation prediction models.

Author: Lei Jia (303610)
Yaxiong Sun (2004886)


FigShare

Comparison between NG-motif, NGOME, and our structure-based prediction methods.

Author: Lei Jia (303610)
Yaxiong Sun (2004886)


FigShare

Features (descriptors) ranking by RFE (the RF model).

Author: Lei Jia (303610)
Yaxiong Sun (2004886)


FigShare

Blind test of binary deamidation prediction models.

Author: Lei Jia (303610)
Yaxiong Sun (2004886)


FigShare

De Novo Prediction of P‑Glycoprotein-Mediated Efflux Liability for Druglike Compounds

Author: Hakan Gunaydin (177019)
Matthew M. Weiss (1656997)
Yaxiong Sun (2004886)

P-glycoprotein (Pgp) is capable of recognizing and transporting a wide range of chemically diverse compounds in vivo. Overcoming Pgp-mediated efflux can represent a significant challenge when penetration into the central nervous system is required or within the context of developing anticancer therapies. While numerous in silico models have been developed to predict Pgp-mediated efflux, these models rely on training sets and are best suited to interpolation. Therefore, it is desirable to develop ab initio models that can be used to predict efflux liabilities. Herein, we present a de novo method that can be used to predict Pgp-mediated efflux potential for druglike compounds. A model that correlates the difference between computed solvation free energies in water and chloroform with Pgp-mediated efflux (on a logarithmic scale) was successful in predicting Pgp efflux ratios for a wide range of chemically diverse compounds, with an R² of 0.65 and a root-mean-square error of 0.29.

FigShare
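The abstract above reports an R² and a root-mean-square error for a correlation between a solvation free energy difference and the logarithm of the efflux ratio. As a rough sketch of how those two statistics are obtained for a one-variable linear fit, the code below uses ordinary least squares. The functional form of the authors' model is not given in this listing, so the straight-line fit, the function names, and the sample numbers are all assumptions for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r2_and_rmse(x, y, slope, intercept):
    """Return (R^2, RMSE) of the fitted line on the given points."""
    mean_y = sum(y) / len(y)
    predicted = [slope * xi + intercept for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Made-up values: x stands in for a solvation free energy difference,
# y for log10 of an efflux ratio. Neither comes from the paper.
delta_g = [1.0, 2.0, 3.0, 4.0, 5.0]
log_efflux = [0.2, 0.5, 0.4, 0.9, 1.0]
slope, intercept = fit_line(delta_g, log_efflux)
r2, rmse = r2_and_rmse(delta_g, log_efflux, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2={r2:.2f}, RMSE={rmse:.2f}")
```

Because the response is on a logarithmic scale, the RMSE here is in log units of the efflux ratio.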