Prediction of potential commercially inhibitors against SARS-CoV-2 by multi-task deep model
The outbreak of novel coronavirus pneumonia (COVID-19) has caused thousands of
deaths worldwide, and the total number of infections is still rising. However,
the development of an effective vaccine for this novel virus would take a few
months. Thus, it is urgent to identify potentially effective existing drugs that
can be used immediately. Fortunately, some compounds that can inhibit
coronavirus in vitro have been reported. In this study, a
coronavirus-specific dataset was used to fine-tune our pre-trained multi-task
deep model. Next, we used the retrained model to screen commercially available
drugs against target proteins of SARS-CoV-2. The results show that abacavir,
a powerful nucleoside analog reverse transcriptase inhibitor used to treat HIV,
is predicted to have high binding affinity with several proteins of SARS-CoV-2.
Almitrine mesylate and roflumilast, which are used for respiratory diseases such
as chronic obstructive pulmonary disease, are also predicted to have inhibitory
effects. Overall, ten drugs are listed as potential inhibitors, and the binding
sites that our model identifies as important are shown. We hope these results
will be useful in the fight against SARS-CoV-2.
Performance Comparison of Data Sampling Techniques to Handle Imbalanced Class on Prediction of Compound-Protein Interaction
The prediction of compound-protein interactions (CPI) is an essential step in drug-target analysis, both for developing new drugs and for drug repositioning. One challenge in this field is that non-interacting compound-protein pairs commonly outnumber interacting pairs. This class imbalance introduces bias, which may degrade CPI prediction. Moreover, little research on CPI prediction currently compares data sampling techniques for handling class imbalance. To address this gap, we compare four data sampling techniques: Random Under-sampling (RUS), Combination of Over-Under-sampling (COUS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). Two benchmark CPI datasets, Nuclear Receptor and G-Protein Coupled Receptor (GPCR), are used to test these techniques, and Area Under Curve (AUC) is used to evaluate the CPI prediction performance of each. The AUC values for RUS, COUS, SMOTE, and T-Link are 0.75, 0.77, 0.85, and 0.79, respectively, on the Nuclear Receptor data and 0.70, 0.85, 0.91, and 0.72, respectively, on the GPCR data. SMOTE therefore achieves the highest AUC on both datasets, indicating that it handles class imbalance in CPI prediction better than the other three techniques.
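As a rough illustration of two of the sampling techniques named above, the following sketch implements random under-sampling and a SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function names, the toy 2-D feature vectors, and the choice of k are all assumptions made for the example.

```python
import random

def random_under_sample(majority, minority, rng):
    """RUS: keep a random majority subset the same size as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)

def smote(minority, n_new, k, rng):
    """SMOTE-style over-sampling: interpolate between a minority point and
    one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours by squared Euclidean distance,
        # excluding the base point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

rng = random.Random(0)
# Hypothetical toy data: 2-D feature vectors for compound-protein pairs.
interacting = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
non_interacting = [(rng.uniform(-1, 0.5), rng.uniform(-1, 0.5)) for _ in range(20)]

# RUS shrinks the majority class; SMOTE grows the minority class.
maj_rus, min_rus = random_under_sample(non_interacting, interacting, rng)
new_points = smote(interacting, len(non_interacting) - len(interacting), k=2, rng=rng)
balanced_minority = interacting + new_points
```

Under-sampling discards majority examples, whereas SMOTE adds synthetic minority points that lie on line segments between existing minority points, so every synthetic point stays inside the region already occupied by the minority class.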
Graph neural networks and attention-based CNN-LSTM for protein classification
This paper focuses on three critical problems in protein classification.
First, carbohydrate-active enzyme (CAZyme) classification can help people
understand the properties of enzymes. However, one CAZyme may belong to several
classes, which makes this a multi-label classification problem. Second, to
capture information from the secondary structure of a protein, protein
classification is modeled as a graph classification problem. Third,
compound-protein interaction prediction employs graph learning for the compound
and sequential embedding for the protein, which can be seen as a classification
task over compound-protein pairs.
This paper proposes three models for protein classification. The first is a
multi-label CAZyme classification model using CNN-LSTM with an attention
mechanism. The second is a variational graph autoencoder based subspace
learning model for protein graph classification. The third combines graph
isomorphism networks (GIN) with an attention-based CNN-LSTM for
compound-protein interaction prediction; GIN is also compared with graph
convolution networks (GCN) and graph attention networks (GAT) on this task.
The proposed models are effective for protein classification. Source code and
data are available at
https://github.com/zshicode/GNN-AttCL-protein. In addition, this repository
collects and collates benchmark datasets for the above problems, including
CAZyme classification, enzyme protein graph classification,
compound-protein interaction prediction, drug-target affinity prediction, and
drug-drug interaction prediction, which makes evaluation on benchmark datasets
more convenient.
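For readers unfamiliar with graph isomorphism networks, the sketch below shows the sum-based neighbourhood aggregation that characterises a GIN layer, followed by a sum readout over nodes. It is a simplified illustration and not the code from the repository above: the learned MLP that normally follows the aggregation is omitted, and the function names and the three-atom toy graph are assumptions.

```python
def gin_aggregate(features, adjacency, eps=0.0):
    """One GIN-style aggregation step: (1 + eps) * h_v plus the sum of
    neighbour features. The learned MLP is omitted in this sketch."""
    out = {}
    for v, h_v in features.items():
        agg = [(1 + eps) * x for x in h_v]
        for u in adjacency[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        out[v] = agg
    return out

def sum_readout(features):
    """Graph-level vector: element-wise sum over all node vectors."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]

# Hypothetical toy compound: three atoms in a chain 0-1-2,
# each with a one-hot atom-type feature.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
adjacency = {0: [1], 1: [0, 2], 2: [1]}

hidden = gin_aggregate(features, adjacency)
graph_vec = sum_readout(hidden)
```

Summing rather than averaging neighbour features keeps information about how many neighbours of each type a node has, which is the usual argument for GIN over mean-style aggregators.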
Deep Learning-Based Conformal Prediction of Toxicity
Predictive modeling for toxicity can help reduce risks in a range of applications and potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models in toxicity prediction, we investigate the combination of deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors to those of the underlying machine learning models. The results indicate that highly predictive models can be obtained, yielding very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding both statistical guarantees on model performance and better predictions of the minority class compared to the underlying models.
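The idea of wrapping a classifier in a conformal predictor can be sketched in a few lines. The example below is a minimal inductive conformal predictor with class-conditional (Mondrian) calibration, assuming the underlying model outputs a probability for the toxic class. Whether the paper uses exactly this variant is not stated in the abstract, and the function names, nonconformity measure, and calibration values are hypothetical.

```python
def p_value(cal_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, with the usual +1 correction."""
    return (sum(s >= test_score for s in cal_scores) + 1) / (len(cal_scores) + 1)

def nonconformity(prob_toxic, label):
    """1 minus the predicted probability of the given label (assumed measure)."""
    return 1 - (prob_toxic if label == 1 else 1 - prob_toxic)

def conformal_predict(cal_probs, cal_labels, test_prob, significance):
    """Return the set of labels whose p-value exceeds the significance level."""
    region = set()
    for label in (0, 1):
        # Class-conditional calibration: use only calibration examples
        # whose true label matches the candidate label.
        cal_scores = [
            nonconformity(p, y)
            for p, y in zip(cal_probs, cal_labels)
            if y == label
        ]
        if p_value(cal_scores, nonconformity(test_prob, label)) > significance:
            region.add(label)
    return region

# Hypothetical calibration set: predicted P(toxic) and true labels (1 = toxic).
cal_probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6, 0.15]
cal_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

confident = conformal_predict(cal_probs, cal_labels, 0.85, significance=0.2)
cautious = conformal_predict(cal_probs, cal_labels, 0.85, significance=0.1)
```

At a significance level of 0.2 the test compound receives the single label `{1}`, while tightening the level to 0.1 widens the prediction to `{0, 1}`. This trade-off between confidence and the size of the prediction set is what "efficiency" refers to in the conformal prediction setting.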