    Graph neural networks and attention-based CNN-LSTM for protein classification

    This paper focuses on three critical problems in protein classification. First, carbohydrate-active enzyme (CAZyme) classification can help people understand the properties of enzymes. However, one CAZyme may belong to several classes, which makes CAZyme classification a multi-label problem. Second, to capture information from the secondary structure of proteins, protein classification is modeled as a graph classification problem. Third, compound-protein interaction prediction combines graph learning for compounds with sequence embeddings for proteins, and can be seen as a classification task over compound-protein pairs. This paper proposes three corresponding models. First, it proposes a multi-label CAZyme classification model that uses a CNN-LSTM with an attention mechanism. Second, it proposes a variational graph autoencoder-based subspace learning model for protein graph classification. Third, it proposes graph isomorphism networks (GIN) combined with an attention-based CNN-LSTM for compound-protein interaction prediction, and compares GIN with graph convolutional networks (GCN) and graph attention networks (GAT) on this task. The proposed models are effective for protein classification. Source code and data are available at https://github.com/zshicode/GNN-AttCL-protein. In addition, this repository collects and collates benchmark datasets for the above problems, including CAZyme classification, enzyme protein graph classification, compound-protein interaction prediction, drug-target affinity prediction, and drug-drug interaction prediction, which makes evaluation on these benchmarks more convenient.
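    The GIN model in the third contribution differs from GCN and GAT mainly in how a node aggregates its neighbours: GIN sums neighbour features, adds a (1 + eps)-weighted copy of the node's own features, and passes the result through an MLP. The following is a minimal plain-Python sketch of one such layer plus a sum readout. It is not the authors' implementation (that lives in the linked repository); the toy graph, eps, and MLP weights are hypothetical, and the weights are fixed rather than learned.

```python
# Minimal GIN layer sketch in plain Python (hypothetical toy graph and fixed weights).

def relu(x):
    return [max(0.0, v) for v in x]

def linear(W, b, x):
    # y = W x + b, where W is given as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gin_layer(h, adj, W1, b1, W2, b2, eps=0.0):
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum of neighbour features).

    h: list of node feature vectors; adj: dict mapping node index -> neighbour indices.
    """
    out = []
    for v, hv in enumerate(h):
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj[v]:
            agg = [a + x for a, x in zip(agg, h[u])]  # unweighted sum over neighbours
        out.append(linear(W2, b2, relu(linear(W1, b1, agg))))  # two-layer MLP
    return out

def sum_readout(h):
    # graph-level embedding: element-wise sum of the node embeddings
    return [sum(col) for col in zip(*h)]

# Toy path graph 0 - 1 - 2 with 2-dimensional node features
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}
I = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, so the MLP passes values through
z = [0.0, 0.0]
print(sum_readout(gin_layer(h, adj, I, z, I, z)))  # → [4.0, 5.0]
```

    A GCN layer would instead average degree-normalized neighbour features, and a GAT layer would weight neighbours with learned attention coefficients; the unweighted sum is what lets GIN tell apart neighbourhoods that averaging would map to the same vector.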

    Convolutional architectures for virtual screening

    Background: A virtual screening algorithm has to adapt to the different stages of the screening process. Early screening needs to ensure that all bioactive compounds are ranked in the first positions regardless of the number of false positives, while a second screening round aims to increase prediction accuracy. Results: To this end, a novel CNN architecture is presented that predicts the bioactivity of candidate compounds on CDK1, using a combination of molecular fingerprints as their vector representation. It has been trained to achieve good results in terms of both enrichment factor and accuracy in different screening modes (98.55% accuracy in active-only selection and 98.88% in high-precision discrimination). Conclusion: The proposed architecture outperforms state-of-the-art ML approaches, and the study yields some interesting insights into molecular fingerprints.
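    The enrichment factor measures how strongly a ranked list concentrates actives at the top compared with a random ordering. A minimal sketch of the common definition, EF = (actives in the top fraction / size of that fraction) / (total actives / total compounds), follows. The scores and labels are hypothetical, rounding the size of the top slice to the nearest integer is an assumption (conventions vary), and the sketch does not reproduce the paper's CNN.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at the top `fraction` of a ranked screening list.

    scores: predicted scores, higher means more likely active
    labels: 1 for a known active, 0 for an inactive or decoy
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # rank compounds by descending predicted score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # size of the selected top slice; nearest-integer rounding is one convention
    n_top = max(1, round(fraction * n_total))
    hits = sum(label for _, label in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)

# Hypothetical screen of 10 compounds containing 2 actives
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, 0.1))  # top 1 is active: (1/1) / (2/10) = 5.0
print(enrichment_factor(scores, labels, 0.5))  # 2 actives in top 5: (2/5) / (2/10) = 2.0
```

    An early-screening mode in the sense of the abstract would favour a high EF at a small fraction, even if many lower-ranked predictions are false positives.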

    Prediction of drug-drug interaction potential using machine learning approaches

    Drug discovery is a long, expensive, and complex yet crucial process for the benefit of society. Selecting potential drug candidates requires an understanding of how well a compound will perform its task and, more importantly, how safely it will act in patients. A key safety insight is a molecule's potential for drug-drug interactions. The metabolism of many drugs is mediated by members of the cytochrome P450 superfamily, notably the CYP3A4 enzyme. Inhibition of these enzymes can alter the bioavailability of other drugs, potentially raising their levels to toxic amounts. Four models were developed to predict CYP3A4 inhibition: logistic regression, random forests, support vector machine, and neural network. Two novel convolutional approaches were explored for data featurization: SMILES string auto-extraction and 2D structure auto-extraction. The logistic regression model achieved an accuracy of 83.2%, the random forests model 83.4%, the support vector machine model 81.9%, and the neural network model 82.3%. Additionally, the model built with SMILES string auto-extraction had an accuracy of 82.3%, and the model with 2D structure auto-extraction 76.4%. The advantage of the novel featurization methods is their ability to learn relevant features directly from compound SMILES strings, eliminating manual feature engineering. The developed methodologies can be extended to predicting any structure-activity relationship and adapted to other areas of drug discovery and development.
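    The abstract does not detail the SMILES auto-extraction architecture. As a rough illustration of the general idea, the sketch below one-hot encodes a SMILES string character by character and slides 1D convolution kernels over it, followed by ReLU and global max pooling, so each kernel acts as a detector for a character pattern. The kernels are hand-set rather than learned, and the function names, alphabet, and kernels are hypothetical. Splitting per character also breaks multi-character tokens such as "Cl" or "Br", which a real tokenizer would keep together.

```python
def one_hot_smiles(smiles, alphabet):
    """Character-level one-hot encoding: returns a len(smiles) x len(alphabet) matrix."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in smiles:
        row = [0.0] * len(alphabet)
        row[index[ch]] = 1.0  # raises KeyError for characters outside the alphabet
        rows.append(row)
    return rows

def conv1d_maxpool(x, kernels):
    """Valid 1D convolution of each kernel over the sequence, ReLU, then global max pooling.

    x: L x C one-hot matrix; kernels: list of K x C matrices. Returns one feature per kernel.
    """
    features = []
    for k in kernels:
        width = len(k)
        responses = []
        for start in range(len(x) - width + 1):
            s = 0.0
            for offset in range(width):
                s += sum(a * b for a, b in zip(x[start + offset], k[offset]))
            responses.append(max(0.0, s))  # ReLU
        features.append(max(responses) if responses else 0.0)
    return features

# Hypothetical alphabet (indices: C=0, N=1, O=2, '='=3, '('=4, ')'=5) and hand-set kernels
alphabet = "CNO=()"
kernels = [
    [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0]],  # fires on the bigram "=O"
    [[0, 1, 0, 0, 0, 0]],                      # fires on a single "N"
]
print(conv1d_maxpool(one_hot_smiles("CC(=O)O", alphabet), kernels))  # → [2.0, 0.0]
```

    In a trained model, the kernel entries would be learned from labelled CYP3A4 inhibition data, and the pooled features would feed a classifier in place of hand-engineered descriptors.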

    Prediction of pharmacological activities from chemical structures with graph convolutional neural networks

    Developing a technique to predict the pharmacological activities of compounds, using big data on pharmacological activities. Kyoto University press release, 2021-01-13.
    Many therapeutic drugs are compounds that can be represented by simple chemical structures, which contain important determinants of affinity at the site of action. Recently, graph convolutional neural network (GCN) models have achieved excellent results in classifying the activity of such compounds. For models that make quantitative predictions of activity, more complex information has been used, such as the three-dimensional structures of compounds and the amino acid sequences of their target proteins. As an alternative approach, we hypothesized that, given sufficient experimental data and enough nodes in the hidden layers, a simple compound representation could predict activity quantitatively with satisfactory accuracy. In this study, we report that GCN models constructed solely from the two-dimensional structural information of compounds showed a high degree of activity predictability against 127 diverse targets from the ChEMBL database. Using information entropy as a metric, we also show that structural diversity had less effect on prediction performance. Finally, we report that virtual screening using the constructed model identified a new serotonin transporter inhibitor with in vitro activity comparable to that of a marketed drug, which exhibited antidepressant effects in behavioural studies.
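    A GCN built from 2D structure treats atoms as nodes and bonds as edges. The sketch below assumes the widely used symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) of Kipf and Welling, followed by a mean readout and a linear output for a quantitative activity value. The paper's actual layer type, atom features, and depth are not given in the abstract, and all names and weights here are hypothetical. The number of columns in w1 is the hidden-layer width, which corresponds to the "enough nodes in the hidden layers" the authors hypothesize is needed.

```python
import math

def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency (bonds); h: n x f atom features; w: f x g weights.
    """
    n = len(adj)
    # add self-loops so each atom keeps its own features
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # symmetric degree normalisation
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # aggregate normalised neighbour features
    agg = [[sum(norm[i][k] * h[k][f] for k in range(n)) for f in range(len(h[0]))]
           for i in range(n)]
    # linear transform followed by ReLU
    return [[max(0.0, sum(agg[i][f] * w[f][g] for f in range(len(w))))
             for g in range(len(w[0]))] for i in range(n)]

def predict_activity(adj, h, w1, w_out):
    """Hypothetical regression head: one GCN layer, mean readout, linear output."""
    node_emb = gcn_layer(adj, h, w1)
    graph_emb = [sum(col) / len(node_emb) for col in zip(*node_emb)]
    return sum(g * wo for g, wo in zip(graph_emb, w_out))

# Toy two-atom molecule with one bond and a single scalar feature per atom
adj = [[0, 1], [1, 0]]
h = [[1.0], [3.0]]
print(predict_activity(adj, h, [[1.0]], [1.0]))  # → 2.0
```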

    Unsupervised Learning in Drug Design from Self-Organization to Deep Chemistry

    The availability of computers has brought novel prospects in drug design. Neural networks (NN) were an early tool that cheminformatics tested for converting data into drugs. However, the initial interest faded for almost two decades. The recent success of deep learning (DL) has inspired a renaissance of neural networks for their potential application in deep chemistry. DL targets direct data analysis without any human intervention. Although the back-propagation NN is the main algorithm currently used in DL, unsupervised learning can be even more efficient. We review self-organizing maps (SOM) in mapping molecular representations from the 1990s to current deep chemistry. We found SOM to be highly efficient not only for features that humans could anticipate, but also for those that are not obvious to human chemists. We review the DL projects in the current literature, especially unsupervised architectures. DL appears to be efficient in pattern recognition (DeepFace) or chess (Deep Blue). However, an efficient deep chemistry is still a matter for the future, because the availability of measured property data in chemistry is still limited.
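    For readers unfamiliar with SOMs, the sketch below trains a classic Kohonen map: each input is assigned to its best-matching unit (the node whose weight vector is nearest), and that node and its grid neighbours are pulled toward the input through a Gaussian neighbourhood whose radius and learning rate shrink over time. This is one common variant, not the specific configuration used in the reviewed studies; the 4-bit "fingerprints", grid size, and decay schedules are hypothetical.

```python
import math
import random

def best_matching_unit(weights, v):
    # grid node whose weight vector has the smallest squared Euclidean distance to v
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)))

def train_som(data, grid_w, grid_h, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a 2D self-organizing map; returns a dict (x, y) -> weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(grid_w, grid_h) / 2.0
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(data, len(data)):  # shuffled pass over the inputs
            frac = step / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull the node toward the input
            step += 1
    return weights

# Hypothetical 4-bit "fingerprints" forming two loose groups
data = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
som = train_som(data, grid_w=3, grid_h=3, epochs=200, seed=42)
for v in data:
    print(v, "->", best_matching_unit(som, v))
```

    No labels are used at any point: the map organizes itself from the inputs alone, which is the unsupervised property the review emphasizes.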