8 research outputs found

    A Novel Deep Neural Network Technique for Drug–Target Interaction

    The authors acknowledge the financial support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001; this research was also supported by the High-Performance Computing Center at UFRN (NPAD/UFRN).
    Drug discovery (DD) is a time-consuming and expensive process. Thus, the industry employs strategies such as drug repositioning and drug repurposing, which allow already approved drugs to be applied to a different disease, as occurred in the first months of 2020 during the COVID-19 pandemic. The prediction of drug–target interactions (DTI) is an essential part of the DD process because it can accelerate it and reduce the required costs. DTI prediction performed in silico has used approaches based on molecular docking simulations, as well as similarity-based and network- and graph-based ones. This paper presents MPS2IT-DTI, a DTI prediction model obtained from research conducted in the following steps: the definition of a new method for encoding molecule and protein sequences onto images, and the definition of a deep-learning approach based on a convolutional neural network to create a new method for DTI prediction. Training results on the Davis and KIBA datasets show that MPS2IT-DTI is viable compared with other state-of-the-art (SOTA) approaches in terms of performance and complexity of the neural network model. With the Davis dataset, we obtained 0.876 for the concordance index and 0.276 for the mean squared error (MSE); with the KIBA dataset, we obtained 0.836 and 0.226 for the concordance index and the MSE, respectively. Moreover, the MPS2IT-DTI model represents molecule and protein sequences as images instead of treating them as an NLP task, and as such it does not employ an embedding layer, which is present in other models.
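
    The entry above reports model quality as a concordance index and a mean squared error on the Davis and KIBA benchmarks. As a minimal, illustrative sketch of how these two metrics are commonly computed for predicted binding affinities (the function names below are ours, not from the paper):

    import numpy as np

    def concordance_index(y_true, y_pred):
        """Fraction of comparable pairs whose predicted ordering agrees with
        the true ordering; ties in the prediction receive half credit."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        agree, pairs = 0.0, 0
        for i in range(len(y_true)):
            for j in range(i + 1, len(y_true)):
                if y_true[i] == y_true[j]:
                    continue  # equal true affinities: pair is not comparable
                pairs += 1
                diff_true = y_true[i] - y_true[j]
                diff_pred = y_pred[i] - y_pred[j]
                if diff_pred == 0:
                    agree += 0.5
                elif diff_true * diff_pred > 0:
                    agree += 1.0
        return agree / pairs if pairs else 0.0

    def mean_squared_error(y_true, y_pred):
        """Average squared difference between true and predicted affinities."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean((y_true - y_pred) ** 2))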

    Do Large Scale Molecular Language Representations Capture Important Structural Information?

    Predicting the chemical properties of a molecule is of great importance in many applications, including drug discovery and material design. Machine-learning-based molecular property prediction holds the promise of enabling accurate predictions at much lower computational cost than, for example, Density Functional Theory (DFT) calculations. Various representation learning methods in a supervised setting, including features extracted using graph neural nets, have emerged for such tasks. However, the vast chemical space and the limited availability of labels make supervised learning challenging, calling for a general-purpose molecular representation. Recently, transformer-based language models pre-trained on large unlabeled corpora have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, we present molecular embeddings obtained by training an efficient transformer encoder model, MoLFormer. This model employs a linear attention mechanism coupled with highly parallelized training on SMILES sequences of 1.1 billion unlabeled molecules from the PubChem and ZINC datasets. Experiments show that the learned molecular representation outperforms supervised and unsupervised graph neural net baselines on several regression and classification tasks from 10 benchmark datasets, while performing competitively on others. Further analyses, specifically through the lens of attention, demonstrate that MoLFormer indeed learns a molecule's local and global structural aspects. These results provide encouraging evidence that large-scale molecular language models can capture sufficient structural information to predict diverse molecular properties, including quantum-chemical properties.
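
    The efficiency claim above rests on replacing softmax attention with a linear attention mechanism. The sketch below shows one common kernelized formulation of linear attention in PyTorch; it is an illustrative assumption about the general technique, not MoLFormer's actual layer, and the class and parameter names are ours:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LinearAttention(nn.Module):
        """Kernelized self-attention with cost linear in sequence length: the
        softmax kernel is approximated by the positive feature map elu(x) + 1."""
        def __init__(self, dim, heads=8):
            super().__init__()
            self.heads, self.dk = heads, dim // heads
            self.qkv = nn.Linear(dim, dim * 3)
            self.out = nn.Linear(dim, dim)

        def forward(self, x):  # x: (batch, seq, dim), e.g. embedded SMILES tokens
            b, n, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # split heads -> (batch, heads, seq, dk)
            q, k, v = (t.view(b, n, self.heads, self.dk).transpose(1, 2)
                       for t in (q, k, v))
            q, k = F.elu(q) + 1, F.elu(k) + 1
            kv = torch.einsum("bhnd,bhne->bhde", k, v)            # sum over keys
            z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(2)) + 1e-6)
            out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)   # per-query output
            return self.out(out.transpose(1, 2).reshape(b, n, -1))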

    Difficulty in learning chirality for Transformer fed with SMILES

    Recent years have seen the development of descriptor generation based on representation learning over extremely diverse molecules, especially methods that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this, we investigated the relationship between the learning progress on SMILES and chemical structure using a representative NLP model, the Transformer. The results suggest that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low translation accuracy due to misunderstanding of enantiomers. These findings are expected to deepen understanding of NLP models in chemistry. Comment: 20 pages, 6 figures.
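
    The chirality difficulty described above comes down to a one-token distinction: two enantiomers can have SMILES strings that differ only in the @ versus @@ chiral tag. A quick illustration using RDKit (our own example, not code from the paper):

    from rdkit import Chem

    # The two enantiomers of alanine differ only in the chiral tag (@ vs @@),
    # a single-token difference a SMILES-fed Transformer must learn to resolve.
    for smi in ("N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"):
        mol = Chem.MolFromSmiles(smi)
        print(Chem.MolToSmiles(mol))                        # chiral tags kept
        print(Chem.MolToSmiles(mol, isomericSmiles=False))  # tags dropped: both
                                                            # print the same string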